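Basics

General-purpose registers: there are 31 of them, r0-r30. The 32-bit views are named w0-w30 and the 64-bit views x0-x30. Register number 31 is not a general-purpose register: depending on the instruction it encodes either the stack pointer (SP/WSP) or the zero register (XZR/WZR). Usage:

r31: SP|WSP (or XZR|WZR, depending on the instruction)
r30: LR (link register)
r29: FP (frame pointer)
r19~r28: callee-saved [all 64 bits must be preserved, even under the ILP32 model!] # save the value before use and restore it afterwards
r18: platform register (inter-procedural state or PIC)
r16=IP0, r17=IP1: intra-procedure-call scratch registers
r9~r15: temporaries # scratch registers
r8: indirect result location # pointer to the memory that receives the return value
r0~r7: parameter/result # eight argument registers; the first eight arguments are placed in registers, and any further arguments are passed on the stack, pushed from right to left.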

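SIMD registers: v0-v31. Usage:

v8-v15 are callee-saved: save them before use and restore them afterwards. Under AAPCS64 only the low 64 bits of each (d8-d15) have to be preserved.
The remaining registers may be used freely.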

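Instruction prefixes and suffixes

S: Signed
U: Unsigned
F: Floating-point
P: Polynomial, or a pairwise operation on adjacent elements within a register
V: Across, i.e. an operation over all lanes of the register
2: usually means the instruction operates on the upper 64 bits (bits 64-127) of the register
H: Halving (the result is halved), or High (only the high half of each result is kept)
N: Narrow (result elements are half as wide)
L: Long (result elements are twice as wide)

A64 – Base Instructions: instructions on the general-purpose registers

A64 – SIMD and Floating-point Instructions: instructions on the SIMD registers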

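Logical and compare operations

VAND, VBIC, VEOR, VORN and VORR (register): bitwise AND, bit clear, exclusive OR, OR NOT and OR (register)

VAND (bitwise AND), VBIC (bit clear), VEOR (bitwise exclusive OR), VORN (bitwise OR NOT) and VORR (bitwise OR) perform a bitwise logical operation between two registers and place the result in the destination register. Throughout these notes the V-prefixed names are the ARMv7 NEON mnemonics; the lines that follow each description list the A64 equivalents.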

AND (vector): Bitwise AND (vector).

BIC (vector, register): Bitwise bit Clear (vector, register).

EOR (vector): Bitwise Exclusive OR (vector).

EOR3: Three-way Exclusive OR.

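ORN (vector): Bitwise inclusive OR NOT (vector).

ORR (vector, register): Bitwise inclusive OR (vector, register).

VBIC and VORR (immediate): bitwise bit clear and OR (immediate)

VBIC (bit clear, immediate) takes each element of the destination vector, ANDs it with the complement of an immediate value, and writes the result back to the destination vector.

VORR (bitwise OR, immediate) takes each element of the destination vector, ORs it with an immediate value, and writes the result back to the destination vector.

BIC (vector, immediate): Bitwise bit Clear (vector, immediate).

ORR (vector, immediate): Bitwise inclusive OR (vector, immediate).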

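VBIF, VBIT and VBSL: bitwise insert if false, bitwise insert if true, and bitwise select

VBIT (Bitwise Insert if True) inserts each bit of the first operand into the destination if the corresponding bit of the second operand is 1; otherwise the destination bit is left unchanged.

VBIF (Bitwise Insert if False) inserts each bit of the first operand into the destination if the corresponding bit of the second operand is 0; otherwise the destination bit is left unchanged.

VBSL (Bitwise Select) takes each destination bit from the first operand if the corresponding bit of the destination was 1, and from the second operand if it was 0.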

BIF: Bitwise Insert if False.

BIT: Bitwise Insert if True.

BSL: Bitwise Select.
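The three insert/select operations differ only in which register supplies the mask. The following is a minimal Python sketch, assuming each 128-bit register is modeled as a plain integer; the function names are illustrative helpers, not part of any ARM toolchain.

```python
# Illustrative model only: each 128-bit register is a Python int.
MASK128 = (1 << 128) - 1

def bsl(vd, vn, vm):
    """BSL Vd, Vn, Vm: the old Vd acts as the selection mask."""
    return ((vd & vn) | (~vd & vm)) & MASK128

def bit(vd, vn, vm):
    """BIT Vd, Vn, Vm: copy bits of Vn into Vd where Vm has a 1."""
    return ((vd & ~vm) | (vn & vm)) & MASK128

def bif(vd, vn, vm):
    """BIF Vd, Vn, Vm: copy bits of Vn into Vd where Vm has a 0."""
    return ((vd & vm) | (vn & ~vm)) & MASK128
```

Because BSL overwrites its own mask, BIT or BIF is usually the better fit when the mask has to stay live after the operation.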

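VCEQ, VCGE, VCGT, VCLE and VCLT (compare)

Vector compare takes the value of each element in a vector and compares it with the value of the corresponding element of a second vector, or with zero. If the condition is true, every bit of the corresponding destination element is set to 1; otherwise every bit is set to 0.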

CMEQ (register): Compare bitwise Equal (vector).

CMEQ (zero): Compare bitwise Equal to zero (vector).

CMGE (register): Compare signed Greater than or Equal (vector).

CMGE (zero): Compare signed Greater than or Equal to zero (vector).

CMGT (register): Compare signed Greater than (vector).

CMGT (zero): Compare signed Greater than zero (vector).

CMHI (register): Compare unsigned Higher (vector).

CMHS (register): Compare unsigned Higher or Same (vector).

CMLE (zero): Compare signed Less than or Equal to zero (vector).

CMLT (zero): Compare signed Less than zero (vector).
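The compare instructions produce per-lane masks rather than flags. A minimal Python sketch of that behaviour, assuming lanes are stored as unsigned integers in a list (index 0 is the lowest lane); the helper names are illustrative only.

```python
def to_signed(x, bits):
    """Reinterpret an unsigned lane value as two's complement."""
    return x - (1 << bits) if x >> (bits - 1) else x

def compare(a, b, pred, bits=8):
    """Per-lane compare: all ones where pred holds, otherwise zero."""
    ones = (1 << bits) - 1
    return [ones if pred(x, y) else 0 for x, y in zip(a, b)]

def cmeq(a, b, bits=8):  # bitwise equal
    return compare(a, b, lambda x, y: x == y, bits)

def cmgt(a, b, bits=8):  # signed greater than
    return compare(a, b,
                   lambda x, y: to_signed(x, bits) > to_signed(y, bits), bits)

def cmhi(a, b, bits=8):  # unsigned higher
    return compare(a, b, lambda x, y: x > y, bits)
```

The same bit patterns give different masks under CMGT and CMHI, which is why the signed and unsigned forms are separate instructions.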

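VTST (test bits)

VTST (Vector Test Bits) takes each element of a vector and performs a bitwise AND with the corresponding element of a second vector. If the result is non-zero, every bit of the corresponding destination element is set to 1; otherwise every bit is set to 0.

CMTST: Compare bitwise Test bits nonzero (vector).

Other bit operations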

XAR: Exclusive OR and Rotate.

BCAX: Bit Clear and XOR.

RAX1: Rotate and Exclusive OR.

RBIT (vector): Reverse Bit order (vector).

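General data-processing instructions

VCVT (between fixed-point or integer and floating-point): vector conversion between fixed-point or integer and floating-point.

VCVT (Vector Convert) converts each element in a vector in one of the following ways and places the results in the destination vector:

floating-point to integer
integer to floating-point
floating-point to fixed-point
fixed-point to floating-point

Rounding

Integer or fixed-point to floating-point conversions use round to nearest.
Floating-point to integer or fixed-point conversions use round towards zero.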

SCVTF (scalar, fixed-point): Signed fixed-point Convert to Floating-point (scalar).

SCVTF (scalar, integer): Signed integer Convert to Floating-point (scalar).

SCVTF (vector, fixed-point): Signed fixed-point Convert to Floating-point (vector).

SCVTF (vector, integer): Signed integer Convert to Floating-point (vector).

UCVTF (scalar, fixed-point): Unsigned fixed-point Convert to Floating-point (scalar).

UCVTF (scalar, integer): Unsigned integer Convert to Floating-point (scalar).

UCVTF (vector, fixed-point): Unsigned fixed-point Convert to Floating-point (vector).

UCVTF (vector, integer): Unsigned integer Convert to Floating-point (vector).
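The fixed-point forms take a count of fractional bits, so a conversion is a scale by a power of two combined with the rounding rule above. A rough Python sketch under stated assumptions: values are Python floats and ints, `fbits` is the number of fractional bits, NaN inputs are not handled, and the function names are illustrative.

```python
import math

def fcvtzs_fixed(x, fbits, bits=32):
    """Float -> signed fixed-point, rounding toward zero, saturating."""
    scaled = math.trunc(x * (1 << fbits))
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, scaled))

def scvtf_fixed(n, fbits):
    """Signed fixed-point -> float (exact for the small values used here)."""
    return n / (1 << fbits)
```

With `fbits=0` the same helpers model the plain integer conversions.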

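VDUP: duplicate a scalar to all lanes of a vector.

VDUP (Vector Duplicate) copies a scalar into every element of the destination vector. The source can be a NEON scalar or an ARM (general-purpose) register.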

DUP (element): Duplicate vector element to vector or scalar.

DUP (general): Duplicate general-purpose register to vector.

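VEXT: extract.

VEXT (Vector Extract) extracts 8-bit elements from the bottom end of the second operand vector and the top end of the first, concatenates them, and places the result in the destination vector.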

EXT: Extract vector from pair of vectors.
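The extract operation is easiest to picture as a sliding window over the two source registers laid end to end. A minimal Python sketch, assuming registers are byte lists with index 0 as the lowest byte; `ext` is an illustrative helper name.

```python
def ext(vn, vm, index):
    """Window of len(vn) bytes starting at byte `index` of vn followed by vm."""
    assert len(vn) == len(vm) and 0 <= index < len(vn)
    return (vn + vm)[index:index + len(vn)]
```

With `index` equal to 0 the result is simply the first operand.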

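VMOV, VMVN (immediate): move and move NOT (immediate).

VMOV (Vector Move) and VMVN (Vector Move NOT) (immediate) generate an immediate constant and place it in the destination register.

Vector Move (register) copies the value of the source register into the destination register.

Vector Move NOT (register) inverts every bit of the source register and places the result in the destination register.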

MOV (element): Move vector element to another vector element: an alias of INS (element).

MOV (from general): Move general-purpose register to a vector element: an alias of INS (general).

MOV (scalar): Move vector element to scalar: an alias of DUP (element).

MOV (to general): Move vector element to general-purpose register: an alias of UMOV.

MOV (vector): Move vector: an alias of ORR (vector, register).

MVN: Bitwise NOT (vector): an alias of NOT.

NOT: Bitwise NOT (vector).

MVNI: Move inverted Immediate (vector).

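VMOVL, V{Q}MOVN, VQMOVUN: move (register).

VMOVL (Vector Move Long) takes each element in a doubleword vector, sign- or zero-extends it to twice its original length, and places the results in a quadword vector.

VMOVN (Vector Move and Narrow) copies the least significant half of each element of a quadword vector into the corresponding element of a doubleword vector.

VQMOVN (Vector Saturating Move and Narrow) copies each element of the operand vector into the corresponding element of the destination vector. Each result element is half the width of the operand element, and values are saturated to the result width.

VQMOVUN (Vector Saturating Move and Narrow, signed operand with unsigned result) copies each element of the operand vector into the corresponding element of the destination vector. Each result element is half the width of the operand element, and values are saturated to the result width.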

SXTL, SXTL2: Signed extend Long: an alias of SSHLL, SSHLL2.

UXTL, UXTL2: Unsigned extend Long: an alias of USHLL, USHLL2.

XTN, XTN2: Extract Narrow.

SQXTN, SQXTN2: Signed saturating extract Narrow.

SQXTUN, SQXTUN2: Signed saturating extract Unsigned Narrow.

UQXTN, UQXTN2: Unsigned saturating extract Narrow.
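The narrowing moves differ only in how they treat values that do not fit the narrower lane. A minimal Python sketch, assuming source lanes are Python ints (signed for the SQ forms, unsigned for XTN and UQXTN) and `bits` is the destination lane width; the helper names are illustrative.

```python
def clamp(x, lo, hi):
    return max(lo, min(hi, x))

def xtn(v, bits):
    """XTN: keep the low `bits` bits of each wider element."""
    return [x & ((1 << bits) - 1) for x in v]

def sqxtn(v, bits):
    """SQXTN: signed input, saturate to the signed range."""
    return [clamp(x, -(1 << (bits - 1)), (1 << (bits - 1)) - 1) for x in v]

def sqxtun(v, bits):
    """SQXTUN: signed input, saturate to the unsigned range."""
    return [clamp(x, 0, (1 << bits) - 1) for x in v]

def uqxtn(v, bits):
    """UQXTN: unsigned input, saturate to the unsigned range."""
    return [min(x, (1 << bits) - 1) for x in v]
```

SQXTUN is the usual choice when signed intermediate results have to be stored as unsigned pixels.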

Moving data between general-purpose and SIMD registers

INS (element): Insert vector element from another vector element.

INS (general): Insert vector element from general-purpose register.

SMOV: Signed Move vector element to general-purpose register.

UMOV: Unsigned Move vector element to general-purpose register.

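VREV: reverse elements within a vector.

VREV16 (Vector Reverse within halfwords) reverses the order of the 8-bit elements within each halfword of the vector and places the result in the corresponding destination vector.

VREV32 (Vector Reverse within words) reverses the order of the 8-bit or 16-bit elements within each word of the vector and places the result in the corresponding destination vector.

VREV64 (Vector Reverse within doublewords) reverses the order of the 8-bit, 16-bit or 32-bit elements within each doubleword of the vector and places the result in the corresponding destination vector.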

REV16 (vector): Reverse elements in 16-bit halfwords (vector).

REV32 (vector): Reverse elements in 32-bit words (vector).

REV64: Reverse elements in 64-bit doublewords (vector).
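All three reverse instructions follow one pattern: split the register into containers, then reverse the elements inside each container. A minimal Python sketch, assuming the register is a byte list with index 0 as the lowest byte; `rev` and its parameters are illustrative.

```python
def rev(data, container_bytes, elem_bytes=1):
    """Reverse elem_bytes-sized elements inside each container_bytes chunk."""
    out = []
    for i in range(0, len(data), container_bytes):
        chunk = data[i:i + container_bytes]
        elems = [chunk[j:j + elem_bytes]
                 for j in range(0, container_bytes, elem_bytes)]
        for e in reversed(elems):
            out.extend(e)
    return out
```

Here `rev(data, 2)` models REV16, `rev(data, 4)` models REV32 on bytes (a per-word byte swap), and `rev(data, 8, 2)` models REV64 on halfwords.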

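VTBL, VTBX: vector table lookup.

VTBL (Vector Table Lookup) uses the byte indexes in a control vector to look up byte values in a table and builds a new vector from them. An out-of-range index returns 0.

VTBX (Vector Table Extension) works the same way, except that an out-of-range index leaves the destination element unchanged.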

TBL: Table vector Lookup.

TBX: Table vector lookup extension.
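The only difference between the two lookups is the out-of-range case. A minimal Python sketch, assuming the table and index vectors are byte lists; `tbl` and `tbx` are illustrative helpers.

```python
def tbl(table, idx):
    """Out-of-range indexes produce 0."""
    return [table[i] if i < len(table) else 0 for i in idx]

def tbx(dest, table, idx):
    """Out-of-range indexes keep the existing destination byte."""
    return [table[i] if i < len(table) else d for d, i in zip(dest, idx)]
```

Chaining a lookup with a TBX over a second table is one way to index tables larger than a single instruction can address.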

VTRN: vector transpose.

VTRN (Vector Transpose) treats the elements of its operand vectors as elements of 2 x 2 matrices and transposes those matrices.

TRN1: Transpose vectors (primary).

TRN2: Transpose vectors (secondary).
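In A64 the transpose is split into two instructions, each producing one of the two result vectors. A minimal Python sketch, assuming lanes are list elements with index 0 as the lowest lane; the helper names are illustrative.

```python
def trn1(vn, vm):
    """Even-numbered lanes of both sources, interleaved."""
    out = []
    for i in range(0, len(vn), 2):
        out += [vn[i], vm[i]]
    return out

def trn2(vn, vm):
    """Odd-numbered lanes of both sources, interleaved."""
    out = []
    for i in range(0, len(vn), 2):
        out += [vn[i + 1], vm[i + 1]]
    return out
```

Viewing each pair of lanes `(vn[i], vn[i+1])` and `(vm[i], vm[i+1])` as the rows of a 2 x 2 matrix, TRN1 yields the first row of each transpose and TRN2 the second.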

VUZP, VZIP: vector interleave and de-interleave.

VZIP (Vector Zip) interleaves the elements of two vectors.

VUZP (Vector Unzip) de-interleaves the elements of two vectors.

UZP1: Unzip vectors (primary).

UZP2: Unzip vectors (secondary).

ZIP1: Zip vectors (primary).

ZIP2: Zip vectors (secondary).
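Each A64 zip or unzip instruction writes a single result register, so a full interleave takes a ZIP1/ZIP2 pair and a full de-interleave a UZP1/UZP2 pair. A minimal Python sketch, assuming lanes are list elements with index 0 as the lowest lane; the helper names are illustrative.

```python
def zip1(vn, vm):
    """Interleave the lower halves of vn and vm."""
    half = len(vn) // 2
    return [x for pair in zip(vn[:half], vm[:half]) for x in pair]

def zip2(vn, vm):
    """Interleave the upper halves of vn and vm."""
    half = len(vn) // 2
    return [x for pair in zip(vn[half:], vm[half:]) for x in pair]

def uzp1(vn, vm):
    """Even-numbered lanes of vn followed by vm."""
    return (vn + vm)[0::2]

def uzp2(vn, vm):
    """Odd-numbered lanes of vn followed by vm."""
    return (vn + vm)[1::2]
```

Applying UZP1 and UZP2 to the ZIP1 and ZIP2 results recovers the original two vectors.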

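Shift instructions

VSHL, VQSHL, VQSHLU and VSHLL (by immediate): shift left by an immediate value.

The vector shift left (by immediate) instructions take each element in a vector of integers, shift it left by an immediate value, and place the results in the destination vector.

For VSHL (Vector Shift Left), bits shifted out of the left of each element are lost.

For VQSHL (Vector Saturating Shift Left) and VQSHLU (Vector Saturating Shift Left Unsigned: signed operand, unsigned result), the sticky QC flag (FPSCR bit [27]; FPSR.QC in AArch64) is set if saturation occurs.

For VSHLL (Vector Shift Left Long), values are sign- or zero-extended.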

SHL: Shift Left (immediate).

SQSHL (immediate): Signed saturating Shift Left (immediate).

UQSHL (immediate): Unsigned saturating Shift Left (immediate).

SQSHLU: Signed saturating Shift Left Unsigned (immediate).

SHLL, SHLL2: Shift Left Long (by element size).

SSHLL, SSHLL2: Signed Shift Left Long (immediate).

USHLL, USHLL2: Unsigned Shift Left Long (immediate).

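V{Q}{R}SHL (by signed variable): shift left by a signed variable.

VSHL (Vector Shift Left by signed variable) takes each element in a vector, shifts it by the value in the least significant byte of the corresponding element of a second vector, and places the results in the destination vector. If the shift value is positive the operation is a left shift; otherwise it is a right shift.

The results can optionally be saturated, rounded, or both. If saturation occurs, the sticky QC flag (FPSCR bit [27]; FPSR.QC in AArch64) is set.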

SSHL: Signed Shift Left (register).

USHL: Unsigned Shift Left (register).

SQSHL (register): Signed saturating Shift Left (register).

UQSHL (register): Unsigned saturating Shift Left (register).

SRSHL: Signed Rounding Shift Left (register).

URSHL: Unsigned Rounding Shift Left (register).

SQRSHL: Signed saturating Rounding Shift Left (register).

UQRSHL: Unsigned saturating Rounding Shift Left (register).
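The register-shift family is one operation with two optional modifiers, which a single function with flags can model. A rough Python sketch of the signed forms (SSHL, SRSHL, SQSHL, SQRSHL), assuming data lanes are signed Python ints and the shift vector holds raw lane values whose low byte is read as a signed count; all names are illustrative.

```python
def to_s8(x):
    """Low byte of a lane, read as a signed shift count."""
    x &= 0xFF
    return x - 256 if x >= 128 else x

def wrap_signed(x, bits):
    """Truncate to `bits` bits and reinterpret as two's complement."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >> (bits - 1) else x

def sshl(v, shifts, bits=16, rounding=False, saturating=False):
    out = []
    for x, s in zip(v, shifts):
        sh = to_s8(s)
        if sh >= 0:
            r = x << sh                       # left shift
        else:
            rnd = (1 << (-sh - 1)) if rounding else 0
            r = (x + rnd) >> -sh              # arithmetic right shift
        if saturating:
            r = max(-(1 << (bits - 1)), min((1 << (bits - 1)) - 1, r))
        else:
            r = wrap_signed(r, bits)
        out.append(r)
    return out
```

A per-lane negative count is how a variable right shift is expressed, since the register forms have no separate shift-right instruction.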

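V{R}SHR{N}, V{R}SRA (by immediate): shift right by an immediate value.

V{R}SHR{N} (Vector Shift Right by immediate) takes each element in a vector, shifts it right by an immediate value, and places the results in the destination vector. The results can optionally be rounded, narrowed, or both.

V{R}SRA (Vector Shift Right by immediate and Accumulate) takes each element in a vector, shifts it right by an immediate value, and accumulates the results into the destination vector. The results can optionally be rounded.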

SSHR: Signed Shift Right (immediate).

USHR: Unsigned Shift Right (immediate).

SHRN, SHRN2: Shift Right Narrow (immediate).

SRSHR: Signed Rounding Shift Right (immediate).

URSHR: Unsigned Rounding Shift Right (immediate).

RSHRN, RSHRN2: Rounding Shift Right Narrow (immediate).

SSRA: Signed Shift Right and Accumulate (immediate).

USRA: Unsigned Shift Right and Accumulate (immediate).

SRSRA: Signed Rounding Shift Right and Accumulate (immediate).

URSRA: Unsigned Rounding Shift Right and Accumulate (immediate).

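VQ{R}SHR{U}N (by immediate): shift right by an immediate value, with saturation.

VQ{R}SHR{U}N (Vector Saturating Shift Right, Narrow, by immediate, with optional rounding) takes each element in a quadword vector of integers, shifts it right by an immediate value, and places the results in a doubleword vector.

If saturation occurs, the sticky QC flag (FPSCR bit [27]; FPSR.QC in AArch64) is set.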

SQSHRN, SQSHRN2: Signed saturating Shift Right Narrow (immediate).

UQSHRN, UQSHRN2: Unsigned saturating Shift Right Narrow (immediate).

SQRSHRN, SQRSHRN2: Signed saturating Rounded Shift Right Narrow (immediate).

UQRSHRN, UQRSHRN2: Unsigned saturating Rounded Shift Right Narrow (immediate).

SQRSHRUN, SQRSHRUN2: Signed saturating Rounded Shift Right Unsigned Narrow (immediate).

SQSHRUN, SQSHRUN2: Signed saturating Shift Right Unsigned Narrow (immediate).
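The narrowing right shifts combine three independent choices: whether to round, whether to saturate, and whether the saturated result is signed or unsigned. A rough Python sketch, assuming source lanes are signed Python ints of width `2*bits`; the function name and its `saturate` parameter are illustrative, not ARM terminology.

```python
def shift_right_narrow(v, shift, bits, rounding=False, saturate=None):
    """Model of the narrowing right shifts.

    saturate=None       -> SHRN / RSHRN (keep the low `bits` bits)
    saturate="signed"   -> SQSHRN / SQRSHRN
    saturate="unsigned" -> SQSHRUN / SQRSHRUN (signed in, unsigned out)
    """
    rnd = (1 << (shift - 1)) if rounding else 0
    out = []
    for x in v:
        r = (x + rnd) >> shift          # Python >> is an arithmetic shift
        if saturate == "signed":
            r = max(-(1 << (bits - 1)), min((1 << (bits - 1)) - 1, r))
        elif saturate == "unsigned":
            r = max(0, min((1 << bits) - 1, r))
        else:
            r &= (1 << bits) - 1
        out.append(r)
    return out
```

The rounding, unsigned-saturating combination (SQRSHRUN) is a common final step when scaling signed fixed-point results back to 8-bit unsigned data.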

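VSLI and VSRI: shift left and insert, shift right and insert.

VSLI (Vector Shift Left and Insert) takes each element in a vector, shifts it left by an immediate value, and inserts the result into the destination vector. Bits shifted out of the left of each element are lost.

VSRI (Vector Shift Right and Insert) takes each element in a vector, shifts it right by an immediate value, and inserts the result into the destination vector. Bits shifted out of the right of each element are lost.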

SLI: Shift Left and Insert (immediate).

SRI: Shift Right and Insert (immediate).
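"Insert" here means that the destination bits not covered by the shifted value are preserved rather than cleared. A minimal Python sketch, assuming lanes are unsigned ints in lists; `sli` and `sri` are illustrative helper names.

```python
def sli(vd, vn, shift, bits=8):
    """Shift vn left; the low `shift` bits of vd are kept."""
    mask = (1 << bits) - 1
    keep = (1 << shift) - 1
    return [(d & keep) | ((n << shift) & mask) for d, n in zip(vd, vn)]

def sri(vd, vn, shift, bits=8):
    """Shift vn right (logical); the high `shift` bits of vd are kept."""
    mask = (1 << bits) - 1
    inserted = mask >> shift
    return [(d & ~inserted & mask) | (n >> shift) for d, n in zip(vd, vn)]
```

This makes the pair handy for packing bit fields, for example merging two 4-bit values into one byte.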

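General arithmetic instructions

VABA{L} and VABD{L}: vector absolute difference and accumulate, and absolute difference.

VABA (Vector Absolute Difference and Accumulate) subtracts the elements of one vector from the corresponding elements of another and accumulates the absolute values of the results into the elements of the destination vector.

VABD (Vector Absolute Difference) subtracts the elements of one vector from the corresponding elements of another and places the absolute values of the results into the elements of the destination vector.

Long forms of both instructions are available.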

SABA: Signed Absolute difference and Accumulate.

SABAL, SABAL2: Signed Absolute difference and Accumulate Long.

UABA: Unsigned Absolute difference and Accumulate.

UABAL, UABAL2: Unsigned Absolute difference and Accumulate Long.

SABD: Signed Absolute Difference.

SABDL, SABDL2: Signed Absolute Difference Long.

UABD: Unsigned Absolute Difference (vector).

UABDL, UABDL2: Unsigned Absolute Difference Long.
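A minimal Python sketch of the unsigned forms, assuming lanes are unsigned ints in lists and that the non-long accumulate wraps at the lane width; the helper names are illustrative.

```python
def uabd(a, b):
    """Per-lane absolute difference."""
    return [abs(x - y) for x, y in zip(a, b)]

def uaba(acc, a, b, bits=8):
    """Accumulate absolute differences, wrapping at the lane width."""
    mask = (1 << bits) - 1
    return [(c + d) & mask for c, d in zip(acc, uabd(a, b))]

def uabal(acc, a, b):
    """Long form: the accumulator lanes are wide enough not to wrap here."""
    return [c + d for c, d in zip(acc, uabd(a, b))]
```

The long form is the one typically used for sum-of-absolute-differences kernels, because 8-bit accumulators overflow after a few iterations.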

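V{Q}ABS and V{Q}NEG: vector absolute value and negate.

VABS (Vector Absolute) takes the absolute value of each element in a vector and places the results in a second vector. (The floating-point version only clears the sign bit.)

VNEG (Vector Negate) negates each element in a vector and places the results in a second vector. (The floating-point version only inverts the sign bit.)

Saturating forms of both instructions are available. If saturation occurs, the sticky QC flag (FPSCR bit [27]; FPSR.QC in AArch64) is set.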

ABS: Absolute value (vector).

SQABS: Signed saturating Absolute value.

NEG (vector): Negate (vector).

SQNEG: Signed saturating Negate.

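V{Q}ADD, VADDL, VADDW, V{Q}SUB, VSUBL and VSUBW: vector add and subtract.

VADD (Vector Add) adds corresponding elements in two vectors and places the results in the destination vector.

VSUB (Vector Subtract) subtracts the elements of one vector from the corresponding elements of another and places the results in the destination vector.

Saturating, long and wide forms are available. If saturation occurs, the sticky QC flag (FPSCR bit [27]; FPSR.QC in AArch64) is set.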

ADD (vector): Add (vector).

SQADD: Signed saturating Add.

UQADD: Unsigned saturating Add.

SADDL, SADDL2: Signed Add Long (vector).

UADDL, UADDL2: Unsigned Add Long (vector).

SADDW, SADDW2: Signed Add Wide.

UADDW, UADDW2: Unsigned Add Wide.

SUB (vector): Subtract (vector).

SQSUB: Signed saturating Subtract.

UQSUB: Unsigned saturating Subtract.

SSUBL, SSUBL2: Signed Subtract Long.

USUBL, USUBL2: Unsigned Subtract Long.

SSUBW, SSUBW2: Signed Subtract Wide.

USUBW, USUBW2: Unsigned Subtract Wide.

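V{R}ADDHN and V{R}SUBHN: vector add selecting the high half, and vector subtract selecting the high half.

V{R}ADDHN (Vector Add and Narrow, selecting High half) adds corresponding elements in two vectors, selects the most significant half of each sum, and places the final results in the destination vector. Results can be either rounded or truncated.

V{R}SUBHN (Vector Subtract and Narrow, selecting High half) subtracts the elements of one vector from the corresponding elements of another, selects the most significant half of each difference, and places the final results in the destination vector. Results can be either rounded or truncated.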

ADDHN, ADDHN2: Add returning High Narrow.

RADDHN, RADDHN2: Rounding Add returning High Narrow.

SUBHN, SUBHN2: Subtract returning High Narrow.

RSUBHN, RSUBHN2: Rounding Subtract returning High Narrow.
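Selecting the high half is a right shift by half the source lane width, and rounding adds half of the discarded range first. A minimal Python sketch, assuming source lanes are `bits` wide and results are returned as unsigned `bits/2`-wide values; the helper names are illustrative.

```python
def addhn(a, b, bits=16, rounding=False):
    """ADDHN / RADDHN: high half of each (a + b)."""
    half = bits // 2
    rnd = (1 << (half - 1)) if rounding else 0
    return [((x + y + rnd) >> half) & ((1 << half) - 1) for x, y in zip(a, b)]

def subhn(a, b, bits=16, rounding=False):
    """SUBHN / RSUBHN: high half of each (a - b)."""
    half = bits // 2
    rnd = (1 << (half - 1)) if rounding else 0
    return [((x - y + rnd) >> half) & ((1 << half) - 1) for x, y in zip(a, b)]
```

No saturation is involved: a carry out of the source lane width is simply lost.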

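V{R}HADD and VHSUB: vector halving add and halving subtract.

VHADD (Vector Halving Add) adds corresponding elements in two vectors, shifts each result right by one bit, and places the results in the destination vector. Results can be either rounded or truncated.

VHSUB (Vector Halving Subtract) subtracts the elements of one vector from the corresponding elements of another, shifts each result right by one bit, and places the results in the destination vector. Results are always truncated.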

SHADD: Signed Halving Add.

UHADD: Unsigned Halving Add.

SRHADD: Signed Rounding Halving Add.

URHADD: Unsigned Rounding Halving Add.

SHSUB: Signed Halving Subtract.

UHSUB: Unsigned Halving Subtract.
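The point of the halving forms is that the intermediate sum or difference keeps its extra bit, so the halved result always fits the lane. A minimal Python sketch of the signed variants, assuming lanes are signed Python ints; the helper names are illustrative.

```python
def shadd(a, b):
    """Halving add, truncating."""
    return [(x + y) >> 1 for x, y in zip(a, b)]

def srhadd(a, b):
    """Halving add, rounding."""
    return [(x + y + 1) >> 1 for x, y in zip(a, b)]

def shsub(a, b):
    """Halving subtract, always truncating."""
    return [(x - y) >> 1 for x, y in zip(a, b)]
```

The rounding halving add is the classic way to average two pixel values without widening.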

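VPADD{L}, VPADAL: vector pairwise add, and vector pairwise add and accumulate.

VPADD (Vector Pairwise Add) adds adjacent pairs of elements of two vectors and places the results in the destination vector.

VPADDL (Vector Pairwise Add Long) adds adjacent pairs of elements of a vector, sign- or zero-extends the results to twice their original width, and places the final results in the destination vector.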

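VPADAL (Vector Pairwise Add and Accumulate Long) adds adjacent pairs of elements of a vector and accumulates the results into the elements of the destination vector.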

ADDP (scalar): Add Pair of elements (scalar).

ADDP (vector): Add Pairwise (vector).

SADDLP: Signed Add Long Pairwise.

UADDLP: Unsigned Add Long Pairwise.

SADALP: Signed Add and Accumulate Long Pairwise.

UADALP: Unsigned Add and Accumulate Long Pairwise.
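A minimal Python sketch of the pairwise forms, assuming lanes are unsigned ints in lists with index 0 as the lowest lane; in `uadalp` the `bits` parameter is the (doubled) accumulator lane width. The helper names are illustrative.

```python
def addp(vn, vm, bits=8):
    """ADDP: pair sums of vn fill the low half, pair sums of vm the high half."""
    mask = (1 << bits) - 1
    src = vn + vm
    return [(src[i] + src[i + 1]) & mask for i in range(0, len(src), 2)]

def uaddlp(v):
    """UADDLP: pair sums in lanes twice as wide, so nothing overflows."""
    return [v[i] + v[i + 1] for i in range(0, len(v), 2)]

def uadalp(acc, v, bits=16):
    """UADALP: add the widened pair sums to the accumulator lanes."""
    mask = (1 << bits) - 1
    return [(a + s) & mask for a, s in zip(acc, uaddlp(v))]
```

Repeated UADDLP steps (8 to 16 to 32 bits) are one way to reduce a byte vector to a total without overflow.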

Mixed unsigned/signed saturating add

SUQADD: Signed saturating Accumulate of Unsigned value.

USQADD: Unsigned saturating Accumulate of Signed value.

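VMAX, VMIN, VPMAX and VPMIN: vector maximum, vector minimum, vector pairwise maximum and vector pairwise minimum.

VMAX (Vector Maximum) compares corresponding elements in two vectors and copies the larger of each pair into the corresponding element of the destination vector.

VMIN (Vector Minimum) compares corresponding elements in two vectors and copies the smaller of each pair into the corresponding element of the destination vector.

VPMAX (Vector Pairwise Maximum) compares adjacent pairs of elements in two vectors and copies the larger of each pair into the corresponding element of the destination vector. Operands and results must be doubleword vectors.

VPMIN (Vector Pairwise Minimum) compares adjacent pairs of elements in two vectors and copies the smaller of each pair into the corresponding element of the destination vector. Operands and results must be doubleword vectors.

For a diagram of the pairwise operation, see Figure 5-5 on page 5-63.

Floating-point maximum and minimum: max(+0.0, –0.0) = +0.0 and min(+0.0, –0.0) = –0.0.
If any input is a NaN, the corresponding result element is the default NaN.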

SMAX: Signed Maximum (vector).

UMAX: Unsigned Maximum (vector).

SMIN: Signed Minimum (vector).

UMIN: Unsigned Minimum (vector).

SMAXP: Signed Maximum Pairwise.

UMAXP: Unsigned Maximum Pairwise.

SMINP: Signed Minimum Pairwise.

UMINP: Unsigned Minimum Pairwise.

Across-vector (V) operations

Compute the sum, maximum or minimum over all elements of a vector.

ADDV: Add across Vector.

SADDLV: Signed Add Long across Vector.

UADDLV: Unsigned sum Long across Vector.

SMAXV: Signed Maximum across Vector.

UMAXV: Unsigned Maximum across Vector.

SMINV: Signed Minimum across Vector.

UMINV: Unsigned Minimum across Vector.

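VCLS, VCLZ and VCNT: vector count leading sign bits, count leading zeros, and count set bits.

VCLS (Vector Count Leading Sign bits) counts, in each element of a vector, the number of consecutive bits following the topmost bit that are the same as the topmost bit, and places the results in a second vector.

VCLZ (Vector Count Leading Zeros) counts, in each element of a vector, the number of consecutive zeros starting from the topmost bit, and places the results in a second vector.

VCNT (Vector Count set bits) counts the number of bits that are 1 in each element of a vector and places the results in a second vector.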

CLS (vector): Count Leading Sign bits (vector).

CLZ (vector): Count Leading Zero bits (vector).

CNT: Population Count per byte.
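The three counts can be checked against a small scalar model. A minimal Python sketch, assuming each lane is passed as an unsigned integer of `bits` bits; the function names are illustrative.

```python
def clz(x, bits=8):
    """Number of zero bits above the highest set bit."""
    return bits - x.bit_length()

def cls(x, bits=8):
    """Number of bits after the top bit that equal the top bit."""
    if x >> (bits - 1):                 # negative: count leading ones instead
        x = ~x & ((1 << bits) - 1)
    return clz(x, bits) - 1

def cnt(x):
    """Population count."""
    return bin(x).count("1")
```

Note that the sign bit itself is not counted by CLS, so the largest possible result is the lane width minus one.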

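VRECPE and VRSQRTE: vector reciprocal estimate and reciprocal square root estimate.

VRECPE (Vector Reciprocal Estimate) finds an approximate reciprocal of each element in a vector and places the results in a second vector.

VRSQRTE (Vector Reciprocal Square Root Estimate) finds an approximate reciprocal square root of each element in a vector and places the results in a second vector.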

URECPE: Unsigned Reciprocal Estimate.

URSQRTE: Unsigned Reciprocal Square Root Estimate.

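Multiply instructions

VMUL{L}, VMLA{L} and VMLS{L}: vector multiply, multiply-accumulate and multiply-subtract.

VMUL (Vector Multiply) multiplies corresponding elements in two vectors and places the results in the destination vector.

VMLA (Vector Multiply Accumulate) multiplies corresponding elements in two vectors and accumulates the results into the elements of the destination vector.

VMLS (Vector Multiply Subtract) multiplies corresponding elements in two vectors, subtracts the products from the corresponding elements of the destination vector, and places the final results in the destination vector.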

MUL (vector): Multiply (vector).

SMULL, SMULL2 (vector): Signed Multiply Long (vector).

UMULL, UMULL2 (vector): Unsigned Multiply long (vector).

MLA (vector): Multiply-Add to accumulator (vector).

SMLAL, SMLAL2 (vector): Signed Multiply-Add Long (vector).

UMLAL, UMLAL2 (vector): Unsigned Multiply-Add Long (vector).

MLS (vector): Multiply-Subtract from accumulator (vector).

SMLSL, SMLSL2 (vector): Signed Multiply-Subtract Long (vector).

UMLSL, UMLSL2 (vector): Unsigned Multiply-Subtract Long (vector).

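VMUL{L}, VMLA{L} and VMLS{L} (by scalar): vector multiply, multiply-accumulate and multiply-subtract (by scalar).

VMUL (Vector Multiply by scalar) multiplies each element in a vector by a scalar and places the results in the destination vector.

VMLA (Vector Multiply Accumulate) multiplies each element in a vector by a scalar and accumulates the results into the corresponding elements of the destination vector.

VMLS (Vector Multiply Subtract) multiplies each element in a vector by a scalar, subtracts the products from the corresponding elements of the destination vector, and places the final results in the destination vector.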

MUL (by element): Multiply (vector, by element).

SMULL, SMULL2 (by element): Signed Multiply Long (vector, by element).

UMULL, UMULL2 (by element): Unsigned Multiply Long (vector, by element).

MLA (by element): Multiply-Add to accumulator (vector, by element).

SMLAL, SMLAL2 (by element): Signed Multiply-Add Long (vector, by element).

UMLAL, UMLAL2 (by element): Unsigned Multiply-Add Long (vector, by element).

MLS (by element): Multiply-Subtract from accumulator (vector, by element).

SMLSL, SMLSL2 (by element): Signed Multiply-Subtract Long (vector, by element).

UMLSL, UMLSL2 (by element): Unsigned Multiply-Subtract Long (vector, by element).

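VQDMULL, VQDMLAL and VQDMLSL (by vector or by scalar): vector saturating doubling multiply, multiply-accumulate and multiply-subtract (by vector or by scalar)

The vector saturating doubling multiply instructions multiply their operands and double the results. VQDMULL places the results in the destination register. VQDMLAL adds the results to the values in the destination register. VQDMLSL subtracts the results from the values in the destination register.

If any result overflows, it is saturated. If saturation occurs, the sticky QC flag (FPSCR bit [27]; FPSR.QC in AArch64) is set.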

SQDMULL, SQDMULL2 (by element): Signed saturating Doubling Multiply Long (by element).

SQDMULL, SQDMULL2 (vector): Signed saturating Doubling Multiply Long.

SQDMLAL, SQDMLAL2 (by element): Signed saturating Doubling Multiply-Add Long (by element).

SQDMLAL, SQDMLAL2 (vector): Signed saturating Doubling Multiply-Add Long.

SQDMLSL, SQDMLSL2 (by element): Signed saturating Doubling Multiply-Subtract Long (by element).

SQDMLSL, SQDMLSL2 (vector): Signed saturating Doubling Multiply-Subtract Long.
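A rough Python sketch of the doubling multiply long and its accumulate form, assuming the inputs are signed `bits`-wide Python ints and the results are `2*bits` wide; the helper names are illustrative.

```python
def sat_signed(x, bits):
    return max(-(1 << (bits - 1)), min((1 << (bits - 1)) - 1, x))

def sqdmull(a, b, bits=16):
    """Doubled product, saturated to 2*bits."""
    return sat_signed(2 * a * b, 2 * bits)

def sqdmlal(acc, a, b, bits=16):
    """Saturating add of the saturated doubled product to acc."""
    return sat_signed(acc + sqdmull(a, b, bits), 2 * bits)
```

The product itself can only saturate when both inputs are the most negative value; the accumulate step can saturate for ordinary inputs.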

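VQ{R}DMULH (by vector or by scalar): vector saturating doubling multiply returning the high half (by vector or by scalar).

The vector saturating doubling multiply instructions multiply their operands and double the results. These instructions return only the high half of each result.

If any result overflows, it is saturated. If saturation occurs, the sticky QC flag (FPSCR bit [27]; FPSR.QC in AArch64) is set.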

SQDMULH (by element): Signed saturating Doubling Multiply returning High half (by element).

SQDMULH (vector): Signed saturating Doubling Multiply returning High half.

SQRDMULH (by element): Signed saturating Rounding Doubling Multiply returning High half (by element).

SQRDMULH (vector): Signed saturating Rounding Doubling Multiply returning High half.

SQRDMLAH (by element): Signed Saturating Rounding Doubling Multiply Accumulate returning High Half (by element).

SQRDMLAH (vector): Signed Saturating Rounding Doubling Multiply Accumulate returning High Half (vector).

SQRDMLSH (by element): Signed Saturating Rounding Doubling Multiply Subtract returning High Half (by element).

SQRDMLSH (vector): Signed Saturating Rounding Doubling Multiply Subtract returning High Half (vector).
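Returning the high half of a doubled product is a fixed-point fractional multiply (Q15 for 16-bit lanes). A rough Python sketch of SQDMULH and SQRDMULH, assuming the inputs are signed `bits`-wide Python ints; the helper names are illustrative.

```python
def sat_signed(x, bits):
    return max(-(1 << (bits - 1)), min((1 << (bits - 1)) - 1, x))

def sqdmulh(a, b, bits=16, rounding=False):
    """High half of 2*a*b, optionally rounded, saturated to `bits`."""
    rnd = (1 << (bits - 1)) if rounding else 0
    return sat_signed((2 * a * b + rnd) >> bits, bits)

def sqrdmulh(a, b, bits=16):
    return sqdmulh(a, b, bits, rounding=True)
```

In Q15 terms, 16384 represents 0.5, so `sqdmulh(16384, 16384)` gives 8192, which represents 0.25.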

Polynomial multiply

PMUL: Polynomial Multiply.

PMULL, PMULL2: Polynomial Multiply Long.

Dot product

SDOT (by element): Dot Product signed arithmetic (vector, by element).

SDOT (vector): Dot Product signed arithmetic (vector).

UDOT (by element): Dot Product unsigned arithmetic (vector, by element).

UDOT (vector): Dot Product unsigned arithmetic (vector).

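Load/store

VLDn and VSTn (single n-element structure to one lane): these instructions can be used for almost all data accesses. A normal vector can be loaded (n = 1).

Vector Load single n-element structure to one lane loads one n-element structure from memory into one or more NEON registers. Elements of the registers that are not loaded are left unchanged.

Vector Store single n-element structure from one lane stores one n-element structure from one or more NEON registers to memory.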

LD1 (single structure): Load one single-element structure to one lane of one register.

LD2 (single structure): Load single 2-element structure to one lane of two registers.

LD3 (single structure): Load single 3-element structure to one lane of three registers.

LD4 (single structure): Load single 4-element structure to one lane of four registers.

ST1 (single structure): Store a single-element structure from one lane of one register.

ST2 (single structure): Store single 2-element structure from one lane of two registers.

ST3 (single structure): Store single 3-element structure from one lane of three registers.

ST4 (single structure): Store single 4-element structure from one lane of four registers.

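VLDn (single n-element structure to all lanes)

Vector Load single n-element structure to all lanes loads multiple copies of one n-element structure from memory into one or more NEON registers.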

LD1R: Load one single-element structure and Replicate to all lanes (of one register).

LD2R: Load single 2-element structure and Replicate to all lanes of two registers.

LD3R: Load single 3-element structure and Replicate to all lanes of three registers.

LD4R: Load single 4-element structure and Replicate to all lanes of four registers.

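VLDn and VSTn (multiple n-element structures)

Vector Load multiple n-element structures loads multiple n-element structures from memory into one or more NEON registers, with de-interleaving (unless n == 1). Every element of each register is loaded.

Vector Store multiple n-element structures stores multiple n-element structures from one or more NEON registers to memory, with interleaving (unless n == 1). Every element of each register is stored.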

LD1 (multiple structures): Load multiple single-element structures to one, two, three, or four registers.

LD2 (multiple structures): Load multiple 2-element structures to two registers.

LD3 (multiple structures): Load multiple 3-element structures to three registers.

LD4 (multiple structures): Load multiple 4-element structures to four registers.

ST1 (multiple structures): Store multiple single-element structures from one, two, three, or four registers.

ST2 (multiple structures): Store multiple 2-element structures from two registers.

ST3 (multiple structures): Store multiple 3-element structures from three registers.

ST4 (multiple structures): Store multiple 4-element structures from four registers.
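De-interleaving on load is what lets array-of-structures data such as RGB pixels arrive as one register per field. A minimal Python sketch, assuming memory is a flat list of elements and each register is a list; `ld_n` and `st_n` are illustrative helpers, not real instructions.

```python
def ld_n(mem, n):
    """Model of LDn (multiple structures): register i receives every n-th
    element starting at offset i."""
    return [mem[i::n] for i in range(n)]

def st_n(regs):
    """Model of STn (multiple structures): interleave the registers again."""
    return [x for group in zip(*regs) for x in group]
```

For example, `ld_n(pixels, 3)` splits packed RGB data into separate R, G and B lists, and `st_n` restores the packed layout. With `n == 1` no reordering takes place.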

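NEON and VFP pseudo-instructions

VLDR pseudo-instruction (NEON and VFP)

The VLDR pseudo-instruction loads a constant value into every element of a 64-bit NEON vector, or into a VFP single-precision or double-precision register.

If an instruction (for example VMOV) is available that can generate the constant directly into the register, the assembler uses it. Otherwise, it generates a doubleword literal pool entry containing the constant and loads the constant with a VLDR instruction.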

LDR (literal, SIMD&FP): Load SIMD&FP Register (PC-relative literal).

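VLDR and VSTR (post-increment and pre-decrement) (NEON and VFP)

Pseudo-instructions that load or store extension registers with post-increment and pre-decrement.

For information about VLDR and VSTR instructions without post-increment and pre-decrement, see VLDR and VSTR on page 5-23.

The post-increment instruction increments the base address in the register by the offset value after the transfer. The pre-decrement instruction decrements the base address in the register by the offset value and then performs the transfer using the new address in the register. These pseudo-instructions assemble to VLDM or VSTM instructions (see VLDM, VSTM, VPOP and VPUSH on page 5-24).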

LDR (immediate, SIMD&FP): Load SIMD&FP Register (immediate offset).

LDR (register, SIMD&FP): Load SIMD&FP Register (register offset).

STR (immediate, SIMD&FP): Store SIMD&FP register (immediate offset).

STR (register, SIMD&FP): Store SIMD&FP register (register offset).

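VMOV2 (NEON only)

The VMOV2 pseudo-instruction generates a constant and places it in every element of a NEON vector, without loading a value from a literal pool. It always assembles to exactly two instructions.

VMOV2 can generate any 16-bit constant, and a restricted range of 32-bit and 64-bit constants.

VMOV2 usually assembles to a VMOV or VMVN instruction followed by a VBIC or VORR instruction. For details, see VMOV, VMVN (immediate) on page 5-44 and VBIC and VORR (immediate) on page 5-32.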

MOVI: Move Immediate (vector).

Floating-point operations

FABD: Floating-point Absolute Difference (vector).

FABS (scalar): Floating-point Absolute value (scalar).

FABS (vector): Floating-point Absolute value (vector).

FACGE: Floating-point Absolute Compare Greater than or Equal (vector).

FACGT: Floating-point Absolute Compare Greater than (vector).

FADD (scalar): Floating-point Add (scalar).

FADD (vector): Floating-point Add (vector).

FADDP (scalar): Floating-point Add Pair of elements (scalar).

FADDP (vector): Floating-point Add Pairwise (vector).

FCADD: Floating-point Complex Add.

FCCMP: Floating-point Conditional quiet Compare (scalar).

FCCMPE: Floating-point Conditional signaling Compare (scalar).

FCMEQ (register): Floating-point Compare Equal (vector).

FCMEQ (zero): Floating-point Compare Equal to zero (vector).

FCMGE (register): Floating-point Compare Greater than or Equal (vector).

FCMGE (zero): Floating-point Compare Greater than or Equal to zero (vector).

FCMGT (register): Floating-point Compare Greater than (vector).

FCMGT (zero): Floating-point Compare Greater than zero (vector).

FCMLA: Floating-point Complex Multiply Accumulate.

FCMLA (by element): Floating-point Complex Multiply Accumulate (by element).

FCMLE (zero): Floating-point Compare Less than or Equal to zero (vector).

FCMLT (zero): Floating-point Compare Less than zero (vector).

FCMP: Floating-point quiet Compare (scalar).

FCMPE: Floating-point signaling Compare (scalar).

FCSEL: Floating-point Conditional Select (scalar).

FCVT: Floating-point Convert precision (scalar).

FCVTAS (scalar): Floating-point Convert to Signed integer, rounding to nearest with ties to Away (scalar).

FCVTAS (vector): Floating-point Convert to Signed integer, rounding to nearest with ties to Away (vector).

FCVTAU (scalar): Floating-point Convert to Unsigned integer, rounding to nearest with ties to Away (scalar).

FCVTAU (vector): Floating-point Convert to Unsigned integer, rounding to nearest with ties to Away (vector).

FCVTL, FCVTL2: Floating-point Convert to higher precision Long (vector).

FCVTMS (scalar): Floating-point Convert to Signed integer, rounding toward Minus infinity (scalar).

FCVTMS (vector): Floating-point Convert to Signed integer, rounding toward Minus infinity (vector).

FCVTMU (scalar): Floating-point Convert to Unsigned integer, rounding toward Minus infinity (scalar).

FCVTMU (vector): Floating-point Convert to Unsigned integer, rounding toward Minus infinity (vector).

FCVTN, FCVTN2: Floating-point Convert to lower precision Narrow (vector).

FCVTNS (scalar): Floating-point Convert to Signed integer, rounding to nearest with ties to even (scalar).

FCVTNS (vector): Floating-point Convert to Signed integer, rounding to nearest with ties to even (vector).

FCVTNU (scalar): Floating-point Convert to Unsigned integer, rounding to nearest with ties to even (scalar).

FCVTNU (vector): Floating-point Convert to Unsigned integer, rounding to nearest with ties to even (vector).

FCVTPS (scalar): Floating-point Convert to Signed integer, rounding toward Plus infinity (scalar).

FCVTPS (vector): Floating-point Convert to Signed integer, rounding toward Plus infinity (vector).

FCVTPU (scalar): Floating-point Convert to Unsigned integer, rounding toward Plus infinity (scalar).

FCVTPU (vector): Floating-point Convert to Unsigned integer, rounding toward Plus infinity (vector).

FCVTXN, FCVTXN2: Floating-point Convert to lower precision Narrow, rounding to odd (vector).

FCVTZS (scalar, fixed-point): Floating-point Convert to Signed fixed-point, rounding toward Zero (scalar).

FCVTZS (scalar, integer): Floating-point Convert to Signed integer, rounding toward Zero (scalar).

FCVTZS (vector, fixed-point): Floating-point Convert to Signed fixed-point, rounding toward Zero (vector).

FCVTZS (vector, integer): Floating-point Convert to Signed integer, rounding toward Zero (vector).

FCVTZU (scalar, fixed-point): Floating-point Convert to Unsigned fixed-point, rounding toward Zero (scalar).

FCVTZU (scalar, integer): Floating-point Convert to Unsigned integer, rounding toward Zero (scalar).

FCVTZU (vector, fixed-point): Floating-point Convert to Unsigned fixed-point, rounding toward Zero (vector).

FCVTZU (vector, integer): Floating-point Convert to Unsigned integer, rounding toward Zero (vector).

FDIV (scalar): Floating-point Divide (scalar).

FDIV (vector): Floating-point Divide (vector).

FJCVTZS: Floating-point Javascript Convert to Signed fixed-point, rounding toward Zero.

FMADD: Floating-point fused Multiply-Add (scalar).

FMAX (scalar): Floating-point Maximum (scalar).

FMAX (vector): Floating-point Maximum (vector).

FMAXNM (scalar): Floating-point Maximum Number (scalar).

FMAXNM (vector): Floating-point Maximum Number (vector).

FMAXNMP (scalar): Floating-point Maximum Number of Pair of elements (scalar).

FMAXNMP (vector): Floating-point Maximum Number Pairwise (vector).

FMAXNMV: Floating-point Maximum Number across Vector.

FMAXP (scalar): Floating-point Maximum of Pair of elements (scalar).

FMAXP (vector): Floating-point Maximum Pairwise (vector).

FMAXV: Floating-point Maximum across Vector.

FMIN (scalar): Floating-point Minimum (scalar).

FMIN (vector): Floating-point minimum (vector).

FMINNM (scalar): Floating-point Minimum Number (scalar).

FMINNM (vector): Floating-point Minimum Number (vector).

FMINNMP (scalar): Floating-point Minimum Number of Pair of elements (scalar).

FMINNMP (vector): Floating-point Minimum Number Pairwise (vector).

FMINNMV: Floating-point Minimum Number across Vector.

FMINP (scalar): Floating-point Minimum of Pair of elements (scalar).

FMINP (vector): Floating-point Minimum Pairwise (vector).

FMINV: Floating-point Minimum across Vector.

FMLA (by element): Floating-point fused Multiply-Add to accumulator (by element).

FMLA (vector): Floating-point fused Multiply-Add to accumulator (vector).

FMLAL, FMLAL2 (by element): Floating-point fused Multiply-Add Long to accumulator (by element).

FMLAL, FMLAL2 (vector): Floating-point fused Multiply-Add Long to accumulator (vector).

FMLS (by element): Floating-point fused Multiply-Subtract from accumulator (by element).

FMLS (vector): Floating-point fused Multiply-Subtract from accumulator (vector).

FMLSL, FMLSL2 (by element): Floating-point fused Multiply-Subtract Long from accumulator (by element).

FMLSL, FMLSL2 (vector): Floating-point fused Multiply-Subtract Long from accumulator (vector).

FMOV (general): Floating-point Move to or from general-purpose register without conversion.

FMOV (register): Floating-point Move register without conversion.

FMOV (scalar, immediate): Floating-point move immediate (scalar).

FMOV (vector, immediate): Floating-point move immediate (vector).

FMSUB: Floating-point Fused Multiply-Subtract (scalar).

FMUL (by element): Floating-point Multiply (by element).

FMUL (scalar): Floating-point Multiply (scalar).

FMUL (vector): Floating-point Multiply (vector).

FMULX: Floating-point Multiply extended.

FMULX (by element): Floating-point Multiply extended (by element).

FNEG (scalar): Floating-point Negate (scalar).

FNEG (vector): Floating-point Negate (vector).

FNMADD: Floating-point Negated fused Multiply-Add (scalar).

FNMSUB: Floating-point Negated fused Multiply-Subtract (scalar).

FNMUL (scalar): Floating-point Multiply-Negate (scalar).

FRECPE: Floating-point Reciprocal Estimate.

FRECPS: Floating-point Reciprocal Step.

FRECPX: Floating-point Reciprocal exponent (scalar).

FRINTA (scalar): Floating-point Round to Integral, to nearest with ties to Away (scalar).

FRINTA (vector): Floating-point Round to Integral, to nearest with ties to Away (vector).

FRINTI (scalar): Floating-point Round to Integral, using current rounding mode (scalar).

FRINTI (vector): Floating-point Round to Integral, using current rounding mode (vector).

FRINTM (scalar): Floating-point Round to Integral, toward Minus infinity (scalar).

FRINTM (vector): Floating-point Round to Integral, toward Minus infinity (vector).

FRINTN (scalar): Floating-point Round to Integral, to nearest with ties to even (scalar).

FRINTN (vector): Floating-point Round to Integral, to nearest with ties to even (vector).

FRINTP (scalar): Floating-point Round to Integral, toward Plus infinity (scalar).

FRINTP (vector): Floating-point Round to Integral, toward Plus infinity (vector).

FRINTX (scalar): Floating-point Round to Integral exact, using current rounding mode (scalar).

FRINTX (vector): Floating-point Round to Integral exact, using current rounding mode (vector).

FRINTZ (scalar): Floating-point Round to Integral, toward Zero (scalar).

FRINTZ (vector): Floating-point Round to Integral, toward Zero (vector).

FRSQRTE: Floating-point Reciprocal Square Root Estimate.

FRSQRTS: Floating-point Reciprocal Square Root Step.

FSQRT (scalar): Floating-point Square Root (scalar).

FSQRT (vector): Floating-point Square Root (vector).

FSUB (scalar): Floating-point Subtract (scalar).

FSUB (vector): Floating-point Subtract (vector).

Cryptographic instructions

AESD: AES single round decryption.

AESE: AES single round encryption.

AESIMC: AES inverse mix columns.

AESMC: AES mix columns.

SHA1C: SHA1 hash update (choose).

SHA1H: SHA1 fixed rotate.

SHA1M: SHA1 hash update (majority).

SHA1P: SHA1 hash update (parity).

SHA1SU0: SHA1 schedule update 0.

SHA1SU1: SHA1 schedule update 1.

SHA256H: SHA256 hash update (part 1).

SHA256H2: SHA256 hash update (part 2).

SHA256SU0: SHA256 schedule update 0.

SHA256SU1: SHA256 schedule update 1.

SHA512H: SHA512 Hash update part 1.

SHA512H2: SHA512 Hash update part 2.

SHA512SU0: SHA512 Schedule Update 0.

SHA512SU1: SHA512 Schedule Update 1.

SM3PARTW1: SM3PARTW1.

SM3PARTW2: SM3PARTW2.

SM3SS1: SM3SS1.

SM3TT1A: SM3TT1A.

SM3TT1B: SM3TT1B.

SM3TT2A: SM3TT2A.

SM3TT2B: SM3TT2B.

SM4E: SM4 Encode.

SM4EKEY: SM4 Key.

Other instructions

LDNP (SIMD&FP): Load Pair of SIMD&FP registers, with Non-temporal hint.

LDP (SIMD&FP): Load Pair of SIMD&FP registers.

LDUR (SIMD&FP): Load SIMD&FP Register (unscaled offset).

STNP (SIMD&FP): Store Pair of SIMD&FP registers, with Non-temporal hint.

STP (SIMD&FP): Store Pair of SIMD&FP registers.

STUR (SIMD&FP): Store SIMD&FP register (unscaled offset).

Armv8 instruction set