Armv8 Instruction Set

Basics

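General-purpose registers: r0-r30. The 32-bit views are named w0-w30 and the 64-bit views x0-x30. Register number 31 is not a general-purpose register: depending on the instruction, it encodes either the stack pointer or the zero register. The registers have the following roles:

  • r31: SP|WSP (stack pointer) or XZR|WZR (zero register), depending on the instruction
  • r30: LR (link register)
  • r29: FP (frame pointer)
  • r19~28: callee-saved (all 64 bits must be preserved, even under the ILP32 data model); the callee saves them before use and restores them afterward
  • r18: platform register (inter-procedural state or PIC)
  • r16 = IP0, r17 = IP1 (intra-procedure-call scratch registers)
  • r9~15: temporary registers
  • r8: indirect result location (pointer to the memory where the return value is written)
  • r0~r7: parameter/result registers. The first 8 arguments are passed in these registers; any further arguments are passed on the stack, pushed from right to left.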

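SIMD registers: v0-v31, where:

  • v8-v15 are callee-saved: save them before use and restore them afterward (the procedure call standard only requires the low 64 bits of each to be preserved).
  • The remaining registers can be used freely.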

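Instruction prefixes and suffixes

  • S: Signed
  • U: Unsigned
  • F: Floating-point
  • P: Polynomial, or a pairwise operation on adjacent elements within a register
  • V: Across, i.e. an operation over all lanes of the register
  • 2: usually an operation on the upper 64 bits (bits 64-127) of the register
  • H: Halving (e.g. SHADD) or High half (e.g. ADDHN), i.e. only the upper half of each result is kept
  • N: Narrow
  • L: Long (widening)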

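A64 – Base Instructions (general-purpose registers)

A64 – SIMD and Floating-point Instructions (SIMD register instructions)

In the sections below, the V-prefixed names (VAND, VADD, ...) are the AArch32 NEON mnemonics; each description is followed by the corresponding A64 instructions.

Logical and comparison operations

VAND, VBIC, VEOR, VORN and VORR (register): bitwise AND, bit clear, exclusive OR, OR NOT and OR

The VAND (bitwise AND), VBIC (bit clear), VEOR (bitwise exclusive OR), VORN (bitwise OR NOT) and VORR (bitwise OR) instructions perform a bitwise logical operation between two registers and place the result in the destination register.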

AND (vector): Bitwise AND (vector).

BIC (vector, register): Bitwise bit Clear (vector, register).

EOR (vector): Bitwise Exclusive OR (vector).

EOR3: Three-way Exclusive OR.

ORN (vector): Bitwise inclusive OR NOT (vector).

ORR (vector, register): Bitwise inclusive OR (vector, register).

VBIC and VORR (immediate): bitwise bit clear and OR with an immediate

VBIC (bit clear, immediate) takes each element of the destination vector, performs a bitwise AND with the complement of an immediate value, and writes the result back to the destination vector.

VORR (bitwise OR, immediate) takes each element of the destination vector, performs a bitwise OR with an immediate value, and writes the result back to the destination vector.

BIC (vector, immediate): Bitwise bit Clear (vector, immediate).

ORR (vector, immediate): Bitwise inclusive OR (vector, immediate).

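VBIF, VBIT and VBSL: bitwise insert if false, bitwise insert if true, and bitwise select

VBIT (bitwise insert if true) inserts each bit of the first operand into the destination if the corresponding bit of the second operand is 1; otherwise the destination bit is left unchanged.

VBIF (bitwise insert if false) inserts each bit of the first operand into the destination if the corresponding bit of the second operand is 0; otherwise the destination bit is left unchanged.

VBSL (bitwise select) takes each destination bit from the first operand if the original destination bit is 1, and from the second operand if the original destination bit is 0.

BIF: Bitwise Insert if False.

BIT: Bitwise Insert if True.

BSL: Bitwise Select.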
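The three insert/select rules can be written as plain bitwise expressions. The following is a minimal Python sketch, assuming a 64-bit register is modelled as an unsigned integer; the function names `bsl`, `bit` and `bif` are illustrative helpers, not an Arm API.

```python
MASK64 = (1 << 64) - 1  # model a 64-bit D register as an unsigned Python int

def bsl(vd, vn, vm):
    """BSL: where vd has a 1 take the bit from vn, otherwise from vm."""
    return ((vd & vn) | (~vd & vm)) & MASK64

def bit(vd, vn, vm):
    """BIT: insert vn's bit into vd where vm has a 1, else keep vd."""
    return ((vd & ~vm) | (vn & vm)) & MASK64

def bif(vd, vn, vm):
    """BIF: insert vn's bit into vd where vm has a 0, else keep vd."""
    return ((vd & vm) | (vn & ~vm)) & MASK64

# The same merge expressed three ways: high byte from 0x1234, low byte from 0xABCD.
print(hex(bsl(0xFF00, 0x1234, 0xABCD)))  # 0x12cd
print(hex(bit(0xABCD, 0x1234, 0xFF00)))  # 0x12cd
print(hex(bif(0xABCD, 0x1234, 0x00FF)))  # 0x12cd
```

The three forms differ only in which register holds the mask and which one is overwritten, which is why a compiler or programmer picks whichever avoids an extra register move.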

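VCEQ, VCGE, VCGT, VCLE and VCLT (compare)

A vector compare takes the value of each element in a vector and compares it with the corresponding element of a second vector, or with zero. If the condition is true, every bit of the corresponding destination element is set to 1; otherwise every bit is set to 0.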

CMEQ (register): Compare bitwise Equal (vector).

CMEQ (zero): Compare bitwise Equal to zero (vector).

CMGE (register): Compare signed Greater than or Equal (vector).

CMGE (zero): Compare signed Greater than or Equal to zero (vector).

CMGT (register): Compare signed Greater than (vector).

CMGT (zero): Compare signed Greater than zero (vector).

CMHI (register): Compare unsigned Higher (vector).

CMHS (register): Compare unsigned Higher or Same (vector).

CMLE (zero): Compare signed Less than or Equal to zero (vector).

CMLT (zero): Compare signed Less than zero (vector).

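VTST (test bits)

VTST (vector test bits) takes each element of a vector and performs a bitwise AND with the corresponding element of a second vector. If the result is nonzero, every bit of the corresponding destination element is set to 1; otherwise every bit is set to 0.

CMTST: Compare bitwise Test bits nonzero (vector).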
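The compare and test-bits instructions all produce per-lane masks rather than flags. A minimal Python sketch of that behaviour, assuming each vector is a list of unsigned lane values (8-bit by default); the helper names are illustrative:

```python
def to_signed(x, bits):
    """Reinterpret an unsigned lane value as two's-complement signed."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >> (bits - 1) else x

def compare(a, b, pred, bits=8):
    """Return an all-ones lane where pred holds and an all-zeros lane elsewhere."""
    ones = (1 << bits) - 1
    return [ones if pred(x, y) else 0 for x, y in zip(a, b)]

def cmeq(a, b, bits=8):
    return compare(a, b, lambda x, y: x == y, bits)

def cmgt(a, b, bits=8):   # signed greater than
    return compare(a, b, lambda x, y: to_signed(x, bits) > to_signed(y, bits), bits)

def cmhi(a, b, bits=8):   # unsigned higher
    return compare(a, b, lambda x, y: x > y, bits)

def cmtst(a, b, bits=8):  # at least one common bit set
    return compare(a, b, lambda x, y: (x & y) != 0, bits)

a = [0x01, 0x80, 0xFF, 0x10]
b = [0x01, 0x7F, 0x00, 0x20]
print(cmeq(a, b))   # [255, 0, 0, 0]
print(cmhi(a, b))   # [0, 255, 255, 0]
print(cmgt(b, a))   # [0, 255, 255, 255]
print(cmtst(a, b))  # [255, 0, 0, 0]
```

Note how 0x80 is "higher" than 0x7F as an unsigned value but smaller as a signed one. Masks of this shape are typically fed to a bitwise select to choose between two vectors without branching.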

Other bit operations

XAR: Exclusive OR and Rotate.

BCAX: Bit Clear and XOR.

RAX1: Rotate and Exclusive OR.

RBIT (vector): Reverse Bit order (vector).

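General data-processing instructions

VCVT: vector conversion between fixed-point or integer and floating-point.

VCVT (vector convert) converts each element of a vector in one of the following ways and places the result in the destination vector:

  • floating-point to integer
  • integer to floating-point
  • floating-point to fixed-point
  • fixed-point to floating-point

Rounding

  • Integer or fixed-point to floating-point conversions use round to nearest.
  • Floating-point to integer or fixed-point conversions use round toward zero.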
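To make the fixed-point direction concrete, here is a minimal Python sketch, assuming a signed result of `bits` bits with `fbits` fraction bits. The function names are illustrative, Python floats stand in for the hardware format, and NaN or infinite inputs are not handled.

```python
def float_to_fixed(x, fbits, bits=32):
    """Float -> signed fixed-point: scale by 2**fbits, round toward zero, saturate."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    scaled = int(x * (1 << fbits))   # int() truncates toward zero
    return max(lo, min(hi, scaled))

def fixed_to_float(n, fbits):
    """Signed fixed-point -> float: divide by 2**fbits."""
    return n / (1 << fbits)

print(float_to_fixed(1.75, 8))    # 448, i.e. 1.75 * 256
print(float_to_fixed(-1.999, 0))  # -1, truncated toward zero rather than floored
print(fixed_to_float(448, 8))     # 1.75
```

Truncation toward zero is what distinguishes this conversion from a floor: -1.999 becomes -1, not -2.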

SCVTF (scalar, fixed-point): Signed fixed-point Convert to Floating-point (scalar).

SCVTF (scalar, integer): Signed integer Convert to Floating-point (scalar).

SCVTF (vector, fixed-point): Signed fixed-point Convert to Floating-point (vector).

SCVTF (vector, integer): Signed integer Convert to Floating-point (vector).

UCVTF (scalar, fixed-point): Unsigned fixed-point Convert to Floating-point (scalar).

UCVTF (scalar, integer): Unsigned integer Convert to Floating-point (scalar).

UCVTF (vector, fixed-point): Unsigned fixed-point Convert to Floating-point (vector).

UCVTF (vector, integer): Unsigned integer Convert to Floating-point (vector).

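VDUP: duplicate a scalar to all lanes of a vector.

VDUP (vector duplicate) copies a scalar into every element of the destination vector. The source can be a NEON scalar or an ARM register.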

DUP (element): Duplicate vector element to vector or scalar.

DUP (general): Duplicate general-purpose register to vector.

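VEXT: extract.

VEXT (vector extract) extracts 8-bit elements from the bottom of the second operand vector and the top of the first operand vector, concatenates them, and places the result in the destination vector.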

EXT: Extract vector from pair of vectors.
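The extract operation is a sliding window over the two source vectors laid end to end. A minimal Python sketch, assuming each vector is a list of bytes with lane 0 first; `ext` is an illustrative helper name:

```python
def ext(vn, vm, index):
    """Model EXT Vd, Vn, Vm, #index: take len(vn) bytes starting at byte `index`
    of the concatenation vn followed by vm."""
    assert len(vn) == len(vm) and 0 <= index < len(vn)
    return (vn + vm)[index:index + len(vn)]

low = list(range(0, 8))    # first operand:  bytes 0..7
high = list(range(8, 16))  # second operand: bytes 8..15
print(ext(low, high, 3))   # [3, 4, 5, 6, 7, 8, 9, 10]
```

With both operands set to the same register, the same window rotates the bytes of a single vector.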

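VMOV, VMVN (immediate): move and move NOT (immediate).

VMOV (vector move) and VMVN (vector move NOT) with an immediate generate a constant and place it in the destination register.

Vector move (register) copies the value of the source register into the destination register.

Vector move NOT (register) inverts every bit of the source register and places the result in the destination register.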

MOV (element): Move vector element to another vector element: an alias of INS (element).

MOV (from general): Move general-purpose register to a vector element: an alias of INS (general).

MOV (scalar): Move vector element to scalar: an alias of DUP (element).

MOV (to general): Move vector element to general-purpose register: an alias of UMOV.

MOV (vector): Move vector: an alias of ORR (vector, register).

MVN: Bitwise NOT (vector): an alias of NOT.

NOT: Bitwise NOT (vector).

MVNI: Move inverted Immediate (vector).

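VMOVL, V{Q}MOVN, VQMOVUN: move (register).

VMOVL (vector move long) takes each element of a doubleword vector, sign- or zero-extends it to twice its original length, and places the results in a quadword vector.

VMOVN (vector move and narrow) copies the least significant half of each element of a quadword vector into the corresponding element of a doubleword vector.

VQMOVN (vector saturating move and narrow) copies each element of the operand vector into the corresponding element of the destination vector. Each result element is half the width of the operand element, and the value is saturated to the result width.

VQMOVUN (vector saturating move and narrow, signed operand with unsigned result) copies each element of the operand vector into the corresponding element of the destination vector. Each result element is half the width of the operand element, and the value is saturated to the result width.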

SXTL, SXTL2: Signed extend Long: an alias of SSHLL, SSHLL2.

UXTL, UXTL2: Unsigned extend Long: an alias of USHLL, USHLL2.

XTN, XTN2: Extract Narrow.

SQXTN, SQXTN2: Signed saturating extract Narrow.

SQXTUN, SQXTUN2: Signed saturating extract Unsigned Narrow.

UQXTN, UQXTN2: Unsigned saturating extract Narrow.
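The difference between the narrowing variants shows up only when a value does not fit in the half-width result. A minimal Python sketch, assuming source lanes are given as unsigned bit patterns of width `bits`; the helper names mirror the A64 mnemonics but are otherwise illustrative, and the QC flag is not modelled:

```python
def to_signed(x, bits):
    """Reinterpret an unsigned lane value as two's-complement signed."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >> (bits - 1) else x

def clamp(v, lo, hi):
    return max(lo, min(hi, v))

def xtn(lanes, bits):
    """Plain narrow: keep the low half of each lane."""
    return [x & ((1 << (bits // 2)) - 1) for x in lanes]

def sqxtn(lanes, bits):
    """Signed input, signed saturated result."""
    h = bits // 2
    return [clamp(to_signed(x, bits), -(1 << (h - 1)), (1 << (h - 1)) - 1) for x in lanes]

def uqxtn(lanes, bits):
    """Unsigned input, unsigned saturated result."""
    h = bits // 2
    return [min(x & ((1 << bits) - 1), (1 << h) - 1) for x in lanes]

def sqxtun(lanes, bits):
    """Signed input, unsigned saturated result (negatives clamp to 0)."""
    h = bits // 2
    return [clamp(to_signed(x, bits), 0, (1 << h) - 1) for x in lanes]

src = [0x0123, 0x7FFF, 0xFF80, 0x8000]   # 16-bit lanes
print(xtn(src, 16))     # [35, 255, 128, 0]
print(sqxtn(src, 16))   # [127, 127, -128, -128]
print(uqxtn(src, 16))   # [255, 255, 255, 255]
print(sqxtun(src, 16))  # [255, 255, 0, 0]
```

The signed results are shown as Python integers; in a register, -128 would be stored as the byte 0x80.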

Moving data between general-purpose and SIMD registers

INS (element): Insert vector element from another vector element.

INS (general): Insert vector element from general-purpose register.

SMOV: Signed Move vector element to general-purpose register.

UMOV: Unsigned Move vector element to general-purpose register.

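VREV: reverse the elements within a vector.

VREV16 (vector reverse within halfwords) reverses the order of the 8-bit elements within each halfword of the vector and places the result in the corresponding destination vector.

VREV32 (vector reverse within words) reverses the order of the 8-bit or 16-bit elements within each word of the vector and places the result in the corresponding destination vector.

VREV64 (vector reverse within doublewords) reverses the order of the 8-bit, 16-bit or 32-bit elements within each doubleword of the vector and places the result in the corresponding destination vector.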

REV16 (vector): Reverse elements in 16-bit halfwords (vector).

REV32 (vector): Reverse elements in 32-bit words (vector).

REV64: Reverse elements in 64-bit doublewords (vector).
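All three reverse instructions follow one pattern: split the vector into containers, then reverse the elements inside each container. A minimal Python sketch, assuming the vector is a list of bytes with lane 0 first; `rev` and its parameters are illustrative names:

```python
def rev(byte_lanes, container_bytes, elem_bytes=1):
    """Reverse the order of elem_bytes-wide elements inside each
    container_bytes-wide chunk (2 = REV16, 4 = REV32, 8 = REV64)."""
    out = []
    for i in range(0, len(byte_lanes), container_bytes):
        chunk = byte_lanes[i:i + container_bytes]
        elems = [chunk[j:j + elem_bytes] for j in range(0, container_bytes, elem_bytes)]
        for e in reversed(elems):
            out.extend(e)
    return out

v = list(range(8))
print(rev(v, 2))      # REV16, 8-bit elements:  [1, 0, 3, 2, 5, 4, 7, 6]
print(rev(v, 4))      # REV32, 8-bit elements:  [3, 2, 1, 0, 7, 6, 5, 4]
print(rev(v, 8, 2))   # REV64, 16-bit elements: [6, 7, 4, 5, 2, 3, 0, 1]
```

With 8-bit elements these are byte swaps of 16-, 32- and 64-bit values, which is the usual way to convert endianness across a whole vector.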

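VTBL, VTBX: vector table lookup.

VTBL (vector table lookup) uses the byte indexes in a control vector to look up byte values in a table and generates a new vector. An index that is out of range returns 0.

VTBX (vector table extension) works in the same way, except that an index that is out of range leaves the destination element unchanged.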

TBL: Table vector Lookup.

TBX: Table vector lookup extension.
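The only difference between the two lookups is what happens to out-of-range indexes. A minimal Python sketch, assuming the table and index vectors are lists of bytes; the function names are illustrative:

```python
def tbl(table, indices):
    """Table lookup: out-of-range indexes yield 0."""
    return [table[i] if i < len(table) else 0 for i in indices]

def tbx(dest, table, indices):
    """Table lookup extension: out-of-range indexes keep the old destination byte."""
    return [table[i] if i < len(table) else d for d, i in zip(dest, indices)]

table = list(range(10, 26))        # 16-byte table holding 10..25
idx = [0, 15, 16, 255]
print(tbl(table, idx))             # [10, 25, 0, 0]
print(tbx([1, 2, 3, 4], table, idx))  # [10, 25, 3, 4]
```

The keep-on-miss behaviour of the extension form is what allows several lookups to be chained to cover a table larger than one instruction can address.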

VTRN: vector transpose.

VTRN (vector transpose) treats the elements of its operand vectors as elements of 2 x 2 matrices and transposes those matrices.

TRN1: Transpose vectors (primary).

TRN2: Transpose vectors (secondary).

VUZP, VZIP: vector de-interleave and interleave.

VZIP (vector zip) interleaves the elements of two vectors.

VUZP (vector unzip) de-interleaves the elements of two vectors.

UZP1: Unzip vectors (primary).

UZP2: Unzip vectors (secondary).

ZIP1: Zip vectors (primary).

ZIP2: Zip vectors (secondary).
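In A64 each of the transpose, zip and unzip operations is split into a pair of instructions that each produce one result register. A minimal Python sketch of the lane movement, assuming vectors are equal-length lists with lane 0 first; the function names mirror the mnemonics but are illustrative:

```python
def zip1(a, b):
    """Interleave the low halves of a and b."""
    h = len(a) // 2
    return [x for pair in zip(a[:h], b[:h]) for x in pair]

def zip2(a, b):
    """Interleave the high halves of a and b."""
    h = len(a) // 2
    return [x for pair in zip(a[h:], b[h:]) for x in pair]

def uzp1(a, b):
    """Even-numbered elements of a followed by b."""
    return (a + b)[0::2]

def uzp2(a, b):
    """Odd-numbered elements of a followed by b."""
    return (a + b)[1::2]

def trn1(a, b):
    """Even-numbered elements of a and b, paired up."""
    return [x for i in range(0, len(a), 2) for x in (a[i], b[i])]

def trn2(a, b):
    """Odd-numbered elements of a and b, paired up."""
    return [x for i in range(1, len(a), 2) for x in (a[i], b[i])]

a, b = [0, 1, 2, 3], [4, 5, 6, 7]
print(zip1(a, b), zip2(a, b))  # [0, 4, 1, 5] [2, 6, 3, 7]
print(uzp1(a, b), uzp2(a, b))  # [0, 2, 4, 6] [1, 3, 5, 7]
print(trn1(a, b), trn2(a, b))  # [0, 4, 2, 6] [1, 5, 3, 7]
```

Unzipping the two zipped results recovers the original vectors, which is a quick way to check that the lane order is modelled correctly.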

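Shift instructions

VSHL, VQSHL, VQSHLU and VSHLL (by immediate): shift left by an immediate value.

The vector shift left (by immediate) instructions take each element of a vector of integers, shift it left by an immediate value, and place the results in the destination vector.

For VSHL (vector shift left), bits shifted out of the left of each element are lost.

For VQSHL (vector saturating shift left) and VQSHLU (vector saturating shift left unsigned), the sticky QC flag is set if saturation occurs (FPSCR bit [27] in AArch32; FPSR.QC, also bit [27], in AArch64).

For VSHLL (vector shift left long), values are sign- or zero-extended.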

SHL: Shift Left (immediate).

SQSHL (immediate): Signed saturating Shift Left (immediate).

UQSHL (immediate): Unsigned saturating Shift Left (immediate).

SQSHLU: Signed saturating Shift Left Unsigned (immediate).

SHLL, SHLL2: Shift Left Long (by element size).

SSHLL, SSHLL2: Signed Shift Left Long (immediate).

USHLL, USHLL2: Unsigned Shift Left Long (immediate).

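V{Q}{R}SHL (by signed variable): shift left by a signed variable.

VSHL (vector shift left by signed variable) takes each element of a vector, shifts it by the value in the least significant byte of the corresponding element of a second vector, and places the results in the destination vector. If the shift value is positive, the operation is a left shift; otherwise it is a right shift.

The results can optionally be saturated, rounded, or both. If saturation occurs, the sticky QC flag is set (FPSCR bit [27] in AArch32; FPSR.QC in AArch64).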

SSHL: Signed Shift Left (register).

USHL: Unsigned Shift Left (register).

SQSHL (register): Signed saturating Shift Left (register).

UQSHL (register): Unsigned saturating Shift Left (register).

SRSHL: Signed Rounding Shift Left (register).

URSHL: Unsigned Rounding Shift Left (register).

SQRSHL: Signed saturating Rounding Shift Left (register).

UQRSHL: Unsigned saturating Rounding Shift Left (register).
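The register-shift forms take a signed per-lane shift count, so one instruction can shift different lanes in different directions. A minimal Python sketch of the signed, non-saturating variants (SSHL and, with `rounding=True`, SRSHL), assuming lanes are unsigned bit patterns; the helper names are illustrative:

```python
def to_signed(x, bits):
    """Reinterpret an unsigned lane value as two's-complement signed."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >> (bits - 1) else x

def sshl(values, shifts, bits=16, rounding=False):
    out = []
    for v, s in zip(values, shifts):
        v = to_signed(v, bits)
        s = to_signed(s, 8)            # only the low byte of the shift lane is used
        if s >= 0:
            r = v << s                 # positive count: shift left
        else:
            rnd = (1 << (-s - 1)) if rounding else 0
            r = (v + rnd) >> -s        # negative count: arithmetic shift right
        out.append(r & ((1 << bits) - 1))   # wrap to the lane width, no saturation
    return out

vals = [1, 0xFFF0, 7]                  # 0xFFF0 is -16 as a signed 16-bit value
cnts = [4, 0xFE, 0xFF]                 # +4, -2, -1 as signed bytes
print([hex(x) for x in sshl(vals, cnts)])                  # ['0x10', '0xfffc', '0x3']
print([hex(x) for x in sshl(vals, cnts, rounding=True)])   # ['0x10', '0xfffc', '0x4']
```

Rounding adds half of the last discarded bit position before the right shift, so 7 shifted right by one becomes 4 instead of 3.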

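V{R}SHR{N}, V{R}SRA (by immediate): shift right by an immediate value.

V{R}SHR{N} (vector shift right by immediate) takes each element of a vector, shifts it right by an immediate value, and places the results in the destination vector. The results can optionally be rounded, narrowed, or both.

V{R}SRA (vector shift right by immediate and accumulate) takes each element of a vector, shifts it right by an immediate value, and accumulates the results into the destination vector. The results can optionally be rounded.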

SSHR: Signed Shift Right (immediate).

USHR: Unsigned Shift Right (immediate).

SHRN, SHRN2: Shift Right Narrow (immediate).

SRSHR: Signed Rounding Shift Right (immediate).

URSHR: Unsigned Rounding Shift Right (immediate).

RSHRN, RSHRN2: Rounding Shift Right Narrow (immediate).

SSRA: Signed Shift Right and Accumulate (immediate).

USRA: Unsigned Shift Right and Accumulate (immediate).

SRSRA: Signed Rounding Shift Right and Accumulate (immediate).

URSRA: Unsigned Rounding Shift Right and Accumulate (immediate).
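The signed, unsigned, rounding and accumulating variants differ in three small choices, which a single helper can express. A minimal Python sketch for one lane, assuming the lane is an unsigned bit pattern of width `bits`; `shr` and `sra` are illustrative names:

```python
def to_signed(x, bits):
    """Reinterpret an unsigned lane value as two's-complement signed."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >> (bits - 1) else x

def shr(v, n, bits=8, signed=True, rounding=False):
    """SSHR / USHR / SRSHR / URSHR on one lane."""
    mask = (1 << bits) - 1
    v = to_signed(v, bits) if signed else v & mask
    rnd = (1 << (n - 1)) if rounding else 0   # half of the last bit shifted out
    return ((v + rnd) >> n) & mask

def sra(acc, v, n, bits=8, signed=True, rounding=False):
    """SSRA / USRA / SRSRA / URSRA: shift right, then add into the accumulator lane."""
    return (acc + shr(v, n, bits, signed, rounding)) & ((1 << bits) - 1)

print(hex(shr(0xF9, 1)))                    # 0xfc: -7 >> 1 = -4
print(hex(shr(0xF9, 1, rounding=True)))     # 0xfd: (-7 + 1) >> 1 = -3
print(shr(0xF9, 1, signed=False))           # 124: 249 >> 1
print(sra(10, 0xF9, 1))                     # 6: 10 + (-4)
```

The accumulate forms wrap on overflow; they are not saturating.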

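VQ{R}SHR{U}N (by immediate): saturating shift right by an immediate value.

VQ{R}SHR{U}N (vector saturating shift right and narrow, by immediate, with optional rounding) takes each element of a quadword vector of integers, shifts it right by an immediate value, and places the results in a doubleword vector.

If saturation occurs, the sticky QC flag is set (FPSCR bit [27] in AArch32; FPSR.QC in AArch64).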

SQSHRN, SQSHRN2: Signed saturating Shift Right Narrow (immediate).

UQSHRN, UQSHRN2: Unsigned saturating Shift Right Narrow (immediate).

SQRSHRN, SQRSHRN2: Signed saturating Rounded Shift Right Narrow (immediate).

UQRSHRN, UQRSHRN2: Unsigned saturating Rounded Shift Right Narrow (immediate).

SQRSHRUN, SQRSHRUN2: Signed saturating Rounded Shift Right Unsigned Narrow (immediate).

SQSHRUN, SQSHRUN2: Signed saturating Shift Right Unsigned Narrow (immediate).

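VSLI and VSRI: shift left and insert, shift right and insert.

VSLI (vector shift left and insert) takes each element of a vector, shifts it left by an immediate value, and inserts the result into the destination vector. Bits shifted out of the left of each element are lost, and the low destination bits that the shifted value does not cover are preserved.

VSRI (vector shift right and insert) takes each element of a vector, shifts it right by an immediate value, and inserts the result into the destination vector. Bits shifted out of the right of each element are lost, and the high destination bits that the shifted value does not cover are preserved.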

SLI: Shift Left and Insert (immediate).

SRI: Shift Right and Insert (immediate).
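What makes the insert forms different from an ordinary shift is that the destination bits vacated by the shift are kept rather than zeroed. A minimal Python sketch for one lane, assuming unsigned lane values of width `bits`; `sli` and `sri` are illustrative helper names:

```python
def sli(d, n, shift, bits=8):
    """Shift n left and insert: the low `shift` bits of d survive."""
    full = (1 << bits) - 1
    keep = (1 << shift) - 1
    return (d & keep) | ((n << shift) & full)

def sri(d, n, shift, bits=8):
    """Shift n right and insert: the high `shift` bits of d survive."""
    full = (1 << bits) - 1
    inserted = full >> shift            # bit positions overwritten by the shifted value
    return (d & ~inserted & full) | ((n & full) >> shift)

print(hex(sli(0xAB, 0xCD, 4)))  # 0xdb: high nibble from 0xCD << 4, low nibble kept from 0xAB
print(hex(sri(0xAB, 0xCD, 4)))  # 0xac: low nibble from 0xCD >> 4, high nibble kept from 0xAB
```

A common use is packing bit fields, for example merging two 4-bit values into one byte without a separate mask-and-OR sequence.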

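General arithmetic instructions

VABA{L} and VABD{L}: vector absolute difference and accumulate, and absolute difference.

VABA (vector absolute difference and accumulate) subtracts the elements of one vector from the corresponding elements of another vector and accumulates the absolute values of the results into the elements of the destination vector.

VABD (vector absolute difference) subtracts the elements of one vector from the corresponding elements of another vector and places the absolute values of the results in the elements of the destination vector.

Long versions of both instructions are available.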

SABA: Signed Absolute difference and Accumulate.

SABAL, SABAL2: Signed Absolute difference and Accumulate Long.

UABA: Unsigned Absolute difference and Accumulate.

UABAL, UABAL2: Unsigned Absolute difference and Accumulate Long.

SABD: Signed Absolute Difference.

SABDL, SABDL2: Signed Absolute Difference Long.

UABD: Unsigned Absolute Difference (vector).

UABDL, UABDL2: Unsigned Absolute Difference Long.

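V{Q}ABS and V{Q}NEG: vector absolute value and negate.

VABS (vector absolute) takes the absolute value of each element in a vector and places the results in a second vector. (The floating-point version only clears the sign bit.)

VNEG (vector negate) negates each element in a vector and places the results in a second vector. (The floating-point version only inverts the sign bit.)

Saturating versions of both instructions are available. If saturation occurs, the sticky QC flag is set (FPSCR bit [27] in AArch32; FPSR.QC in AArch64).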

ABS: Absolute value (vector).

SQABS: Signed saturating Absolute value.

NEG (vector): Negate (vector).

SQNEG: Signed saturating Negate.

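V{Q}ADD, VADDL, VADDW, V{Q}SUB, VSUBL and VSUBW: vector add and subtract.

VADD (vector add) adds corresponding elements in two vectors and places the results in the destination vector.

VSUB (vector subtract) subtracts the elements of one vector from the corresponding elements of another vector and places the results in the destination vector.

Saturating, long and wide versions are available. If saturation occurs, the sticky QC flag is set (FPSCR bit [27] in AArch32; FPSR.QC in AArch64).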

ADD (vector): Add (vector).

SQADD: Signed saturating Add.

UQADD: Unsigned saturating Add.

SADDL, SADDL2: Signed Add Long (vector).

UADDL, UADDL2: Unsigned Add Long (vector).

SADDW, SADDW2: Signed Add Wide.

UADDW, UADDW2: Unsigned Add Wide.

SUB (vector): Subtract (vector).

SQSUB: Signed saturating Subtract.

UQSUB: Unsigned saturating Subtract.

SSUBL, SSUBL2: Signed Subtract Long.

USUBL, USUBL2: Unsigned Subtract Long.

SSUBW, SSUBW2: Signed Subtract Wide.

USUBW, USUBW2: Unsigned Subtract Wide.
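The plain, saturating and long forms give different answers only when a result does not fit in the lane. A minimal Python sketch, assuming lanes are passed as ordinary Python integers already in range for the lane type; the function names follow the mnemonics but are illustrative, and the QC flag is not modelled:

```python
def sat(v, bits, signed):
    """Clamp v to the representable range of a signed or unsigned lane."""
    lo, hi = (-(1 << (bits - 1)), (1 << (bits - 1)) - 1) if signed else (0, (1 << bits) - 1)
    return max(lo, min(hi, v))

def add(a, b, bits=8):      # ADD: wraps modulo 2**bits
    return [(x + y) & ((1 << bits) - 1) for x, y in zip(a, b)]

def sqadd(a, b, bits=8):    # SQADD: signed saturating
    return [sat(x + y, bits, True) for x, y in zip(a, b)]

def uqadd(a, b, bits=8):    # UQADD: unsigned saturating
    return [sat(x + y, bits, False) for x, y in zip(a, b)]

def uqsub(a, b, bits=8):    # UQSUB: unsigned saturating, floors at 0
    return [sat(x - y, bits, False) for x, y in zip(a, b)]

def addl(a, b):             # SADDL / UADDL: result lanes are twice as wide, so no overflow
    return [x + y for x, y in zip(a, b)]

print(add([200], [100]))                 # [44]   wrapped
print(uqadd([200], [100]))               # [255]  saturated
print(addl([200], [100]))                # [300]  widened
print(sqadd([100, -100], [100, -100]))   # [127, -128]
print(uqsub([10], [20]))                 # [0]
```

The wide forms (SADDW and friends) behave like the long forms except that the first operand is already at the wider lane size.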

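V{R}ADDHN and V{R}SUBHN: vector add selecting the high half, and vector subtract selecting the high half.

V{R}ADDHN (vector add and narrow, selecting high half) adds corresponding elements in two vectors, selects the most significant half of each sum, and places the final results in the destination vector. Results can be either rounded or truncated.

V{R}SUBHN (vector subtract and narrow, selecting high half) subtracts the elements of one vector from the corresponding elements of another vector, selects the most significant half of each difference, and places the final results in the destination vector. Results can be either rounded or truncated.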

ADDHN, ADDHN2: Add returning High Narrow.

RADDHN, RADDHN2: Rounding Add returning High Narrow.

SUBHN, SUBHN2: Subtract returning High Narrow.

RSUBHN, RSUBHN2: Rounding Subtract returning High Narrow.
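"Selecting the high half" means the sum is computed at full lane width and only its upper half is stored. A minimal Python sketch of the add variants, assuming unsigned lane bit patterns of width `bits`; `addhn` and its `rounding` parameter are illustrative names:

```python
def addhn(a, b, bits=16, rounding=False):
    """ADDHN (rounding=False) / RADDHN (rounding=True) on lists of lanes."""
    half = bits // 2
    rnd = (1 << (half - 1)) if rounding else 0   # half of the discarded low part
    full = (1 << bits) - 1
    return [((x + y + rnd) & full) >> half for x, y in zip(a, b)]

print([hex(v) for v in addhn([0x12FF, 0x1280], [0x0001, 0x0000])])                 # ['0x13', '0x12']
print([hex(v) for v in addhn([0x12FF, 0x1280], [0x0001, 0x0000], rounding=True)])  # ['0x13', '0x13']
```

The subtract forms work the same way with the difference in place of the sum.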

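V{R}HADD and VHSUB: vector halving add and halving subtract.

VHADD (vector halving add) adds corresponding elements in two vectors, shifts each result right by one bit, and places the results in the destination vector. Results can be either rounded or truncated.

VHSUB (vector halving subtract) subtracts the elements of one vector from the corresponding elements of another vector, shifts each result right by one bit, and places the results in the destination vector. Results are always truncated.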

SHADD: Signed Halving Add.

UHADD: Unsigned Halving Add.

SRHADD: Signed Rounding Halving Add.

URHADD: Unsigned Rounding Halving Add.

SHSUB: Signed Halving Subtract.

UHSUB: Unsigned Halving Subtract.
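The point of the halving forms is that the intermediate sum or difference is held at full precision, so the result never overflows. A minimal Python sketch, assuming lanes are passed as Python integers in range for the lane type (Python's unbounded integers play the role of the wider intermediate); the function names are illustrative:

```python
def hadd(a, b, rounding=False):
    """SHADD / UHADD (rounding=False) or SRHADD / URHADD (rounding=True)."""
    r = 1 if rounding else 0
    return [(x + y + r) >> 1 for x, y in zip(a, b)]

def hsub(a, b):
    """SHSUB / UHSUB: always truncating."""
    return [(x - y) >> 1 for x, y in zip(a, b)]

print(hadd([127, -3], [127, 0]))                  # [127, -2]
print(hadd([127, -3], [127, 0], rounding=True))   # [127, -1]
print(hsub([-128], [127]))                        # [-128]
```

The rounding halving add of two unsigned bytes is the usual way to average pixel values without first widening them.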

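VPADD{L}, VPADAL: vector pairwise add, and vector pairwise add and accumulate.

VPADD (vector pairwise add) adds adjacent pairs of elements of two vectors and places the results in the destination vector.

VPADDL (vector pairwise add long) adds adjacent pairs of elements of a vector, sign- or zero-extends the results to twice their original width, and places the final results in the destination vector.

VPADAL (vector pairwise add and accumulate long) adds adjacent pairs of elements of a vector and accumulates the widened sums into the elements of the destination vector.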

ADDP (scalar): Add Pair of elements (scalar).

ADDP (vector): Add Pairwise (vector).

SADDLP: Signed Add Long Pairwise.

UADDLP: Unsigned Add Long Pairwise.

SADALP: Signed Add and Accumulate Long Pairwise.

UADALP: Unsigned Add and Accumulate Long Pairwise.
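Pairwise operations combine neighbours inside a vector instead of matching lanes across two vectors. A minimal Python sketch of the three shapes, assuming vectors are lists of Python integers with lane 0 first; the function names are illustrative:

```python
def addp(a, b, bits=8):
    """ADDP (vector): pairs are taken from a followed by b; results wrap to the lane width."""
    c = a + b
    return [(c[i] + c[i + 1]) & ((1 << bits) - 1) for i in range(0, len(c), 2)]

def addlp(a):
    """SADDLP / UADDLP: one source, half as many lanes, each twice as wide."""
    return [a[i] + a[i + 1] for i in range(0, len(a), 2)]

def adalp(acc, a):
    """SADALP / UADALP: add the widened pair sums into the accumulator lanes."""
    return [d + s for d, s in zip(acc, addlp(a))]

print(addp([1, 2, 3, 4], [5, 6, 7, 8]))     # [3, 7, 11, 15]
print(addlp([200, 100, 1, 2]))              # [300, 3]
print(adalp([1000, 1], [200, 100, 1, 2]))   # [1300, 4]
```

Repeating the long pairwise add widens and halves the lane count at each step, which is a common way to reduce a vector of bytes to a single sum.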

Mixed signed and unsigned saturating add

SUQADD: Signed saturating Accumulate of Unsigned value.

USQADD: Unsigned saturating Accumulate of Signed value.

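VMAX, VMIN, VPMAX and VPMIN: vector maximum, vector minimum, vector pairwise maximum and vector pairwise minimum.

VMAX (vector maximum) compares corresponding elements in two vectors and copies the larger of each pair into the corresponding element of the destination vector.

VMIN (vector minimum) compares corresponding elements in two vectors and copies the smaller of each pair into the corresponding element of the destination vector.

VPMAX (vector pairwise maximum) compares adjacent pairs of elements in two vectors and copies the larger of each pair into the corresponding element of the destination vector. Operands and results must be doubleword vectors.

VPMIN (vector pairwise minimum) compares adjacent pairs of elements in two vectors and copies the smaller of each pair into the corresponding element of the destination vector. Operands and results must be doubleword vectors.

For a diagram of the pairwise operation, see Figure 5-5 on page 5-63.

Floating-point maximum and minimum: max(+0.0, –0.0) = +0.0 and min(+0.0, –0.0) = –0.0.
If any input is a NaN, the corresponding result element is the default NaN.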

SMAX: Signed Maximum (vector).

UMAX: Unsigned Maximum (vector).

SMIN: Signed Minimum (vector).

UMIN: Unsigned Minimum (vector).

SMAXP: Signed Maximum Pairwise.

UMAXP: Unsigned Maximum Pairwise.

SMINP: Signed Minimum Pairwise.

UMINP: Unsigned Minimum Pairwise.

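Across-vector (V) operations

These instructions compute the sum, maximum or minimum across all elements of a vector.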

ADDV: Add across Vector.

SADDLV: Signed Add Long across Vector.

UADDLV: Unsigned sum Long across Vector.

SMAXV: Signed Maximum across Vector.

UMAXV: Unsigned Maximum across Vector.

SMINV: Signed Minimum across Vector.

UMINV: Unsigned Minimum across Vector.

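VCLS, VCLZ and VCNT: vector count leading sign bits, count leading zeros, and count set bits.

VCLS (vector count leading sign bits) counts, for each element of a vector, the number of consecutive bits following the top bit that are the same as the top bit, and places the results in a second vector.

VCLZ (vector count leading zeros) counts the number of consecutive zeros in each element of a vector, starting from the top bit, and places the results in a second vector.

VCNT (vector count set bits) counts the number of bits that are 1 in each element of a vector and places the results in a second vector.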

CLS (vector): Count Leading Sign bits (vector).

CLZ (vector): Count Leading Zero bits (vector).

CNT: Population Count per byte.
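The three counts are easy to confuse, so here is a minimal Python sketch for one lane, assuming an unsigned lane bit pattern of width `bits` (CNT itself works on bytes only); the function names are illustrative:

```python
def clz(x, bits=8):
    """Count leading zero bits."""
    x &= (1 << bits) - 1
    return bits - x.bit_length()

def cls(x, bits=8):
    """Count the bits after the top bit that equal the top bit."""
    mask = (1 << bits) - 1
    x &= mask
    if x >> (bits - 1):
        x = ~x & mask          # fold negatives so leading sign bits become zeros
    return clz(x, bits) - 1    # do not count the sign bit itself

def cnt(x):
    """Population count of one byte."""
    return bin(x & 0xFF).count("1")

print(clz(0x10), clz(0x00))               # 3 8
print(cls(0xF0), cls(0x01), cls(0xFF))    # 3 6 7
print(cnt(0xF0))                          # 4
```

The leading-sign-bit count tells how far a signed value can be shifted left without changing its sign, which is why it appears in normalisation code.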

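VRECPE and VRSQRTE: vector reciprocal estimate and reciprocal square root estimate.

VRECPE (vector reciprocal estimate) finds an approximate reciprocal of each element in a vector and places the results in a second vector.

VRSQRTE (vector reciprocal square root estimate) finds an approximate reciprocal square root of each element in a vector and places the results in a second vector.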

URECPE: Unsigned Reciprocal Estimate.

URSQRTE: Unsigned Reciprocal Square Root Estimate.

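Multiply instructions

VMUL{L}, VMLA{L} and VMLS{L}: vector multiply, multiply accumulate and multiply subtract.

VMUL (vector multiply) multiplies corresponding elements in two vectors and places the results in the destination vector.

VMLA (vector multiply accumulate) multiplies corresponding elements in two vectors and accumulates the results into the elements of the destination vector.

VMLS (vector multiply subtract) multiplies corresponding elements in two vectors, subtracts the products from the corresponding elements of the destination vector, and places the final results in the destination vector.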

MUL (vector): Multiply (vector).

SMULL, SMULL2 (vector): Signed Multiply Long (vector).

UMULL, UMULL2 (vector): Unsigned Multiply long (vector).

MLA (vector): Multiply-Add to accumulator (vector).

SMLAL, SMLAL2 (vector): Signed Multiply-Add Long (vector).

UMLAL, UMLAL2 (vector): Unsigned Multiply-Add Long (vector).

MLS (vector): Multiply-Subtract from accumulator (vector).

SMLSL, SMLSL2 (vector): Signed Multiply-Subtract Long (vector).

UMLSL, UMLSL2 (vector): Unsigned Multiply-Subtract Long (vector).

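VMUL{L}, VMLA{L} and VMLS{L} (by scalar): vector multiply, multiply accumulate and multiply subtract by a scalar.

VMUL (vector multiply by scalar) multiplies each element in a vector by a scalar and places the results in the destination vector.

VMLA (vector multiply accumulate) multiplies each element in a vector by a scalar and accumulates the results into the corresponding elements of the destination vector.

VMLS (vector multiply subtract) multiplies each element in a vector by a scalar, subtracts the products from the corresponding elements of the destination vector, and places the final results in the destination vector.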

MUL (by element): Multiply (vector, by element).

SMULL, SMULL2 (by element): Signed Multiply Long (vector, by element).

UMULL, UMULL2 (by element): Unsigned Multiply Long (vector, by element).

MLA (by element): Multiply-Add to accumulator (vector, by element).

SMLAL, SMLAL2 (by element): Signed Multiply-Add Long (vector, by element).

UMLAL, UMLAL2 (by element): Unsigned Multiply-Add Long (vector, by element).

MLS (by element): Multiply-Subtract from accumulator (vector, by element).

SMLSL, SMLSL2 (by element): Signed Multiply-Subtract Long (vector, by element).

UMLSL, UMLSL2 (by element): Unsigned Multiply-Subtract Long (vector, by element).

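VQDMULL, VQDMLAL and VQDMLSL (by vector or by scalar): vector saturating doubling multiply, multiply accumulate and multiply subtract.

The vector saturating doubling multiply instructions multiply their operands and double the results. VQDMULL places the results in the destination register. VQDMLAL adds the results to the values in the destination register. VQDMLSL subtracts the results from the values in the destination register.

If any of the results overflow, they are saturated. If saturation occurs, the sticky QC flag is set (FPSCR bit [27] in AArch32; FPSR.QC in AArch64).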

SQDMULL, SQDMULL2 (by element): Signed saturating Doubling Multiply Long (by element).

SQDMULL, SQDMULL2 (vector): Signed saturating Doubling Multiply Long.

SQDMLAL, SQDMLAL2 (by element): Signed saturating Doubling Multiply-Add Long (by element).

SQDMLAL, SQDMLAL2 (vector): Signed saturating Doubling Multiply-Add Long.

SQDMLSL, SQDMLSL2 (by element): Signed saturating Doubling Multiply-Subtract Long (by element).

SQDMLSL, SQDMLSL2 (vector): Signed saturating Doubling Multiply-Subtract Long.

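VQ{R}DMULH (by vector or by scalar): vector saturating doubling multiply returning the high half.

The vector saturating doubling multiply instructions multiply their operands and double the results. These instructions return only the high half of each result.

If any of the results overflow, they are saturated. If saturation occurs, the sticky QC flag is set (FPSCR bit [27] in AArch32; FPSR.QC in AArch64).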

SQDMULH (by element): Signed saturating Doubling Multiply returning High half (by element).

SQDMULH (vector): Signed saturating Doubling Multiply returning High half.

SQRDMULH (by element): Signed saturating Rounding Doubling Multiply returning High half (by element).

SQRDMULH (vector): Signed saturating Rounding Doubling Multiply returning High half.

SQRDMLAH (by element): Signed Saturating Rounding Doubling Multiply Accumulate returning High Half (by element).

SQRDMLAH (vector): Signed Saturating Rounding Doubling Multiply Accumulate returning High Half (vector).

SQRDMLSH (by element): Signed Saturating Rounding Doubling Multiply Subtract returning High Half (by element).

SQRDMLSH (vector): Signed Saturating Rounding Doubling Multiply Subtract returning High Half (vector).
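The doubling multiplies are easiest to read as fixed-point arithmetic: with 16-bit lanes in Q15 format, doubling the 32-bit product and keeping the high half yields a Q15 product. A minimal Python sketch for one lane, assuming signed Python integers in range for the lane type; the function names follow the mnemonics but are illustrative, and the QC flag is not modelled:

```python
def sat_signed(v, bits):
    """Clamp v to the signed range of a `bits`-wide lane."""
    return max(-(1 << (bits - 1)), min((1 << (bits - 1)) - 1, v))

def sqdmull(a, b, bits=16):
    """Doubling multiply long: the result lane is 2 * bits wide."""
    return sat_signed(2 * a * b, 2 * bits)

def sqdmulh(a, b, bits=16):
    """Doubling multiply, keep the high half (truncating)."""
    return sat_signed((2 * a * b) >> bits, bits)

def sqrdmulh(a, b, bits=16):
    """Doubling multiply, keep the high half (rounding)."""
    return sat_signed((2 * a * b + (1 << (bits - 1))) >> bits, bits)

print(sqdmulh(16384, 16384))      # 8192: 0.5 * 0.5 = 0.25 in Q15
print(sqdmulh(1, 16384))          # 0:    truncated
print(sqrdmulh(1, 16384))         # 1:    rounded
print(sqdmulh(-32768, -32768))    # 32767: (-1.0) * (-1.0) saturates just below +1.0
print(sqdmull(-32768, -32768))    # 2147483647: the only case where the long form saturates
```

The single overflow case, the most negative value multiplied by itself, is the reason these instructions need to saturate at all.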

Polynomial multiplication

PMUL: Polynomial Multiply.

PMULL, PMULL2: Polynomial Multiply Long.

Dot product

SDOT (by element): Dot Product signed arithmetic (vector, by element).

SDOT (vector): Dot Product signed arithmetic (vector).

UDOT (by element): Dot Product unsigned arithmetic (vector, by element).

UDOT (vector): Dot Product unsigned arithmetic (vector).

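Load/store

VLDn and VSTn (single n-element structure to one lane): these instructions can be used for almost all data accesses. A normal vector can be loaded (n = 1).

Vector load single n-element structure to one lane. It loads one n-element structure from memory into one or more NEON registers. Elements of the registers that are not loaded are left unchanged.

Vector store single n-element structure to one lane. It stores one n-element structure into memory from one or more NEON registers.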

LD1 (single structure): Load one single-element structure to one lane of one register.

LD2 (single structure): Load single 2-element structure to one lane of two registers.

LD3 (single structure): Load single 3-element structure to one lane of three registers.

LD4 (single structure): Load single 4-element structure to one lane of four registers.

ST1 (single structure): Store a single-element structure from one lane of one register.

ST2 (single structure): Store single 2-element structure from one lane of two registers.

ST3 (single structure): Store single 3-element structure from one lane of three registers.

ST4 (single structure): Store single 4-element structure from one lane of four registers.

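VLDn (single n-element structure to all lanes)

Vector load single n-element structure to all lanes. It loads multiple copies of one n-element structure from memory into one or more NEON registers.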

LD1R: Load one single-element structure and Replicate to all lanes (of one register).

LD2R: Load single 2-element structure and Replicate to all lanes of two registers.

LD3R: Load single 3-element structure and Replicate to all lanes of three registers.

LD4R: Load single 4-element structure and Replicate to all lanes of four registers.

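VLDn and VSTn (multiple n-element structures)

Vector load multiple n-element structures. It loads multiple n-element structures from memory into one or more NEON registers, with de-interleaving (unless n == 1). Every element of each register is loaded.

Vector store multiple n-element structures. It stores multiple n-element structures into memory from one or more NEON registers, with interleaving (unless n == 1). Every element of each register is stored.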

LD1 (multiple structures): Load multiple single-element structures to one, two, three, or four registers.

LD2 (multiple structures): Load multiple 2-element structures to two registers.

LD3 (multiple structures): Load multiple 3-element structures to three registers.

LD4 (multiple structures): Load multiple 4-element structures to four registers.

ST1 (multiple structures): Store multiple single-element structures from one, two, three, or four registers.

ST2 (multiple structures): Store multiple 2-element structures from two registers.

ST3 (multiple structures): Store multiple 3-element structures from three registers.

ST4 (multiple structures): Store multiple 4-element structures from four registers.
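The de-interleaving done by the multiple-structure loads is a strided split of memory, and the stores are its inverse. A minimal Python sketch, assuming memory is a flat list of elements and each register is a list of lanes; `ldn` and `stn` are illustrative names:

```python
def ldn(memory, n):
    """Model LDn (multiple structures): register k receives elements k, k+n, k+2n, ..."""
    return [memory[k::n] for k in range(n)]

def stn(registers):
    """Model STn (multiple structures): interleave the registers back into memory order."""
    return [x for group in zip(*registers) for x in group]

pixels = [1, 2, 3, 4, 5, 6]       # two RGB pixels stored as R, G, B, R, G, B
r, g, b = ldn(pixels, 3)
print(r, g, b)                    # [1, 4] [2, 5] [3, 6]
print(stn([r, g, b]))             # [1, 2, 3, 4, 5, 6]
```

With n = 1 there is nothing to de-interleave and the load is an ordinary contiguous vector load.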

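NEON and VFP pseudo-instructions

VLDR pseudo-instruction (NEON and VFP)

The VLDR pseudo-instruction loads a constant value into every element of a 64-bit NEON vector, or into a VFP single-precision or double-precision register.

If an instruction (for example, VMOV) is available that can generate the constant directly into the register, the assembler uses it. Otherwise, the assembler generates a doubleword literal pool entry containing the constant and loads the constant using a VLDR instruction.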

LDR (literal, SIMD&FP): Load SIMD&FP Register (PC-relative literal).

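VLDR and VSTR (post-increment and pre-decrement) (NEON and VFP)

Pseudo-instructions that load or store extension registers with post-increment and pre-decrement.

For information about the VLDR and VSTR instructions without post-increment and pre-decrement, see VLDR and VSTR on page 5-23.

The post-increment instruction increments the base address in the register by the offset value after the transfer. The pre-decrement instruction decrements the base address in the register by the offset value and then performs the transfer using the new address in the register. These pseudo-instructions assemble to VLDM or VSTM instructions (see VLDM, VSTM, VPOP and VPUSH on page 5-24).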

LDR (immediate, SIMD&FP): Load SIMD&FP Register (immediate offset).

LDR (register, SIMD&FP): Load SIMD&FP Register (register offset).

STR (immediate, SIMD&FP): Store SIMD&FP register (immediate offset).

STR (register, SIMD&FP): Store SIMD&FP register (register offset).

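VMOV2 (NEON only)

The VMOV2 pseudo-instruction generates a constant and places it in every element of a NEON vector, without loading a value from a literal pool. It always assembles to exactly two instructions.

VMOV2 can generate any 16-bit constant, and a restricted range of 32-bit and 64-bit constants.

VMOV2 usually assembles to a VMOV or VMVN instruction followed by a VBIC or VORR instruction. For details, see VMOV, VMVN (immediate) on page 5-44 and VBIC and VORR (immediate) on page 5-32.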

MOVI: Move Immediate (vector).

Floating-point operations

FABD: Floating-point Absolute Difference (vector).

FABS (scalar): Floating-point Absolute value (scalar).

FABS (vector): Floating-point Absolute value (vector).

FACGE: Floating-point Absolute Compare Greater than or Equal (vector).

FACGT: Floating-point Absolute Compare Greater than (vector).

FADD (scalar): Floating-point Add (scalar).

FADD (vector): Floating-point Add (vector).

FADDP (scalar): Floating-point Add Pair of elements (scalar).

FADDP (vector): Floating-point Add Pairwise (vector).

FCADD: Floating-point Complex Add.

FCCMP: Floating-point Conditional quiet Compare (scalar).

FCCMPE: Floating-point Conditional signaling Compare (scalar).

FCMEQ (register): Floating-point Compare Equal (vector).

FCMEQ (zero): Floating-point Compare Equal to zero (vector).

FCMGE (register): Floating-point Compare Greater than or Equal (vector).

FCMGE (zero): Floating-point Compare Greater than or Equal to zero (vector).

FCMGT (register): Floating-point Compare Greater than (vector).

FCMGT (zero): Floating-point Compare Greater than zero (vector).

FCMLA: Floating-point Complex Multiply Accumulate.

FCMLA (by element): Floating-point Complex Multiply Accumulate (by element).

FCMLE (zero): Floating-point Compare Less than or Equal to zero (vector).

FCMLT (zero): Floating-point Compare Less than zero (vector).

FCMP: Floating-point quiet Compare (scalar).

FCMPE: Floating-point signaling Compare (scalar).

FCSEL: Floating-point Conditional Select (scalar).

FCVT: Floating-point Convert precision (scalar).

FCVTAS (scalar): Floating-point Convert to Signed integer, rounding to nearest with ties to Away (scalar).

FCVTAS (vector): Floating-point Convert to Signed integer, rounding to nearest with ties to Away (vector).

FCVTAU (scalar): Floating-point Convert to Unsigned integer, rounding to nearest with ties to Away (scalar).

FCVTAU (vector): Floating-point Convert to Unsigned integer, rounding to nearest with ties to Away (vector).

FCVTL, FCVTL2: Floating-point Convert to higher precision Long (vector).

FCVTMS (scalar): Floating-point Convert to Signed integer, rounding toward Minus infinity (scalar).

FCVTMS (vector): Floating-point Convert to Signed integer, rounding toward Minus infinity (vector).

FCVTMU (scalar): Floating-point Convert to Unsigned integer, rounding toward Minus infinity (scalar).

FCVTMU (vector): Floating-point Convert to Unsigned integer, rounding toward Minus infinity (vector).

FCVTN, FCVTN2: Floating-point Convert to lower precision Narrow (vector).

FCVTNS (scalar): Floating-point Convert to Signed integer, rounding to nearest with ties to even (scalar).

FCVTNS (vector): Floating-point Convert to Signed integer, rounding to nearest with ties to even (vector).

FCVTNU (scalar): Floating-point Convert to Unsigned integer, rounding to nearest with ties to even (scalar).

FCVTNU (vector): Floating-point Convert to Unsigned integer, rounding to nearest with ties to even (vector).

FCVTPS (scalar): Floating-point Convert to Signed integer, rounding toward Plus infinity (scalar).

FCVTPS (vector): Floating-point Convert to Signed integer, rounding toward Plus infinity (vector).

FCVTPU (scalar): Floating-point Convert to Unsigned integer, rounding toward Plus infinity (scalar).

FCVTPU (vector): Floating-point Convert to Unsigned integer, rounding toward Plus infinity (vector).

FCVTXN, FCVTXN2: Floating-point Convert to lower precision Narrow, rounding to odd (vector).

FCVTZS (scalar, fixed-point): Floating-point Convert to Signed fixed-point, rounding toward Zero (scalar).

FCVTZS (scalar, integer): Floating-point Convert to Signed integer, rounding toward Zero (scalar).

FCVTZS (vector, fixed-point): Floating-point Convert to Signed fixed-point, rounding toward Zero (vector).

FCVTZS (vector, integer): Floating-point Convert to Signed integer, rounding toward Zero (vector).

FCVTZU (scalar, fixed-point): Floating-point Convert to Unsigned fixed-point, rounding toward Zero (scalar).

FCVTZU (scalar, integer): Floating-point Convert to Unsigned integer, rounding toward Zero (scalar).

FCVTZU (vector, fixed-point): Floating-point Convert to Unsigned fixed-point, rounding toward Zero (vector).

FCVTZU (vector, integer): Floating-point Convert to Unsigned integer, rounding toward Zero (vector).

FDIV (scalar): Floating-point Divide (scalar).

FDIV (vector): Floating-point Divide (vector).

FJCVTZS: Floating-point Javascript Convert to Signed fixed-point, rounding toward Zero.

FMADD: Floating-point fused Multiply-Add (scalar).

FMAX (scalar): Floating-point Maximum (scalar).

FMAX (vector): Floating-point Maximum (vector).

FMAXNM (scalar): Floating-point Maximum Number (scalar).

FMAXNM (vector): Floating-point Maximum Number (vector).

FMAXNMP (scalar): Floating-point Maximum Number of Pair of elements (scalar).

FMAXNMP (vector): Floating-point Maximum Number Pairwise (vector).

FMAXNMV: Floating-point Maximum Number across Vector.

FMAXP (scalar): Floating-point Maximum of Pair of elements (scalar).

FMAXP (vector): Floating-point Maximum Pairwise (vector).

FMAXV: Floating-point Maximum across Vector.

FMIN (scalar): Floating-point Minimum (scalar).

FMIN (vector): Floating-point minimum (vector).

FMINNM (scalar): Floating-point Minimum Number (scalar).

FMINNM (vector): Floating-point Minimum Number (vector).

FMINNMP (scalar): Floating-point Minimum Number of Pair of elements (scalar).

FMINNMP (vector): Floating-point Minimum Number Pairwise (vector).

FMINNMV: Floating-point Minimum Number across Vector.

FMINP (scalar): Floating-point Minimum of Pair of elements (scalar).

FMINP (vector): Floating-point Minimum Pairwise (vector).

FMINV: Floating-point Minimum across Vector.

FMLA (by element): Floating-point fused Multiply-Add to accumulator (by element).

FMLA (vector): Floating-point fused Multiply-Add to accumulator (vector).

FMLAL, FMLAL2 (by element): Floating-point fused Multiply-Add Long to accumulator (by element).

FMLAL, FMLAL2 (vector): Floating-point fused Multiply-Add Long to accumulator (vector).

FMLS (by element): Floating-point fused Multiply-Subtract from accumulator (by element).

FMLS (vector): Floating-point fused Multiply-Subtract from accumulator (vector).

FMLSL, FMLSL2 (by element): Floating-point fused Multiply-Subtract Long from accumulator (by element).

FMLSL, FMLSL2 (vector): Floating-point fused Multiply-Subtract Long from accumulator (vector).

FMOV (general): Floating-point Move to or from general-purpose register without conversion.

FMOV (register): Floating-point Move register without conversion.

FMOV (scalar, immediate): Floating-point move immediate (scalar).

FMOV (vector, immediate): Floating-point move immediate (vector).

FMSUB: Floating-point Fused Multiply-Subtract (scalar).

FMUL (by element): Floating-point Multiply (by element).

FMUL (scalar): Floating-point Multiply (scalar).

FMUL (vector): Floating-point Multiply (vector).

FMULX: Floating-point Multiply extended.

FMULX (by element): Floating-point Multiply extended (by element).

FNEG (scalar): Floating-point Negate (scalar).

FNEG (vector): Floating-point Negate (vector).

FNMADD: Floating-point Negated fused Multiply-Add (scalar).

FNMSUB: Floating-point Negated fused Multiply-Subtract (scalar).

FNMUL (scalar): Floating-point Multiply-Negate (scalar).

FRECPE: Floating-point Reciprocal Estimate.

FRECPS: Floating-point Reciprocal Step.

FRECPX: Floating-point Reciprocal exponent (scalar).

FRINTA (scalar): Floating-point Round to Integral, to nearest with ties to Away (scalar).

FRINTA (vector): Floating-point Round to Integral, to nearest with ties to Away (vector).

FRINTI (scalar): Floating-point Round to Integral, using current rounding mode (scalar).

FRINTI (vector): Floating-point Round to Integral, using current rounding mode (vector).

FRINTM (scalar): Floating-point Round to Integral, toward Minus infinity (scalar).

FRINTM (vector): Floating-point Round to Integral, toward Minus infinity (vector).

FRINTN (scalar): Floating-point Round to Integral, to nearest with ties to even (scalar).

FRINTN (vector): Floating-point Round to Integral, to nearest with ties to even (vector).

FRINTP (scalar): Floating-point Round to Integral, toward Plus infinity (scalar).

FRINTP (vector): Floating-point Round to Integral, toward Plus infinity (vector).

FRINTX (scalar): Floating-point Round to Integral exact, using current rounding mode (scalar).

FRINTX (vector): Floating-point Round to Integral exact, using current rounding mode (vector).

FRINTZ (scalar): Floating-point Round to Integral, toward Zero (scalar).

FRINTZ (vector): Floating-point Round to Integral, toward Zero (vector).

FRSQRTE: Floating-point Reciprocal Square Root Estimate.

FRSQRTS: Floating-point Reciprocal Square Root Step.

FSQRT (scalar): Floating-point Square Root (scalar).

FSQRT (vector): Floating-point Square Root (vector).

FSUB (scalar): Floating-point Subtract (scalar).

FSUB (vector): Floating-point Subtract (vector).

Cryptographic algorithms

AESD: AES single round decryption.

AESE: AES single round encryption.

AESIMC: AES inverse mix columns.

AESMC: AES mix columns.

SHA1C: SHA1 hash update (choose).

SHA1H: SHA1 fixed rotate.

SHA1M: SHA1 hash update (majority).

SHA1P: SHA1 hash update (parity).

SHA1SU0: SHA1 schedule update 0.

SHA1SU1: SHA1 schedule update 1.

SHA256H: SHA256 hash update (part 1).

SHA256H2: SHA256 hash update (part 2).

SHA256SU0: SHA256 schedule update 0.

SHA256SU1: SHA256 schedule update 1.

SHA512H: SHA512 Hash update part 1.

SHA512H2: SHA512 Hash update part 2.

SHA512SU0: SHA512 Schedule Update 0.

SHA512SU1: SHA512 Schedule Update 1.

SM3PARTW1: SM3PARTW1.

SM3PARTW2: SM3PARTW2.

SM3SS1: SM3SS1.

SM3TT1A: SM3TT1A.

SM3TT1B: SM3TT1B.

SM3TT2A: SM3TT2A.

SM3TT2B: SM3TT2B.

SM4E: SM4 Encode.

SM4EKEY: SM4 Key.

Other instructions

LDNP (SIMD&FP): Load Pair of SIMD&FP registers, with Non-temporal hint.

LDP (SIMD&FP): Load Pair of SIMD&FP registers.

LDUR (SIMD&FP): Load SIMD&FP Register (unscaled offset).

STNP (SIMD&FP): Store Pair of SIMD&FP registers, with Non-temporal hint.

STP (SIMD&FP): Store Pair of SIMD&FP registers.

STUR (SIMD&FP): Store SIMD&FP register (unscaled offset).
