- This post introduces NEON intrinsics and closes with a few examples. If you spot any mistakes, please point them out. Thanks.
Introduction
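- NEON intrinsics are a higher-level API than assembly and can be called directly from C/C++. Assembly gives finer control over hardware execution details, but it is comparatively expensive to maintain. NEON intrinsics are easier to maintain while still offering a degree of control over how the hardware executes. The compiler replaces each intrinsic call with the corresponding NEON instruction. The intrinsics are declared in arm_neon.h, which provides both functions and data types.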
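As a first taste of what calling an intrinsic from C looks like, here is a minimal sketch that adds eight 16-bit integers with a single vector add. The function name add8_s16 is made up for illustration, and the scalar branch is an added assumption so the sketch still builds on targets without NEON; it is not part of arm_neon.h.

```c
#include <stdint.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: out[i] = a[i] + b[i] for i in 0..7. */
void add8_s16(const int16_t *a, const int16_t *b, int16_t *out)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    int16x8_t va = vld1q_s16(a);         /* load 8 lanes from a */
    int16x8_t vb = vld1q_s16(b);         /* load 8 lanes from b */
    vst1q_s16(out, vaddq_s16(va, vb));   /* one vector add, then store */
#else
    /* Scalar fallback (assumption): same result, one lane at a time. */
    for (int i = 0; i < 8; ++i)
        out[i] = (int16_t)(a[i] + b[i]);
#endif
}
```

The scalar loop doubles as a description of what the vector add computes per lane.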
Data Type
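- NEON vector data types are named as follows:
<type><size>x<number_of_lanes>_t
For example, int16x4_t is a vector of four 16-bit signed integers (short).
- NEON also supports arrays of vectors, named as follows:
<type><size>x<number_of_lanes>x<length_of_array>_t
For example, int16x4x2_t holds two int16x4_t vectors. It is a struct whose val[2] member is the array.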
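To make the two naming patterns concrete, the sketch below fills an int16x4x2_t through its val[] member and then sums the lanes. The function name sum8_s16 is hypothetical, the scalar branch is an added assumption for non-NEON builds, and the inputs are assumed small enough that p[i] + p[i+4] fits in 16 bits.

```c
#include <stdint.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: sums eight int16 values via a pair of 4-lane vectors.
   Assumes p[i] + p[i+4] does not overflow 16 bits. */
int32_t sum8_s16(const int16_t *p)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    int16x4x2_t pair;                    /* struct with member val[2] */
    pair.val[0] = vld1_s16(p);           /* p[0..3] */
    pair.val[1] = vld1_s16(p + 4);       /* p[4..7] */
    int16x4_t s = vadd_s16(pair.val[0], pair.val[1]);  /* lane-wise add */
    return (int32_t)vget_lane_s16(s, 0) + vget_lane_s16(s, 1)
         + vget_lane_s16(s, 2) + vget_lane_s16(s, 3);
#else
    /* Scalar fallback (assumption) for targets without NEON. */
    int32_t total = 0;
    for (int i = 0; i < 8; ++i)
        total += p[i];
    return total;
#endif
}
```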
NEON Intrinsics
- NEON intrinsics are named as follows:
<opname><flags>_<type>
For example:
- vmul_s16 multiplies two vectors of signed 16-bit elements
- vadd_u8 adds two 64-bit vectors of unsigned 8-bit elements and returns a 64-bit vector (doubleword registers)
- vaddq_u8 adds two 128-bit vectors of unsigned 8-bit elements and returns a 128-bit vector (quadword registers)
- vaddl_u8 (vector add long) adds two 64-bit vectors of unsigned 8-bit elements and returns a 128-bit vector of unsigned 16-bit elements
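The practical difference between vadd_u8 and vaddl_u8 shows up when a sum does not fit in 8 bits: the former wraps modulo 256, the latter widens to 16 bits. The sketch below illustrates this. The wrapper names add_wrap_u8 and add_long_u8 are made up for illustration, and the scalar branches are an added assumption for non-NEON builds.

```c
#include <stdint.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical wrapper: 8 lanes, result stays 8-bit and wraps modulo 256. */
void add_wrap_u8(const uint8_t *a, const uint8_t *b, uint8_t *out)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    vst1_u8(out, vadd_u8(vld1_u8(a), vld1_u8(b)));
#else
    for (int i = 0; i < 8; ++i)          /* scalar fallback (assumption) */
        out[i] = (uint8_t)(a[i] + b[i]);
#endif
}

/* Hypothetical wrapper: 8 lanes, result widened to 16-bit, no wrap-around. */
void add_long_u8(const uint8_t *a, const uint8_t *b, uint16_t *out)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    vst1q_u16(out, vaddl_u8(vld1_u8(a), vld1_u8(b)));
#else
    for (int i = 0; i < 8; ++i)          /* scalar fallback (assumption) */
        out[i] = (uint16_t)(a[i] + b[i]);
#endif
}
```

With every lane set to 200 + 100, add_wrap_u8 yields 44 per lane (300 modulo 256) while add_long_u8 yields 300.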
Variables and constants
uint32x2_t vec64a, vec64b; // declare two 64-bit (D-register) variables
uint32x4_t vec128 = vdupq_n_u32(1); // 128-bit (Q-register) variable, every lane set to 1
uint8x8_t start_value = vdup_n_u8(0); // D-register variable, every lane set to 0
vec64a = vget_low_u32(vec128); // split the 128-bit vector into two 64-bit halves
vec64b = vget_high_u32(vec128);
uint32_t result = vget_lane_u32(vec64a, 0); // extract lane 0
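The splitting intrinsics above can be wrapped into a complete function. This sketch swaps the two 64-bit halves of a four-lane vector, using vcombine_u32 (not mentioned in this post) to join two D-register values back into one Q-register value. The function name swap_halves_u32 is hypothetical, and the scalar branch is an added assumption for non-NEON builds.

```c
#include <stdint.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: {a, b, c, d} -> {c, d, a, b}. */
void swap_halves_u32(const uint32_t *in, uint32_t *out)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    uint32x4_t vec128 = vld1q_u32(in);
    uint32x2_t lo = vget_low_u32(vec128);    /* lanes 0 and 1 */
    uint32x2_t hi = vget_high_u32(vec128);   /* lanes 2 and 3 */
    /* vcombine_u32(low, high): the first argument becomes the low half. */
    vst1q_u32(out, vcombine_u32(hi, lo));
#else
    /* Scalar fallback (assumption) for targets without NEON. */
    out[0] = in[2];
    out[1] = in[3];
    out[2] = in[0];
    out[3] = in[1];
#endif
}
```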
Load and Store
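// Load four 16-bit values into a D register, then store them back
unsigned short int A[] = {1,2,3,4};
uint16x4_t v;
v = vld1_u16(A);
vst1_u16(A, v);
// Separate snippet: de-interleaving load and interleaving store of 3-element structures
uint8x8x3_t v3;
unsigned char B[24] = {0};
v3 = vld3_u8(B); // v3.val[0] gets B[0], B[3], ...; val[1] gets B[1], B[4], ...; val[2] gets B[2], B[5], ...
vst3_u8(B, v3); // writes the three vectors back in interleaved order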
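A common use of the de-interleaving vld3_u8 / vst3_u8 pair is reordering colour channels. The sketch below converts eight packed RGB pixels to BGR. The function name rgb_to_bgr_8px and the RGB byte layout are assumptions chosen for illustration, and the scalar branch is an added fallback for non-NEON builds.

```c
#include <stdint.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: src and dst each hold 8 pixels x 3 bytes = 24 bytes. */
void rgb_to_bgr_8px(const uint8_t *src, uint8_t *dst)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    uint8x8x3_t px = vld3_u8(src);   /* val[0] = R, val[1] = G, val[2] = B */
    uint8x8_t r = px.val[0];
    px.val[0] = px.val[2];           /* B moves to the first channel */
    px.val[2] = r;                   /* R moves to the third channel */
    vst3_u8(dst, px);                /* re-interleave and store */
#else
    /* Scalar fallback (assumption): swap bytes 0 and 2 of every pixel. */
    for (int i = 0; i < 8; ++i) {
        dst[3 * i + 0] = src[3 * i + 2];
        dst[3 * i + 1] = src[3 * i + 1];
        dst[3 * i + 2] = src[3 * i + 0];
    }
#endif
}
```

Because vld3_u8 separates the channels into three vectors at load time, the swap itself needs no per-pixel shuffling.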
Example
- neon_intrinsics.c
- neon_load.c
- neon_interleave.c