NEON Intrinsics

Original

wxb_blog

2018-09-02 22:17

This post introduces NEON intrinsics and gives a few examples at the end. If you spot any mistakes, please point them out. Thank you.

Introduction

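NEON intrinsics are a higher-level API than assembly and can be called directly from C/C++. Writing assembly gives control over more of the hardware's execution details, but assembly is comparatively costly to maintain. NEON intrinsics are easier to maintain while still offering a degree of control over those details. The compiler replaces each NEON intrinsic call with the corresponding NEON instruction. NEON intrinsics are declared in arm_neon.h, which provides a set of functions and data types.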
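
As a minimal sketch of what calling an intrinsic from C looks like, the function below adds four floats at once. The helper name add4_f32, the HAVE_NEON macro, and the scalar fallback branch (used when the compiler does not target NEON) are assumptions made for illustration, not something the header provides.

```c
#include <stddef.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

/* Hypothetical helper: out[i] = a[i] + b[i] for i in 0..3. */
void add4_f32(const float *a, const float *b, float *out)
{
#if HAVE_NEON
    float32x4_t va = vld1q_f32(a);      /* load 4 floats into a Q register */
    float32x4_t vb = vld1q_f32(b);
    vst1q_f32(out, vaddq_f32(va, vb));  /* lane-wise add, then store 4 floats */
#else
    /* Scalar fallback (assumption): same result, one element at a time. */
    for (size_t i = 0; i < 4; ++i)
        out[i] = a[i] + b[i];
#endif
}
```

On a NEON target the body is three intrinsic calls; the compiler is then free to pick registers and schedule the instructions.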

Data Type

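NEON vector data types are named as follows:
<type><size>x<number_of_lanes>_t
For example, int16x4_t is a vector of four signed 16-bit (short) lanes.
NEON also supports arrays of vectors, named as follows:
<type><size>x<number_of_lanes>x<length_of_array>_t
For example, int16x4x2_t is an array of two int16x4_t vectors.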
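
The naming scheme can be sketched in code as follows. In arm_neon.h the array types are structs whose vectors are reached through the val[] member. The helper name sum_pair_s16, the HAVE_NEON macro, and the scalar fallback are assumptions for illustration.

```c
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

/* Hypothetical helper: stores the two inputs in an int16x4x2_t and adds them. */
void sum_pair_s16(const int16_t a[4], const int16_t b[4], int16_t out[4])
{
#if HAVE_NEON
    int16x4x2_t pair;                 /* array of two int16x4_t vectors */
    pair.val[0] = vld1_s16(a);        /* first vector: four 16-bit lanes */
    pair.val[1] = vld1_s16(b);        /* second vector */
    int16x4_t sum = vadd_s16(pair.val[0], pair.val[1]);
    vst1_s16(out, sum);
#else
    /* Scalar fallback (assumption) for targets without NEON. */
    for (int i = 0; i < 4; ++i)
        out[i] = (int16_t)(a[i] + b[i]);
#endif
}
```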

NEON Intrinsics

NEON intrinsics are named as follows:
<opname><flags>_<type>
For example:
- vmul_s16: multiplies two vectors of signed 16-bit elements
- vadd_u8: adds two 64-bit vectors of unsigned 8-bit elements and returns a 64-bit vector (doubleword, D registers)
- vaddq_u8: adds two 128-bit vectors of unsigned 8-bit elements and returns a 128-bit vector (quadword, Q registers)
- vaddl_u8: "vector add long"; adds two 64-bit vectors of unsigned 8-bit elements and returns a 128-bit vector of unsigned 16-bit elements
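
The practical difference between the plain and the "long" flag can be sketched like this: vadd_u8 keeps 8-bit lanes, so results wrap modulo 256, while vaddl_u8 widens to 16-bit lanes and keeps the full sum. The helper names add_wrap_u8 and add_long_u8, the HAVE_NEON macro, and the scalar fallbacks are assumptions for illustration.

```c
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

/* Hypothetical helper: 8-bit add, results wrap modulo 256. */
void add_wrap_u8(const uint8_t a[8], const uint8_t b[8], uint8_t out[8])
{
#if HAVE_NEON
    vst1_u8(out, vadd_u8(vld1_u8(a), vld1_u8(b)));
#else
    for (int i = 0; i < 8; ++i)       /* scalar fallback (assumption) */
        out[i] = (uint8_t)(a[i] + b[i]);
#endif
}

/* Hypothetical helper: widening add, 8-bit inputs and 16-bit results. */
void add_long_u8(const uint8_t a[8], const uint8_t b[8], uint16_t out[8])
{
#if HAVE_NEON
    uint16x8_t wide = vaddl_u8(vld1_u8(a), vld1_u8(b));  /* 64-bit in, 128-bit out */
    vst1q_u16(out, wide);
#else
    for (int i = 0; i < 8; ++i)       /* scalar fallback (assumption) */
        out[i] = (uint16_t)(a[i] + b[i]);
#endif
}
```

With every lane of a set to 200 and every lane of b set to 100, the first function yields 44 in each lane (300 mod 256) and the second yields 300.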

Variables and constants

uint32x4_t vec128 = vdupq_n_u32(1);           // create a Q-register variable, every lane set to 1 (value chosen for illustration)
uint32x2_t vec64a, vec64b;                    // declare two D-register variables
uint8x8_t start_value = vdup_n_u8(0);         // create a D-register variable, every lane set to 0
vec64a = vget_low_u32(vec128);                // split the 128-bit vector into two 64-bit halves
vec64b = vget_high_u32(vec128);
uint32_t result = vget_lane_u32(vec64a, 0);   // extract lane 0
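
One common use of the split and lane-extract intrinsics above is a horizontal sum, sketched here. The helper name hsum_u32x4, the HAVE_NEON macro, and the scalar fallback are assumptions for illustration.

```c
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

/* Hypothetical helper: returns v[0] + v[1] + v[2] + v[3]. */
uint32_t hsum_u32x4(const uint32_t v[4])
{
#if HAVE_NEON
    uint32x4_t vec128 = vld1q_u32(v);
    uint32x2_t lo = vget_low_u32(vec128);    /* lanes 0 and 1 */
    uint32x2_t hi = vget_high_u32(vec128);   /* lanes 2 and 3 */
    uint32x2_t s  = vadd_u32(lo, hi);        /* {v0 + v2, v1 + v3} */
    return vget_lane_u32(s, 0) + vget_lane_u32(s, 1);
#else
    /* Scalar fallback (assumption) for targets without NEON. */
    return v[0] + v[1] + v[2] + v[3];
#endif
}
```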

Load and Store

unsigned short int A[] = {1,2,3,4};
uint16x4_t v;           // declare a vector of four 16-bit lanes
v = vld1_u16(A);        // vector load 1-way unsigned 16-bit
vst1_u16(A, v);         // vector store 1-way unsigned 16-bit

uint8x8x3_t v;          // three vectors, each with eight lanes of 8-bit data
unsigned char A[24];    // 24 bytes: eight pixels of interleaved 24-bit RGB data
v = vld3_u8(A);         // 3-way de-interleaving load: v.val[0] = R, v.val[1] = G, v.val[2] = B
vst3_u8(A, v);          // 3-way interleaving store of unsigned 8-bit data
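
To illustrate why the 3-way load is useful, the sketch below swaps the red and blue channels of eight RGB pixels. Because vld3_u8 de-interleaves the data, each channel sits in its own vector and the swap is just an exchange of two val[] entries. The helper name swap_rb_8px, the HAVE_NEON macro, and the scalar fallback are assumptions for illustration.

```c
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

/* Hypothetical helper: converts 8 interleaved RGB pixels to BGR in place. */
void swap_rb_8px(uint8_t rgb[24])
{
#if HAVE_NEON
    uint8x8x3_t px = vld3_u8(rgb);   /* val[0] = R, val[1] = G, val[2] = B */
    uint8x8_t tmp = px.val[0];
    px.val[0] = px.val[2];
    px.val[2] = tmp;
    vst3_u8(rgb, px);                /* re-interleave while storing */
#else
    /* Scalar fallback (assumption): swap bytes 0 and 2 of each pixel. */
    for (int i = 0; i < 8; ++i) {
        uint8_t t = rgb[3 * i];
        rgb[3 * i] = rgb[3 * i + 2];
        rgb[3 * i + 2] = t;
    }
#endif
}
```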

Example
