1 Introduction
Previous articles in this series:
ZCU106 XRT Environment Setup
ZCU106 XRT Vivado Project Analysis
ZCU106 XRT PetaLinux Project Analysis
[XRT Vitis-Tutorials] RTL Kernels Test
Official documentation:
2019.2 Vitis™ Application Acceleration Development Flow Tutorials
Vitis Unified Software Platform Documentation Application Acceleration Development
Vitis Unified Software Platform Documentation Embedded Software Development
Vitis ZCU106 Platform
ZCU106 Vitis Platform
Pre-built image; download it and copy it to the SD card to test right away:
ZCU106 Test Image
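2 Creating the Vitis Project
This article tests the second example in the Tutorials: Mixing C++ and RTL Kernels.
The example itself covers two steps, sw_emu and hw_emu. As in the previous articles, I also test directly on hardware.
The tests carried out here are:
- sw_emu emulation, to test the functionality of the C++ kernel
- hw_emu emulation, to test the functionality of the mixed C++ and RTL kernels
- a hardware run, to verify the hardware acceleration on the board
2.1 Project Creation
The tutorial uses the script run_sprite_mixing_c_rtl_kernels.sh to create and build the Vitis project. For convenience, I use the GUI instead.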
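2.1.1 New Project
In Vitis, create a new Application Project and select zcu106vcu_base as the platform.
2.1.2 Adding the Source Code
Add everything that needs to be built directly to the src directory:
- C++ Kernel: krnl_vadd.cpp
- RTL Kernel: rtl_kernel_wizard_0.xo
- Host APP: host_step2.cpp (host_step2 can be used as-is to test the mixed kernels)
Next, set the build configuration to Hardware and add the C++ and RTL kernels to Hardware Functions so that they are accelerated.
The final project directory structure is shown in the figure below:
2.1.3 Code Analysis
host_step2.cpp
The host program in this example does the following:
- First, it uses the C++ kernel krnl_vector_add to compute c = a + b in hardware
- Then, it uses the RTL kernel krnl_const_add to compute d = c + 1 in hardware, operating on the same buffer_result buffer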
//set the kernel Arguments
krnl_vector_add.setArg(0,buffer_a);
krnl_vector_add.setArg(1,buffer_b);
krnl_vector_add.setArg(2,buffer_result);
krnl_vector_add.setArg(3,DATA_SIZE);
krnl_const_add.setArg(0,buffer_result);
//Launch the Kernel
q.enqueueTask(krnl_vector_add);
q.enqueueTask(krnl_const_add);
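Since the two kernels are chained on the same buffer, the value the host should read back is a[i] + b[i] + 1. The following is a minimal software reference for that check, not the tutorial's own verification code (which is not shown here). The names model_vector_add, model_const_add, and count_mismatches are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Software reference for the C++ kernel: c[i] = a[i] + b[i].
// (Hypothetical helper, not part of the tutorial sources.)
std::vector<int> model_vector_add(const std::vector<int>& a,
                                  const std::vector<int>& b) {
    assert(a.size() == b.size());
    std::vector<int> c(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        c[i] = a[i] + b[i];
    }
    return c;
}

// Software reference for the RTL kernel: adds 1 to every element in place,
// mirroring how both kernels receive the same buffer_result.
void model_const_add(std::vector<int>& data) {
    for (int& v : data) {
        v += 1;
    }
}

// Counts the elements of `result` that differ from a[i] + b[i] + 1.
std::size_t count_mismatches(const std::vector<int>& a,
                             const std::vector<int>& b,
                             const std::vector<int>& result) {
    std::vector<int> expected = model_vector_add(a, b);
    model_const_add(expected);
    assert(result.size() == expected.size());
    std::size_t mismatches = 0;
    for (std::size_t i = 0; i < expected.size(); ++i) {
        if (result[i] != expected[i]) {
            ++mismatches;
        }
    }
    return mismatches;
}
```

On the host side, a check like this would run after the result buffer has been read back from the device. A return value of 0 corresponds to the "TEST WITH TWO KERNELS PASSED" case.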
C++ Kernel
The source code of the C++ kernel is as follows:
//------------------------------------------------------------------------------
//
// kernel:  vadd
//
// Purpose: Demonstrate Vector Add in OpenCL
//
#define BUFFER_SIZE 256
extern "C" {
void krnl_vadd(
    int* a,
    int* b,
    int* c,
    const int n_elements)
{
#pragma HLS INTERFACE m_axi offset=SLAVE bundle=gmem port=a max_read_burst_length = 256
#pragma HLS INTERFACE m_axi offset=SLAVE bundle=gmem port=b max_read_burst_length = 256
#pragma HLS INTERFACE m_axi offset=SLAVE bundle=gmem1 port=c max_write_burst_length = 256
#pragma HLS INTERFACE s_axilite port=a bundle=control
#pragma HLS INTERFACE s_axilite port=b bundle=control
#pragma HLS INTERFACE s_axilite port=c bundle=control
#pragma HLS INTERFACE s_axilite port=n_elements bundle=control
#pragma HLS INTERFACE s_axilite port=return bundle=control
    // Local buffers holding one tile of A and B
    int arrayA[BUFFER_SIZE];
    int arrayB[BUFFER_SIZE];
    for (int i = 0 ; i < n_elements ; i += BUFFER_SIZE)
    {
        int size = BUFFER_SIZE;
        //boundary check
        if (i + size > n_elements) size = n_elements - i;
        //Burst reading A and B
        readA: for (int j = 0 ; j < size ; j++) {
#pragma HLS pipeline ii = 1 rewind
            arrayA[j] = a[i+j];
            arrayB[j] = b[i+j];
        }
        //Calculating C from the local buffers and burst writing
        // to Global memory
        vadd_wrteC: for (int j = 0 ; j < size ; j++){
#pragma HLS pipeline ii = 1 rewind
            c[i+j] = arrayA[j] + arrayB[j];
        }
    }
}
}
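As you can see, this is really just Vivado HLS code. It consists of:
- Two AXI master interfaces for reading and writing data: gmem and gmem1
- One AXI4-Lite slave interface (the control bundle) for configuring four registers: the data addresses of a, b, and c, plus the configurable parameter n_elements
- The actual IP function, c = a + b, which uses the HLS pipeline pragma to process the data in a pipelined fashion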
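To make the tiling and boundary check easier to follow, here is a plain C++ sketch of the same loop structure with the HLS pragmas removed, so that it can be compiled and run on a desktop. The function name vadd_tiled and the returned tile count are additions for illustration only and are not part of the tutorial's kernel.

```cpp
#include <cassert>

constexpr int BUFFER_SIZE = 256;  // same tile size as the kernel

// Desktop sketch of the kernel's tiled loop (hypothetical helper).
// Computes c[i] = a[i] + b[i] tile by tile and returns the number of tiles.
int vadd_tiled(const int* a, const int* b, int* c, int n_elements) {
    assert(n_elements >= 0);
    int arrayA[BUFFER_SIZE];
    int arrayB[BUFFER_SIZE];
    int tiles = 0;
    for (int i = 0; i < n_elements; i += BUFFER_SIZE) {
        int size = BUFFER_SIZE;
        // Boundary check: the last tile may be shorter than BUFFER_SIZE.
        if (i + size > n_elements) size = n_elements - i;
        // Copy one tile of A and B into the local buffers.
        for (int j = 0; j < size; j++) {
            arrayA[j] = a[i + j];
            arrayB[j] = b[i + j];
        }
        // Add the local buffers and write the tile back to c.
        for (int j = 0; j < size; j++) {
            c[i + j] = arrayA[j] + arrayB[j];
        }
        ++tiles;
    }
    return tiles;
}
```

For example, with n_elements = 600 the loop processes tiles of 256, 256, and 88 elements, so the boundary check only takes effect on the last tile.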
RTL Kernel
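To see what the RTL kernel does, unpack rtl_kernel_wizard_0.xo and read the logic inside: it simply adds 1 to the data.
Unpacking rtl_kernel_wizard_0.xo also reveals a C model test source named rtl_kernel_wizard_0_cmodel.cpp. This C model can be used for emulation-based verification, because software emulation (sw_emu) has no RTL entity to run.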
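2.2 Emulation Tests
I will not go through the two emulation tests here; just follow the instructions in the Tutorials.
2.2.1 Review the Application Timeline
This example also uses Vitis Analyzer, a new tool in Vitis, to view the emulation timeline. I have not studied it closely yet, but it looks like a useful tool for observing the data processing flow and the kernel run times.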
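2.3 Testing on the Board
2.3.1 Build
For this project, simply select Hardware as the build target. The mixing_container differs from the previous article's container, which held only a single kernel.
See the figure below:
mixing_container contains two kernels, one RTL kernel and one C++ kernel, which is the core of this article.
Click Build to compile.
2.3.2 Vivado Project
Once the build finishes, you can open Vivado to look at the internal structure, as shown in the figure below:
krnl_vadd_1 carries a Vivado HLS icon, which indicates that this IP was generated by Vivado HLS and is then used as the C++ kernel.
rtl_kernel_wizard_0_1 is the RTL kernel.
2.3.3 Test and Verification
Copy the build outputs (the host executable and the xclbin) to the SD card, then run the following command to test: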
root@zcu106vcu_base:~# /mnt/mixing_ke.exe /mnt/mixing_container.xclbin
Using FPGA binary file specfied through the command line: /mnt/mi[ 50.938732] [drm] Pid 2526 opened device
xing_container.xclbin
[ 50.947471] [drm] Pid 2526 closed device
[ 50.953532] [drm] Pid 2526 opened device
Found Platform
Platform Name: Xilinx
Loading: '/mnt/mixing_container.xclbin'
[ 51.916381] [drm] Finding IP_LAYOUT section header
[ 51.916388] [drm] Section IP_LAYOUT details:
[ 51.921201] [drm] offset = 0x126ad88
[ 51.925466] [drm] size = 0xa8
[ 51.929211] [drm] Finding DEBUG_IP_LAYOUT section header
[ 51.932348] [drm] AXLF section DEBUG_IP_LAYOUT header not found
[ 51.937654] [drm] Finding CONNECTIVITY section header
[ 51.943572] [drm] Section CONNECTIVITY details:
[ 51.948616] [drm] offset = 0x126ae30
[ 51.953136] [drm] size = 0x34
[ 51.956882] [drm] Finding MEM_TOPOLOGY section header
[ 51.960019] [drm] Section MEM_TOPOLOGY details:
[ 51.965064] [drm] offset = 0x126ad58
[ 51.969585] [drm] size = 0x30
[ 51.974631] [drm] No ERT scheduler on MPSoC, using KDS
[ 51.983293] [drm] scheduler config ert(0)
[ 51.983296] [drm] cus(2)
[ 51.987305] [drm] slots(16)
[ 51.990008] [drm] num_cu_masks(1)
[ 51.992970] [drm] cu_shift(16)
[ 51.996449] [drm] cu_base(0x80000000)
[ 51.999671] [drm] polling(0)
[ 52.011442] [drm] User buffer is not physical contiguous
[ 52.019813] [drm] zocl_free_userptr_bo: obj 0x000000009a50640f
[ 52.020624] [drm] User buffer is not physical contiguous
[ 52.031792] [drm] zocl_free_userptr_bo: obj 0x000000009f443a13
[ 52.032500] [drm] User buffer is not physical contiguous
TEST WITH TWO KERNELS PASSED
[ 52.043672] [drm] zocl_free_userptr_bo: obj 0x00000000bd649846
[ 52.054960] [drm] Pid 2526 closed device
root@zcu106vcu_base:~#
root@zcu106vcu_base:~# /mnt/mixing_ke.exe /mnt/mixing_container.xclbin
Using FPGA binary file specfied through the command line: /mnt/mi[ 183.121011] [drm] Pid 2864 opened device
[ 183.129869] [drm] Pid 2864 closed device
xing_container.xclbin
[ 183.133953] [drm] Pid 2864 opened device
Found Platform
Platform Name: Xilinx
Loading: '/mnt/mixing_container.xclbin'
[ 183.234246] [drm] The XCLBIN already loaded. Don't need to reload.
[ 183.236038] [drm] Reconfiguration not supported
[ 183.254888] [drm] User buffer is not physical contiguous
[ 183.260203] [drm] zocl_free_userptr_bo: obj 0x00000000623f0590
[ 183.260948] [drm] User buffer is not physical contiguous
[ 183.272098] [drm] zocl_free_userptr_bo: obj 0x00000000b4bceb33
[ 183.272350] [drm] User buffer is not physical contiguous
TEST WITH TWO KERNELS PASSED
[ 183.283494] [drm] zocl_free_userptr_bo: obj 0x000000008012c72f
[ 183.294723] [drm] Pid 2864 closed device
root@zcu106vcu_base:~#
3 Summary
Using Vitis and a custom ZCU106 XRT platform, I completed the functional test of the Mixing C++ and RTL Kernels example from Vitis-Tutorials.