[XRT Vitis-Tutorials] Testing Mixed C++/RTL Kernel Programming

1 Introduction

Previous articles in this series:
ZCU106 XRT Environment Setup
ZCU106 XRT Vivado Project Analysis
ZCU106 XRT PetaLinux Project Analysis
[XRT Vitis-Tutorials] RTL Kernels Test

Official documentation:
2019.2 Vitis™ Application Acceleration Development Flow Tutorials
Vitis Unified Software Platform Documentation Application Acceleration Development
Vitis Unified Software Platform Documentation Embedded Software Development

Vitis ZCU106 Platform
ZCU106 Vitis Platform

Pre-built image; download it and copy it to an SD card to test directly:
ZCU106 Test Image

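2 Creating the Vitis Project

This article tests the second example in the Tutorials: Mixing C++ and RTL Kernels.
The example itself covers two steps, sw_emu and hw_emu. As before, I will also test directly on hardware.
The tests performed here are:

  • Use sw_emu emulation to test the C++ kernel
  • Use hw_emu emulation to test the mixed C++ and RTL kernels
  • Run on real hardware to verify the hardware acceleration on the board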

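2.1 Project Creation

The tutorial uses the script run_sprite_mixing_c_rtl_kernels.sh to create and build the Vitis project. For convenience, I use the GUI instead.

2.1.1 New Project

In Vitis, create a new Application Project and select zcu106vcu_base as the platform.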

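2.1.2 Adding Source Code

Add everything that needs to be built directly to the src directory:

  • C++ Kernel: krnl_vadd.cpp
  • RTL Kernel: rtl_kernel_wizard_0.xo
  • Host APP: host_step2.cpp (host_step2 can be used directly to test the mixed-kernel functionality)

Next, set the build target to Hardware and add the C++ and RTL kernels to Hardware Functions so that they are accelerated.
The final project directory structure is shown in the figure below:
[Figure: final project directory structure]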

2.1.3 Code Analysis

host_step2.cpp

The main functionality of this example is as follows:

  • First, the C++ kernel (host object krnl_vector_add) computes c=a+b in hardware
  • Then, the RTL kernel (host object krnl_const_add) computes d=c+1 in hardware

// Set the kernel arguments
krnl_vector_add.setArg(0,buffer_a);
krnl_vector_add.setArg(1,buffer_b);
krnl_vector_add.setArg(2,buffer_result);
krnl_vector_add.setArg(3,DATA_SIZE);
krnl_const_add.setArg(0,buffer_result);
// Launch the kernels
q.enqueueTask(krnl_vector_add);
q.enqueueTask(krnl_const_add);
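
To make the expected result concrete, here is a minimal host-side reference model of the two-kernel chain. It is only a sketch and is not taken from host_step2.cpp: the names golden_mixed and count_mismatches are hypothetical, and I am assuming the RTL kernel updates buffer_result in place, since that buffer is the only argument it receives.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative golden model (hypothetical helper, not part of the tutorial).
// Stage 1 mirrors the C++ kernel: c = a + b
// Stage 2 mirrors the RTL kernel: c = c + 1, applied to the same buffer
std::vector<int> golden_mixed(const std::vector<int>& a,
                              const std::vector<int>& b) {
    assert(a.size() == b.size());
    std::vector<int> c(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        c[i] = a[i] + b[i];
    }
    for (int& v : c) {
        v += 1;
    }
    return c;
}

// Counts the elements where the data read back from the device differs
// from the golden model. Zero mismatches means the test passed.
std::size_t count_mismatches(const std::vector<int>& a,
                             const std::vector<int>& b,
                             const std::vector<int>& device_result) {
    assert(device_result.size() == a.size());
    const std::vector<int> expected = golden_mixed(a, b);
    std::size_t mismatches = 0;
    for (std::size_t i = 0; i < expected.size(); ++i) {
        if (expected[i] != device_result[i]) {
            ++mismatches;
        }
    }
    return mismatches;
}
```

In a real host program, device_result would be the contents of buffer_result after both kernels have finished and the buffer has been read back.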

C++ Kernel

The source code of the C++ kernel is as follows:

//------------------------------------------------------------------------------
//
// kernel:  vadd
//
// Purpose: Demonstrate Vector Add in OpenCL
//

#define BUFFER_SIZE 256
extern "C" {

void krnl_vadd(
                int* a,
                int* b,
                int* c,
                const int n_elements)
{

#pragma HLS INTERFACE m_axi offset=SLAVE bundle=gmem port=a max_read_burst_length = 256
#pragma HLS INTERFACE m_axi offset=SLAVE bundle=gmem port=b max_read_burst_length = 256
#pragma HLS INTERFACE m_axi offset=SLAVE bundle=gmem1 port=c max_write_burst_length = 256

#pragma HLS INTERFACE s_axilite port=a  bundle=control
#pragma HLS INTERFACE s_axilite port=b  bundle=control
#pragma HLS INTERFACE s_axilite port=c  bundle=control

#pragma HLS INTERFACE s_axilite port=n_elements  bundle=control
#pragma HLS INTERFACE s_axilite port=return bundle=control

    int arrayA[BUFFER_SIZE];
    int arrayB[BUFFER_SIZE];

    for (int i = 0 ; i < n_elements ; i += BUFFER_SIZE)
    {
        int size = BUFFER_SIZE;
        // Boundary check: the last chunk may be shorter than BUFFER_SIZE
        if (i + size > n_elements) size = n_elements - i;

        // Burst reading A and B into the local buffers
        readA: for (int j = 0 ; j < size ; j++) {
        #pragma HLS pipeline ii = 1 rewind
            arrayA[j] = a[i+j];
            arrayB[j] = b[i+j];
        }

        // Calculating C and burst writing it
        // to global memory
        vadd_wrteC: for (int j = 0 ; j < size ; j++) {
        #pragma HLS pipeline ii = 1 rewind
            c[i+j] = arrayA[j] + arrayB[j];
        }
    }
}
}

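As you can see, this is really just Vivado HLS code. It provides:

  • Two AXI master interfaces for reading and writing data: gmem and gmem1
  • One AXI4-Lite slave interface (bundle control) for configuring 4 registers: the data addresses a, b and c, plus the configurable parameter n_elements
  • The actual IP function, c=a+b, pipelined by means of the HLS pipeline pragma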
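
The chunked loop structure of the kernel can be tried out in plain C++ without any HLS tooling. The following is only a sketch under my own assumptions: chunk_sizes and vadd_blocked are hypothetical names, the pragmas are omitted, and std::vector stands in for the global-memory pointers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int kBufferSize = 256;  // same value as BUFFER_SIZE in the kernel

// Returns the size of each chunk the outer loop processes for n_elements.
std::vector<int> chunk_sizes(int n_elements) {
    std::vector<int> sizes;
    for (int i = 0; i < n_elements; i += kBufferSize) {
        // Boundary check: the last chunk may be shorter than kBufferSize.
        sizes.push_back(std::min(kBufferSize, n_elements - i));
    }
    return sizes;
}

// Software model of the kernel: read a chunk of a and b into local
// buffers, then add them and write the chunk of c.
std::vector<int> vadd_blocked(const std::vector<int>& a,
                              const std::vector<int>& b) {
    assert(a.size() == b.size());
    const int n = static_cast<int>(a.size());
    std::vector<int> c(a.size());
    int array_a[kBufferSize];
    int array_b[kBufferSize];
    for (int i = 0; i < n; i += kBufferSize) {
        const int size = std::min(kBufferSize, n - i);
        for (int j = 0; j < size; ++j) {   // models the readA loop
            array_a[j] = a[i + j];
            array_b[j] = b[i + j];
        }
        for (int j = 0; j < size; ++j) {   // models the vadd_wrteC loop
            c[i + j] = array_a[j] + array_b[j];
        }
    }
    return c;
}
```

For example, with 600 elements chunk_sizes returns 256, 256 and 88, which shows how the boundary check handles a length that is not a multiple of the buffer size.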

RTL Kernel

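To see what the RTL kernel does, unpack rtl_kernel_wizard_0.xo and read the logic inside: it simply adds 1 to the data.
Unpacking rtl_kernel_wizard_0.xo also reveals a C model named rtl_kernel_wizard_0_cmodel.cpp. This C model can be used for emulation and verification, because no RTL entity is available during emulation.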

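2.2 Emulation Tests

I will not go through the two emulation tests here; just follow the instructions in the Tutorials.

2.2.1 Review the Application Timeline

This example also uses Vitis Analyzer, a new tool in Vitis, to view the emulation timeline. I have not studied it in detail yet, but it looks useful: it shows the data-processing flow and the kernel run times.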

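2.3 Testing on the Board

2.3.1 Build

For this project, simply select Hardware as the build target. The mixing_container differs from the container in the previous article, which held only a single kernel.
See the figure below:
[Figure: contents of mixing_container]
mixing_container contains two kernels, one RTL kernel and one C++ kernel, which is the core of this article.
Click Build to start the build.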

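2.3.2 Vivado Project

After the build completes, you can open Vivado to look at the internal structure, as shown below:
[Figure: internal structure of the Vivado project]
krnl_vadd_1 carries a Vivado HLS icon, which indicates that this IP was generated by Vivado HLS and is then used as the C++ kernel.
rtl_kernel_wizard_0_1 is the RTL kernel.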

2.3.3 Test and Verification

Copy the firmware to the SD card, then run the following command to test:

root@zcu106vcu_base:~# /mnt/mixing_ke.exe /mnt/mixing_container.xclbin 
Using FPGA binary file specfied through the command line: /mnt/mi[   50.938732] [drm] Pid 2526 opened device
xing_container.xclbin
[   50.947471] [drm] Pid 2526 closed device
[   50.953532] [drm] Pid 2526 opened device
Found Platform
Platform Name: Xilinx
Loading: '/mnt/mixing_container.xclbin'
[   51.916381] [drm] Finding IP_LAYOUT section header
[   51.916388] [drm] Section IP_LAYOUT details:
[   51.921201] [drm]   offset = 0x126ad88
[   51.925466] [drm]   size = 0xa8
[   51.929211] [drm] Finding DEBUG_IP_LAYOUT section header
[   51.932348] [drm] AXLF section DEBUG_IP_LAYOUT header not found
[   51.937654] [drm] Finding CONNECTIVITY section header
[   51.943572] [drm] Section CONNECTIVITY details:
[   51.948616] [drm]   offset = 0x126ae30
[   51.953136] [drm]   size = 0x34
[   51.956882] [drm] Finding MEM_TOPOLOGY section header
[   51.960019] [drm] Section MEM_TOPOLOGY details:
[   51.965064] [drm]   offset = 0x126ad58
[   51.969585] [drm]   size = 0x30
[   51.974631] [drm] No ERT scheduler on MPSoC, using KDS
[   51.983293] [drm] scheduler config ert(0)
[   51.983296] [drm]   cus(2)
[   51.987305] [drm]   slots(16)
[   51.990008] [drm]   num_cu_masks(1)
[   51.992970] [drm]   cu_shift(16)
[   51.996449] [drm]   cu_base(0x80000000)
[   51.999671] [drm]   polling(0)
[   52.011442] [drm] User buffer is not physical contiguous
[   52.019813] [drm] zocl_free_userptr_bo: obj 0x000000009a50640f
[   52.020624] [drm] User buffer is not physical contiguous
[   52.031792] [drm] zocl_free_userptr_bo: obj 0x000000009f443a13
[   52.032500] [drm] User buffer is not physical contiguous
TEST WITH TWO KERNELS PASSED
[   52.043672] [drm] zocl_free_userptr_bo: obj 0x00000000bd649846
[   52.054960] [drm] Pid 2526 closed device
root@zcu106vcu_base:~#
root@zcu106vcu_base:~# /mnt/mixing_ke.exe /mnt/mixing_container.xclbin 
Using FPGA binary file specfied through the command line: /mnt/mi[  183.121011] [drm] Pid 2864 opened device
[  183.129869] [drm] Pid 2864 closed device
xing_container.xclbin
[  183.133953] [drm] Pid 2864 opened device
Found Platform
Platform Name: Xilinx
Loading: '/mnt/mixing_container.xclbin'
[  183.234246] [drm] The XCLBIN already loaded. Don't need to reload.
[  183.236038] [drm] Reconfiguration not supported
[  183.254888] [drm] User buffer is not physical contiguous
[  183.260203] [drm] zocl_free_userptr_bo: obj 0x00000000623f0590
[  183.260948] [drm] User buffer is not physical contiguous
[  183.272098] [drm] zocl_free_userptr_bo: obj 0x00000000b4bceb33
[  183.272350] [drm] User buffer is not physical contiguous
TEST WITH TWO KERNELS PASSED
[  183.283494] [drm] zocl_free_userptr_bo: obj 0x000000008012c72f
[  183.294723] [drm] Pid 2864 closed device
root@zcu106vcu_base:~# 

3 Summary

Using Vitis and a custom ZCU106 XRT platform, I completed the functional test of Mixing C++ and RTL Kernels from the Vitis-Tutorials.
