Introduction
Textures (images) are a large part of modern graphics applications, and graphics hardware has evolved to provide high-performance access to and manipulation of textures. To make full use of this hardware, OpenCL includes an optional image data type. These "image objects" are supported on all Mali-T600 series GPUs. Images represent large grids of data that can be processed in parallel, so image data and image operations are typically well suited to acceleration with OpenCL. There are two ways image data can be stored and manipulated in OpenCL: memory buffers and image objects.
Memory buffers
Memory buffers are simply flat arrays of data. Because they must accommodate every kind of data (for example images, grids, and linear arrays), various image operations become difficult:

- To access the data at a given coordinate, you must calculate the correct offset into the data yourself.
- You must address the data with exact coordinates, or implement your own addressing scheme for normalized (or other) coordinates.
- You must also handle the case where a coordinate falls outside the image region.
- Any algorithm or optimization is typically tied to the image format in use, for example RGB888 (if the image format changes, the algorithm/optimization must change with it).
- Image filtering (such as bilinear filtering) must be done manually.
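To make the first three points concrete, here is a minimal C sketch of the bookkeeping a kernel must do for itself when image data lives in a plain buffer (the helper names are ours, not part of the sample): computing the byte offset for a coordinate, with out-of-range coordinates clamped to the edge by hand.

```c
#include <stddef.h>

/* Bytes per pixel for RGB888; change the format and every offset below changes too. */
#define BYTES_PER_PIXEL 3

/* Clamp an integer coordinate to [0, size - 1], i.e. "clamp to edge" done manually. */
int clamp_coord(int v, int size)
{
    if (v < 0)     return 0;
    if (v >= size) return size - 1;
    return v;
}

/* Byte offset of pixel (x, y) in a tightly packed RGB888 buffer. */
size_t pixel_offset(int x, int y, int width, int height)
{
    x = clamp_coord(x, width);
    y = clamp_coord(y, height);
    return ((size_t)y * width + x) * BYTES_PER_PIXEL;
}
```

With an image object, all of this (plus normalized coordinates and filtering) is handled by the sampler instead.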
Image objects
Image objects are a special memory type that makes working with image data much easier. Image objects:

- support direct access by coordinate;
- support normalized coordinates;
- handle out-of-range coordinates (you can choose from several handling schemes);
- provide an abstraction over the image format (accessing an RGB888 image is the same as accessing an RGB565 image);
- support bilinear filtering (hardware accelerated).
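The format-abstraction point can be illustrated in plain C: however the pixel is stored, the reader receives normalized floats in [0.0, 1.0], which is what read_imagef does for CL_UNORM_* formats. These unpacking helpers are our own illustration, not sample code.

```c
/* Unpack one RGB888 pixel (3 bytes) to normalized floats in [0.0, 1.0]. */
void unpack_rgb888(const unsigned char *p, float out[3])
{
    out[0] = p[0] / 255.0f;
    out[1] = p[1] / 255.0f;
    out[2] = p[2] / 255.0f;
}

/* Unpack one RGB565 pixel (16 bits) to normalized floats in [0.0, 1.0]. */
void unpack_rgb565(unsigned short p, float out[3])
{
    out[0] = ((p >> 11) & 0x1F) / 31.0f;  /* 5 red bits   */
    out[1] = ((p >> 5)  & 0x3F) / 63.0f;  /* 6 green bits */
    out[2] = ( p        & 0x1F) / 31.0f;  /* 5 blue bits  */
}
```

A kernel written against the normalized floats is unchanged when the storage format changes; with buffers, the unpacking code itself would have to change.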
Recommendations
Whether to use image objects depends on the application. Consider the following:

- Using image objects for image data simplifies the code needed to access and manipulate it.
- With image objects, only one pixel can be processed per clock cycle. With buffers, if your image format uses fewer than 32 bits per channel, you can process several pixels per clock cycle. For example, with RGBA8888 data (32 bits per pixel) in a buffer you can vectorize your algorithm to operate on 4 pixels at a time (32-bit * 4 = 128-bit, the recommended vector width on Mali-T600 series GPUs), whereas with an image object the rate is fixed at one pixel per cycle. With 32 or more bits per channel this buffer advantage disappears, because both approaches then process one pixel per clock cycle: with RGBA32 data (128 bits per pixel), a single pixel already fills the recommended vector width.
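The arithmetic behind this trade-off fits in one line of C; the 128-bit figure is the recommended vector width quoted above.

```c
/* How many whole pixels fit into one 128-bit vector for a given pixel size. */
int pixels_per_vector(int bits_per_pixel)
{
    return 128 / bits_per_pixel;
}
```

pixels_per_vector(32) is 4 (RGBA8888 through a buffer), while pixels_per_vector(128) is 1 (RGBA32), matching the one-pixel-per-cycle rate of the texture pipeline.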
- In more complex cases, the best performance comes from balancing the load across the whole system. On Mali-T600 series GPUs, image objects use the texture pipeline, which is separate from the load/store and arithmetic pipelines. Using image objects and buffers at the same time can therefore be beneficial, to make maximum use of the system. For example, load the input image through an image object, and load the data used to transform it (for example, a convolution filter) through a memory buffer.
Image scaling
This example shows how to resize an image using image objects.
Bilinear filtering
One of the particular benefits of OpenCL image objects is their built-in bilinear filtering. When you read from an OpenCL image object, instead of selecting the single pixel nearest to the given coordinate, you can obtain the average of the four nearest pixels. On Mali-T600 series GPUs this is hardware accelerated in the texture pipeline, which means image scaling can run with higher performance and lower power consumption. We will use this example as a walkthrough of how to use OpenCL image objects.
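As a reference for what the texture pipeline computes per read, here is a plain-C sketch of bilinear filtering on a single-channel image. It assumes non-negative pixel coordinates and clamp-to-edge addressing, and it ignores the half-texel offset that the OpenCL specification applies to normalized coordinates; it is an illustration, not the exact hardware arithmetic.

```c
int clampi(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Sample a single-channel image at (x, y) with bilinear filtering:
 * a weighted average of the four pixels surrounding the sample point. */
float bilinear_sample(const float *img, int width, int height, float x, float y)
{
    int   x0 = (int)x, y0 = (int)y;   /* top-left of the 2x2 neighbourhood */
    float fx = x - x0, fy = y - y0;   /* fractional weights */

    int x1 = clampi(x0 + 1, 0, width - 1);
    int y1 = clampi(y0 + 1, 0, height - 1);
    x0 = clampi(x0, 0, width - 1);
    y0 = clampi(y0, 0, height - 1);

    /* Blend horizontally in each row, then blend the two rows vertically. */
    float top    = img[y0 * width + x0] * (1.0f - fx) + img[y0 * width + x1] * fx;
    float bottom = img[y1 * width + x0] * (1.0f - fx) + img[y1 * width + x1] * fx;
    return top * (1.0f - fy) + bottom * fy;
}
```

Sampling a 2x2 image exactly between its four pixels returns their average, which is the behaviour illustrated in Figure 1.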
Figure 1: An example of nearest-pixel sampling (left) and bilinear filtering (right)
Differences between image objects and memory buffers
Using OpenCL image objects is almost identical to using OpenCL buffers:

- both have the type cl_mem;
- for allocation, clCreateBuffer becomes clCreateImage2D (or clCreateImage3D);
- for mapping, clEnqueueMapBuffer becomes clEnqueueMapImage;
- for unmapping, clEnqueueUnmapMemObject works for both memory types.
The biggest differences when using image objects are:

- an image object requires a "sampler" in order to be read from;
- a kernel cannot both read and write the same image (in the kernel definition, image arguments must be marked __read_only or __write_only);
- images have a defined data format.
Samplers
As mentioned earlier, to read from an image object you must have a sampler. A sampler defines:

- whether the coordinates you use are normalized:
  - normalized (in the range [0, 1]);
  - non-normalized;
- the policy for coordinates outside the image:
  - none (you guarantee that coordinates stay in range);
  - clamp to edge (returns the colour of the nearest valid pixel);
  - clamp (returns the border colour defined by the image format);
  - repeat (behaves as if infinite copies of the image were tiled next to each other);
  - mirrored repeat (the same as repeat, except that the coordinates are flipped at every edge);
- the filtering policy:
  - nearest;
  - bilinear.

Some combinations of these options are restricted.
A sampler can be defined on the host side with clCreateSampler() and passed to the kernel as an argument, or defined directly in the kernel. Passing the sampler as a kernel argument gives you the flexibility to run the same kernel with different sampling options.
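The out-of-range policies can be emulated with small C helpers that map an integer coordinate into [0, size), one per addressing mode (a behavioural sketch for integer coordinates, not the specification's arithmetic for normalized ones):

```c
/* Clamp-to-edge: out-of-range coordinates snap to the nearest valid pixel. */
int address_clamp_to_edge(int v, int size)
{
    return v < 0 ? 0 : (v >= size ? size - 1 : v);
}

/* Repeat: the image tiles endlessly, so the coordinate wraps modulo the size. */
int address_repeat(int v, int size)
{
    int m = v % size;
    return m < 0 ? m + size : m;
}

/* Mirrored repeat: like repeat, but every other tile is flipped at the edge. */
int address_mirrored_repeat(int v, int size)
{
    int period = 2 * size;          /* one forward pass plus one mirrored pass */
    int m = v % period;
    if (m < 0) m += period;
    return m < size ? m : period - 1 - m;
}
```

For a 4-pixel row, repeat produces the sequence 0 1 2 3 0 1 2 3 ..., while mirrored repeat produces 0 1 2 3 3 2 1 0 ...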
Resizing an image with bilinear filtering
Unless otherwise stated, all code snippets are taken from "image_scaling.cpp".
In the sample code we use OpenCL to resize an input image. The image is scaled up by a factor of 8 (adjustable in the code) with bilinear filtering enabled.
1. Allocate memory for your image
Allocating an image object is almost identical to allocating a buffer; the main difference is that you must specify the data format of the image. You can list the image formats supported by your platform with the printSupported2DImageFormats method.
/*
* Specify the format of the image.
* The bitmap image we are using is RGB888, which is not a supported OpenCL image format.
* We will use RGBA8888 and add an empty alpha channel.
*/
cl_image_format format;
format.image_channel_data_type = CL_UNORM_INT8;
format.image_channel_order = CL_RGBA;
/* Allocate memory for the input image that can be accessed by the CPU and GPU. */
bool createMemoryObjectsSuccess = true;
memoryObjects[0] = clCreateImage2D(context, CL_MEM_READ_ONLY | CL_MEM_ALLOC_HOST_PTR, &format, width, height, 0, NULL, &errorNumber);
createMemoryObjectsSuccess &= checkSuccess(errorNumber);
memoryObjects[1] = clCreateImage2D(context, CL_MEM_WRITE_ONLY | CL_MEM_ALLOC_HOST_PTR, &format, newWidth, newHeight, 0, NULL, &errorNumber);
createMemoryObjectsSuccess &= checkSuccess(errorNumber);
if (!createMemoryObjectsSuccess)
{
cleanUpOpenCL(context, commandQueue, program, kernel, memoryObjects, numMemoryObjects);
cerr << "Failed creating the image. " << __FILE__ << ":"<< __LINE__ << endl;
return 1;
}
2. Map the memory to a host pointer
Again, this step is very similar to mapping a buffer.
/*
* Like with memory buffers, we now map the allocated memory to a host side pointer.
* Unlike buffers, we must specify origin coordinates, width and height for the region of the image we wish to map.
*/
size_t origin[3] = {0, 0, 0};
size_t region[3] = {width, height, 1};
/*
* clEnqueueMapImage also returns the rowPitch; the width of the mapped region in bytes.
* If the image format is not known, this is required information when accessing the image object as a normal array.
* The number of bytes per pixel can vary with the image format being used,
* this affects the offset into the array for a given coordinate.
* In our case the image format is fixed as RGBA8888 so we don't need to worry about the rowPitch.
*/
size_t rowPitch;
unsigned char* inputImageRGBA = (unsigned char*)clEnqueueMapImage(commandQueue, memoryObjects[0], CL_TRUE, CL_MAP_WRITE, origin, region, &rowPitch, NULL, 0, NULL, NULL, &errorNumber);
if (!checkSuccess(errorNumber))
{
cleanUpOpenCL(context, commandQueue, program, kernel, memoryObjects, numMemoryObjects);
cerr << "Failed mapping the input image. " << __FILE__ << ":"<< __LINE__ << endl;
return 1;
}
3. Initialize the memory
Use the host-side pointer to fill the image with data.
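The sample does this with a helper called RGBToRGBA (declared in the sample's image.h; its implementation is not shown here). A plausible version, assuming tightly packed rows (no row padding, so the pitch equals width * 4) and an opaque alpha of 255, might look like this:

```c
/* Convert a tightly packed RGB888 image to RGBA8888, filling the alpha
 * channel with a constant (255 here; the sample's actual helper may differ). */
void rgb_to_rgba(const unsigned char *rgb, unsigned char *rgba,
                 int width, int height)
{
    for (int i = 0; i < width * height; i++)
    {
        rgba[4 * i + 0] = rgb[3 * i + 0];  /* R */
        rgba[4 * i + 1] = rgb[3 * i + 1];  /* G */
        rgba[4 * i + 2] = rgb[3 * i + 2];  /* B */
        rgba[4 * i + 3] = 255;             /* A: constant, unused by the kernel */
    }
}
```

In real code, if the mapped region's rowPitch differs from width * 4, the destination index must be computed per row from the pitch instead.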
4. Unmap the memory
Unmap the host-side pointer (using clEnqueueUnmapMemObject, exactly as for a buffer) so that the data can be used by the kernel.
5. Pass the image to the kernel
Pass the image to the kernel as an argument, just as you would a buffer.
6. Use the image in the kernel
In this section, the code snippets are taken from "image_scaling.cl".
a. Define a sampler
const sampler_t sampler = CLK_NORMALIZED_COORDS_TRUE | CLK_ADDRESS_CLAMP | CLK_FILTER_LINEAR;
b. Calculate the coordinates
/*
* There is one kernel instance per pixel in the destination image.
* The global id of this kernel instance is therefore a coordinate in the destination image.
*/
int2 coordinate = (int2)(get_global_id(0), get_global_id(1));
/*
* That coordinate is only valid for the destination image.
* If we normalize the coordinates to the range [0.0, 1.0] (using the height and width of the destination image),
* we can use them as coordinates in the sourceImage.
*/
float2 normalizedCoordinate = convert_float2(coordinate) * (float2)(widthNormalizationFactor, heightNormalizationFactor);
c. Read from the source image
/*
* Read colours from the source image.
* The sampler determines how the coordinates are interpreted.
* Because bilinear filtering is enabled, the value of colour will be the average of the 4 pixels closest to the coordinate.
*/
float4 colour = read_imagef(sourceImage, sampler, normalizedCoordinate);
d. Write to the destination image
/*
* Write the colour to the destination image.
* No sampler is used here as all writes must specify an exact valid pixel coordinate.
*/
write_imagef(destinationImage, coordinate, colour);
7. Retrieve the results
Map the image object to a host-side pointer and read the results.
Running the sample
When run, an image named "output.bmp" is created on the board, and output similar to the following is produced:
11 Image formats supported (channel order, channel data type):
CL_RGBA, CL_UNORM_INT8
CL_RGBA, CL_UNORM_INT16
CL_RGBA, CL_SIGNED_INT8
CL_RGBA, CL_SIGNED_INT16
CL_RGBA, CL_SIGNED_INT32
CL_RGBA, CL_UNSIGNED_INT8
CL_RGBA, CL_UNSIGNED_INT16
CL_RGBA, CL_UNSIGNED_INT32
CL_RGBA, CL_HALF_FLOAT
CL_RGBA, CL_FLOAT
CL_BGRA, CL_UNORM_INT8
Profiling information:
Queued time: 0.092ms
Wait time: 0.135206ms
Run time: 31.5405ms
Appendix 1: Kernel source code
/*
* This confidential and proprietary software may be used only as
* authorised by a licensing agreement from ARM Limited
* (C) COPYRIGHT 2013 ARM Limited
* ALL RIGHTS RESERVED
* The entire notice above must be reproduced on all authorised
* copies and copies may only be made to the extent permitted
* by a licensing agreement from ARM Limited.
*/
/* [Define a sampler] */
const sampler_t sampler = CLK_NORMALIZED_COORDS_TRUE | CLK_ADDRESS_CLAMP | CLK_FILTER_LINEAR;
/* [Define a sampler] */
/**
* \brief Image scaling kernel function.
* \param[in] sourceImage Input image object.
* \param[out] destinationImage Re-sized output image object.
* \param[in] widthNormalizationFactor 1 / destinationImage width.
* \param[in] heightNormalizationFactor 1 / destinationImage height.
*/
__kernel void image_scaling(__read_only image2d_t sourceImage,
__write_only image2d_t destinationImage,
const float widthNormalizationFactor,
const float heightNormalizationFactor)
{
/*
* It is possible to get the width and height of an image object (using get_image_width and get_image_height).
* You could use this to calculate the normalization factors in the kernel.
* In this case, because the width and height doesn't change for each kernel,
* it is better to pass normalization factors to the kernel as parameters.
* This way we do the calculations once on the host side instead of in every kernel.
*/
/* [Calculate the coordinates] */
/*
* There is one kernel instance per pixel in the destination image.
* The global id of this kernel instance is therefore a coordinate in the destination image.
*/
int2 coordinate = (int2)(get_global_id(0), get_global_id(1));
/*
* That coordinate is only valid for the destination image.
* If we normalize the coordinates to the range [0.0, 1.0] (using the height and width of the destination image),
* we can use them as coordinates in the sourceImage.
*/
float2 normalizedCoordinate = convert_float2(coordinate) * (float2)(widthNormalizationFactor, heightNormalizationFactor);
/* [Calculate the coordinates] */
/* [Read from the source image] */
/*
* Read colours from the source image.
* The sampler determines how the coordinates are interpreted.
* Because bilinear filtering is enabled, the value of colour will be the average of the 4 pixels closest to the coordinate.
*/
float4 colour = read_imagef(sourceImage, sampler, normalizedCoordinate);
/* [Read from the source image] */
/* [Write to the destination image] */
/*
* Write the colour to the destination image.
* No sampler is used here as all writes must specify an exact valid pixel coordinate.
*/
write_imagef(destinationImage, coordinate, colour);
/* [Write to the destination image] */
}
Appendix 2: Host-side source code
/*
* This confidential and proprietary software may be used only as
* authorised by a licensing agreement from ARM Limited
* (C) COPYRIGHT 2013 ARM Limited
* ALL RIGHTS RESERVED
* The entire notice above must be reproduced on all authorised
* copies and copies may only be made to the extent permitted
* by a licensing agreement from ARM Limited.
*/
#include "common.h"
#include "image.h"
#include <CL/cl.h>
#include <iostream>
using namespace std;
/**
* \brief OpenCL image object sample code.
* \details Demonstration of how to use OpenCL image objects to resize an image.
* \return The exit code of the application, non-zero if a problem occurred.
*/
int main(void)
{
cl_context context = 0;
cl_command_queue commandQueue = 0;
cl_program program = 0;
cl_device_id device = 0;
cl_kernel kernel = 0;
const int numMemoryObjects = 2;
cl_mem memoryObjects[numMemoryObjects] = {0, 0};
cl_int errorNumber;
/* Set up OpenCL environment: create context, command queue, program and kernel. */
if (!createContext(&context))
{
cleanUpOpenCL(context, commandQueue, program, kernel, memoryObjects, numMemoryObjects);
cerr << "Failed to create an OpenCL context. " << __FILE__ << ":"<< __LINE__ << endl;
return 1;
}
if (!createCommandQueue(context, &commandQueue, &device))
{
cleanUpOpenCL(context, commandQueue, program, kernel, memoryObjects, numMemoryObjects);
cerr << "Failed to create the OpenCL command queue. " << __FILE__ << ":"<< __LINE__ << endl;
return 1;
}
if (!createProgram(context, device, "assets/image_scaling.cl", &program))
{
cleanUpOpenCL(context, commandQueue, program, kernel, memoryObjects, numMemoryObjects);
cerr << "Failed to create OpenCL program." << __FILE__ << ":"<< __LINE__ << endl;
return 1;
}
kernel = clCreateKernel(program, "image_scaling", &errorNumber);
if (!checkSuccess(errorNumber))
{
cleanUpOpenCL(context, commandQueue, program, kernel, memoryObjects, numMemoryObjects);
cerr << "Failed to create OpenCL kernel. " << __FILE__ << ":"<< __LINE__ << endl;
return 1;
}
/* Print the image formats that the OpenCL device supports. */
cout << endl;
printSupported2DImageFormats(context);
cout << endl;
/* The scaling factor to use when resizing the image. */
const int scaleFactor = 8;
/* Load the input image data. */
unsigned char* inputImage = NULL;
int width, height;
loadFromBitmap("assets/input.bmp", &width, &height, &inputImage);
/*
* Calculate the width and height of the new image.
* Used to allocate the correct amount of output memory and the number of kernels to use.
*/
int newWidth = width * scaleFactor;
int newHeight = height * scaleFactor;
/* [Allocate image objects] */
/*
* Specify the format of the image.
* The bitmap image we are using is RGB888, which is not a supported OpenCL image format.
* We will use RGBA8888 and add an empty alpha channel.
*/
cl_image_format format;
format.image_channel_data_type = CL_UNORM_INT8;
format.image_channel_order = CL_RGBA;
/* Allocate memory for the input image that can be accessed by the CPU and GPU. */
bool createMemoryObjectsSuccess = true;
memoryObjects[0] = clCreateImage2D(context, CL_MEM_READ_ONLY | CL_MEM_ALLOC_HOST_PTR, &format, width, height, 0, NULL, &errorNumber);
createMemoryObjectsSuccess &= checkSuccess(errorNumber);
memoryObjects[1] = clCreateImage2D(context, CL_MEM_WRITE_ONLY | CL_MEM_ALLOC_HOST_PTR, &format, newWidth, newHeight, 0, NULL, &errorNumber);
createMemoryObjectsSuccess &= checkSuccess(errorNumber);
if (!createMemoryObjectsSuccess)
{
cleanUpOpenCL(context, commandQueue, program, kernel, memoryObjects, numMemoryObjects);
cerr << "Failed creating the image. " << __FILE__ << ":"<< __LINE__ << endl;
return 1;
}
/* [Allocate image objects] */
/* [Map image objects to host pointers] */
/*
* Like with memory buffers, we now map the allocated memory to a host side pointer.
* Unlike buffers, we must specify origin coordinates, width and height for the region of the image we wish to map.
*/
size_t origin[3] = {0, 0, 0};
size_t region[3] = {width, height, 1};
/*
* clEnqueueMapImage also returns the rowPitch; the width of the mapped region in bytes.
* If the image format is not known, this is required information when accessing the image object as a normal array.
* The number of bytes per pixel can vary with the image format being used,
* this affects the offset into the array for a given coordinate.
* In our case the image format is fixed as RGBA8888 so we don't need to worry about the rowPitch.
*/
size_t rowPitch;
unsigned char* inputImageRGBA = (unsigned char*)clEnqueueMapImage(commandQueue, memoryObjects[0], CL_TRUE, CL_MAP_WRITE, origin, region, &rowPitch, NULL, 0, NULL, NULL, &errorNumber);
if (!checkSuccess(errorNumber))
{
cleanUpOpenCL(context, commandQueue, program, kernel, memoryObjects, numMemoryObjects);
cerr << "Failed mapping the input image. " << __FILE__ << ":"<< __LINE__ << endl;
return 1;
}
/* [Map image objects to host pointers] */
/* Convert the input data from RGB to RGBA (moves it to the OpenCL allocated memory at the same time). */
RGBToRGBA(inputImage, inputImageRGBA, width, height);
delete[] inputImage;
/* Unmap the image from the host. */
if (!checkSuccess(clEnqueueUnmapMemObject(commandQueue, memoryObjects[0], inputImageRGBA, 0, NULL, NULL)))
{
cleanUpOpenCL(context, commandQueue, program, kernel, memoryObjects, numMemoryObjects);
cerr << "Failed unmapping the input image. " << __FILE__ << ":"<< __LINE__ << endl;
return 1;
}
/*
* Calculate the normalization factor for the image coordinates.
* By using normalized coordinates we don't have to manually map the destination coordinates to the source coordinates.
*/
cl_float widthNormalizationFactor = 1.0f / newWidth;
cl_float heightNormalizationFactor = 1.0f / newHeight;
/* Setup the kernel arguments. */
bool setKernelArgumentsSuccess = true;
setKernelArgumentsSuccess &= checkSuccess(clSetKernelArg(kernel, 0, sizeof(cl_mem), &memoryObjects[0]));
setKernelArgumentsSuccess &= checkSuccess(clSetKernelArg(kernel, 1, sizeof(cl_mem), &memoryObjects[1]));
setKernelArgumentsSuccess &= checkSuccess(clSetKernelArg(kernel, 2, sizeof(cl_float), &widthNormalizationFactor));
setKernelArgumentsSuccess &= checkSuccess(clSetKernelArg(kernel, 3, sizeof(cl_float), &heightNormalizationFactor));
if (!setKernelArgumentsSuccess)
{
cleanUpOpenCL(context, commandQueue, program, kernel, memoryObjects, 3);
cerr << "Failed setting OpenCL kernel arguments. " << __FILE__ << ":"<< __LINE__ << endl;
return 1;
}
/*
* Set the kernel work size. Each kernel operates on one pixel of the output image.
* Therefore, we need newWidth * newHeight kernel instances.
* We are using two work dimensions because it maps nicely onto the coordinates of the image.
* With one dimension we would have to derive the y coordinate from the x coordinate in the kernel.
*/
const int workDimensions = 2;
size_t globalWorkSize[workDimensions] = {newWidth, newHeight};
/* An event to associate with the kernel. Allows us to retrieve profiling information later. */
cl_event event = 0;
/* Enqueue the kernel. */
if (!checkSuccess(clEnqueueNDRangeKernel(commandQueue, kernel, workDimensions, NULL, globalWorkSize, NULL, 0, NULL, &event)))
{
cleanUpOpenCL(context, commandQueue, program, kernel, memoryObjects, numMemoryObjects);
cerr << "Failed enqueuing the kernel. " << __FILE__ << ":"<< __LINE__ << endl;
return 1;
}
/* Wait for kernel execution completion. */
if (!checkSuccess(clFinish(commandQueue)))
{
cleanUpOpenCL(context, commandQueue, program, kernel, memoryObjects, numMemoryObjects);
cerr << "Failed waiting for kernel execution to finish. " << __FILE__ << ":"<< __LINE__ << endl;
return 1;
}
/* Print the profiling information for the event. */
printProfilingInfo(event);
/* Release the event object. */
if (!checkSuccess(clReleaseEvent(event)))
{
cleanUpOpenCL(context, commandQueue, program, kernel, memoryObjects, numMemoryObjects);
cerr << "Failed releasing the event object. " << __FILE__ << ":"<< __LINE__ << endl;
return 1;
}
size_t newRegion[3] = {newWidth, newHeight, 1};
unsigned char* outputImage = (unsigned char*)clEnqueueMapImage(commandQueue, memoryObjects[1], CL_TRUE, CL_MAP_READ, origin, newRegion, &rowPitch, NULL, 0, NULL, NULL, &errorNumber);
if (!checkSuccess(errorNumber))
{
cleanUpOpenCL(context, commandQueue, program, kernel, memoryObjects, numMemoryObjects);
cerr << "Failed mapping the output image. " << __FILE__ << ":"<< __LINE__ << endl;
return 1;
}
unsigned char* outputImageRGB = new unsigned char[newWidth * newHeight * 3];
RGBAToRGB(outputImage, outputImageRGB, newWidth, newHeight);
saveToBitmap("output.bmp", newWidth, newHeight, outputImageRGB);
delete[] outputImageRGB;
cleanUpOpenCL(context, commandQueue, program, kernel, memoryObjects, numMemoryObjects);
return 0;
}