Setting Up a CUDA-Based OpenCL Development Environment, with an Introductory Example Program


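References: 《詳細程序註解學OpenCL一 環境配置和入門程序》 ("Learning OpenCL Through Annotated Programs, Part 1: Environment Setup and a First Program") and 《VS2010 NVIDIA OpenCL 開發環境配置》 ("Configuring an NVIDIA OpenCL Development Environment in VS2010")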


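I. Setting Up the Development Environment


1. Download and install the CUDA SDK


  Download link: https://developer.nvidia.com/cuda-downloads

  With the default installation path, the SDK is installed in C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.5. This directory contains the include and lib folders, which are the directories we need to configure in Visual C++ 2008.


2. Configure Visual C++ 2008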

  

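  a. Open Visual C++ 2008 and create a new empty project;

  b. Right-click the "Source Files" folder on the left side of the window, choose "Add" --> "New Item", and create an empty "main.cpp" file. (This step makes the "C/C++" entry appear under "Configuration Properties" in the project's property pages, so that the paths can be configured.)

  c. Right-click the project and choose "Properties";

  d. Configure the property pages.

    (a). "Configuration Properties" --> "C/C++" --> "General": in "Additional Include Directories" on the right, add "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.5\include", as shown in the figure below.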




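    (b). "Configuration Properties" --> "Linker" --> "General": in "Additional Library Directories" on the right, add "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.5\lib\Win32". For a 64-bit (x64) build, use "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.5\lib\x64" instead, as shown in the figure below.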




    (c). "Configuration Properties" --> "Linker" --> "Input": in "Additional Dependencies" on the right, add OpenCL.lib, as shown in the figure below.





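II. Introductory Example Program


1. Basic OpenCL concepts and framework


  OpenCL supports a wide range of applications. Whatever the application, code targeting a heterogeneous platform must carry out the following steps:

    a. Discover the components that make up the heterogeneous system;

    b. Probe the characteristics of these components, so that the software can adapt to the specific features of different hardware units;

    c. Create the blocks of instructions (kernels) that will run on the platform;

    d. Set up and manage the memory objects involved in the computation;

    e. Execute the kernels in the right order on the right components of the system;

    f. Collect the final results.

  These steps are accomplished through a series of APIs in OpenCL plus a programming environment for the kernels. We break the problem down into the following models: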

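  > Platform model: a description of the heterogeneous system;

  > Execution model: an abstract representation of how streams of instructions execute on the heterogeneous platform;

  > Memory model: the collection of memory regions in OpenCL and how they interact during an OpenCL computation;

  > Programming model: the high-level abstractions a programmer uses when designing algorithms to implement an application.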


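  The OpenCL framework can be divided into the following parts:

  > Host programming
    Platform API

      - Query compute devices
      - Create contexts
    Runtime API
      - Create memory objects associated with a context
      - Compile and create kernel program objects
      - Issue commands to a command queue
      - Synchronize commands
      - Clean up OpenCL resources
  > Kernels
    Programming language

      - C code with some restrictions and extensions


  A diagram of the basic workflow of an application using the OpenCL framework is shown below.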




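2. A first OpenCL program


  Our example program performs the following operations:

    a. Create an OpenCL context on the first available platform;

    b. Create a command queue on the first available device;

    c. Load a kernel file (HelloOpenCL.cl) and build it into a program object;

    d. Create a kernel object for the kernel function HelloOpenCL();

    e. Create memory objects for the kernel arguments;

    f. Queue the kernel for execution;

    g. Read the kernel results back into the result buffer.


2.1 Selecting a platform and creating a context


  A system can have several OpenCL implementations installed, so the first step in an OpenCL program is to select a platform. Once a platform has been chosen, a context is created on it. The code is shown below (errNum is a cl_int declared at the top of main; see the complete program in the appendix):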

	/******** Part 1: Select an OpenCL platform and create a context ********/
	cl_uint numPlatforms = 0;
	cl_platform_id *platformIds = NULL;
	cl_context context = 0;

	// 1. Select an OpenCL platform to run on.
	errNum = clGetPlatformIDs(0, NULL, &numPlatforms);  // 1. Get the number of OpenCL platforms
	if (errNum != CL_SUCCESS || numPlatforms == 0) {
		fprintf(stderr, "Failed to find any OpenCL platforms (error %d).\n", errNum);
		exit(1);
	}
	printf("Platform Numbers: %u\n", numPlatforms);

	platformIds = (cl_platform_id *)malloc(
		sizeof(cl_platform_id) * numPlatforms);
	if (platformIds == NULL) {
		fprintf(stderr, "Failed to allocate memory for the platform IDs.\n");
		exit(1);
	}

	errNum = clGetPlatformIDs(numPlatforms, platformIds, NULL);  // 2. Get the IDs of all OpenCL platforms
	if (errNum != CL_SUCCESS) {
		fprintf(stderr, "Failed to get the OpenCL platform IDs (error %d).\n", errNum);
		exit(1);
	}

	// 2. Create an OpenCL context on the platform.
	cl_context_properties contextProperties[] = {
		CL_CONTEXT_PLATFORM,
		(cl_context_properties)platformIds[0],  // 3. Select the first OpenCL platform
		0
	};
	context = clCreateContextFromType(contextProperties,  // 4. Try to create a context for a GPU device
									  CL_DEVICE_TYPE_GPU,
									  NULL, NULL, &errNum);
	if (errNum != CL_SUCCESS) {
		fprintf(stderr, "Could not create GPU context (error %d), trying CPU...\n", errNum);

		context = clCreateContextFromType(contextProperties,  // 5. Fall back to a context for a CPU device
			CL_DEVICE_TYPE_CPU, NULL, NULL, &errNum);
		if (errNum != CL_SUCCESS) {
			fprintf(stderr, "Failed to create an OpenCL GPU or CPU context (error %d).\n", errNum);
			exit(1);
		}
	}
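
  The error checks above report only that a call failed. OpenCL functions return a cl_int status code, and printing its symbolic name usually makes the cause easier to find. The following is a minimal sketch under stated assumptions: the names clErrorName and describeClError are hypothetical helpers (not part of the OpenCL API), the function takes a plain int so that it compiles without the OpenCL headers, and only a handful of codes are covered. The numeric values are the ones defined in the Khronos cl.h header.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper (not part of OpenCL): maps a few common OpenCL
// status codes to their symbolic names as defined in cl.h.
const char* clErrorName(int err)
{
    switch (err) {
    case 0:   return "CL_SUCCESS";
    case -1:  return "CL_DEVICE_NOT_FOUND";
    case -2:  return "CL_DEVICE_NOT_AVAILABLE";
    case -3:  return "CL_COMPILER_NOT_AVAILABLE";
    case -4:  return "CL_MEM_OBJECT_ALLOCATION_FAILURE";
    case -5:  return "CL_OUT_OF_RESOURCES";
    case -6:  return "CL_OUT_OF_HOST_MEMORY";
    case -11: return "CL_BUILD_PROGRAM_FAILURE";
    case -30: return "CL_INVALID_VALUE";
    case -32: return "CL_INVALID_PLATFORM";
    case -33: return "CL_INVALID_DEVICE";
    case -34: return "CL_INVALID_CONTEXT";
    case -46: return "CL_INVALID_KERNEL_NAME";
    case -54: return "CL_INVALID_WORK_GROUP_SIZE";
    default:  return "unknown OpenCL error";
    }
}

// Builds a message of the form "<call> failed: <NAME> (<code>)".
std::string describeClError(const char* call, int err)
{
    assert(call != NULL);
    std::ostringstream message;
    message << call << " failed: " << clErrorName(err) << " (" << err << ")";
    return message.str();
}
```

  In the host program, such a helper could be used as fprintf(stderr, "%s\n", describeClError("clGetPlatformIDs", errNum).c_str()); in place of a fixed message.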

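2.2 Selecting a device and creating a command queue


  After selecting a platform and creating a context, the next step is to select a device and create a command queue. The code is as follows: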

	/******** Part 2: Select a device and create a command queue ********/
	cl_device_id *devices;
	cl_device_id device = 0;
	cl_command_queue commandQueue = NULL;
	size_t deviceBufferSize = 0;

	// 3. Get the size of the device buffer.
	errNum = clGetContextInfo(context, CL_CONTEXT_DEVICES, 0, NULL,  // 1. Query the buffer size needed for all device IDs in the context
							   &deviceBufferSize);
	if (errNum != CL_SUCCESS) {
		fprintf(stderr, "Failed to get context information (error %d).\n", errNum);
		exit(1);
	}
	if (deviceBufferSize == 0) {  // size_t is unsigned, so test for zero
		fprintf(stderr, "No devices available.\n");
		exit(1);
	}

	devices = new cl_device_id[deviceBufferSize/sizeof(cl_device_id)];
	errNum = clGetContextInfo(context, CL_CONTEXT_DEVICES,  // 2. Get all devices available in the context
							  deviceBufferSize, devices, NULL);
	if (errNum != CL_SUCCESS) {
		fprintf(stderr, "Failed to get device IDs (error %d).\n", errNum);
		exit(1);
	}

	// 4. Choose the first device
	commandQueue = clCreateCommandQueue(context,  // 3. Create a command queue on the first device
										devices[0], 0, NULL);
	if (commandQueue == NULL) {
		fprintf(stderr, "Failed to create commandQueue for device 0.\n");
		exit(1);
	}

	device = devices[0];
	delete [] devices;

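2.3 Creating and building the program object


  The next step is to load the OpenCL kernel source code from the file HelloOpenCL.cl and create a program object from it. The program object is loaded with the kernel source and then compiled so that it can execute on the devices associated with the context. The OpenCL kernel code in HelloOpenCL.cl is as follows: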

// OpenCL Kernel Function
__kernel void HelloOpenCL(__global const float* a,
                          __global const float* b,
                          __global float* result)
{
    // get index into global data array
    int iGID = get_global_id(0);
    
    // elements operation
    result[iGID] = a[iGID] * b[iGID];
}

  The source code for creating and building the program object is as follows:

	/******** Part 3: Read the OpenCL C source, then create and build the program object ********/
	cl_program program;
	size_t szKernelLength;			// Byte size of kernel code
	char* cSourceCL = NULL;         // Buffer to hold source for compilation

	// 5. Read the OpenCL kernel in from source file
	cSourceCL = oclLoadProgSource(  // 1. Read the source of HelloOpenCL.cl from an absolute path
		"C:/Users/xxx/Desktop/OpenCL/HelloOpenCL.cl", "",
		&szKernelLength);
	if (cSourceCL == NULL) {
		fprintf(stderr, "Error in oclLoadProgSource.\n");
		exit(1);
	}

	// 6. Create the program
	program = clCreateProgramWithSource(context, 1,  // 2. Create the program object from the source code
										(const char **)&cSourceCL,
										&szKernelLength, &errNum);
	if (errNum != CL_SUCCESS) {
		fprintf(stderr, "Error in clCreateProgramWithSource (error %d).\n", errNum);
		exit(1);
	}

	// 7. Build the program with the fast-relaxed-math optimization option
	const char* flags = "-cl-fast-relaxed-math";
	errNum = clBuildProgram(program, 0, NULL, flags, NULL, NULL);  // 3. Compile the kernel source
	if (errNum != CL_SUCCESS) {
		// Print the compiler output so that kernel syntax errors are visible.
		char buildLog[4096] = "";
		clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG,
							  sizeof(buildLog), buildLog, NULL);
		fprintf(stderr, "Error in clBuildProgram (error %d):\n%s\n", errNum, buildLog);
		exit(1);
	}

  The function that reads the kernel source code comes from the sample programs on NVIDIA's official website. Its code is as follows:

//////////////////////////////////////////////////////////////////////////////
//! Loads a Program file and prepends the cPreamble to the code.
//!
//! @return the source string if succeeded, 0 otherwise
//! @param cFilename        program filename
//! @param cPreamble        code that is prepended to the loaded file, typically a set of #defines or a header
//! @param szFinalLength    returned length of the code string
//////////////////////////////////////////////////////////////////////////////
char* oclLoadProgSource(const char* cFilename, const char* cPreamble, size_t* szFinalLength)
{
    // locals 
    FILE* pFileStream = NULL;
    size_t szSourceLength;

    // open the OpenCL source code file
    #ifdef _WIN32   // Windows version
        if(fopen_s(&pFileStream, cFilename, "rb") != 0) 
        {       
            return NULL;
        }
    #else           // Linux version
        pFileStream = fopen(cFilename, "rb");
        if(pFileStream == 0) 
        {       
            return NULL;
        }
    #endif

    size_t szPreambleLength = strlen(cPreamble);

    // get the length of the source code
    fseek(pFileStream, 0, SEEK_END); 
    szSourceLength = ftell(pFileStream);
    fseek(pFileStream, 0, SEEK_SET); 

    // allocate a buffer for the source code string and read it in
    char* cSourceString = (char *)malloc(szSourceLength + szPreambleLength + 1); 
    memcpy(cSourceString, cPreamble, szPreambleLength);
    if (fread((cSourceString) + szPreambleLength, szSourceLength, 1, pFileStream) != 1)
    {
        fclose(pFileStream);
        free(cSourceString);
        return 0;
    }

    // close the file and return the total length of the combined (preamble + source) string
    fclose(pFileStream);
    if(szFinalLength != 0)
    {
        *szFinalLength = szSourceLength + szPreambleLength;
    }
    cSourceString[szSourceLength + szPreambleLength] = '\0';

    return cSourceString;
}
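
  For a C++ host program, the same job can be done with standard streams, which avoids the manual malloc/free bookkeeping. The following is only a sketch of that alternative, not NVIDIA's code: the function name loadKernelSource is hypothetical, and it omits the preamble parameter of oclLoadProgSource.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical alternative to oclLoadProgSource: reads a whole file into
// a std::string. Returns false if the file cannot be opened.
bool loadKernelSource(const std::string& path, std::string& source)
{
    assert(!path.empty());

    // Binary mode keeps the bytes exactly as stored on disk.
    std::ifstream file(path.c_str(), std::ios::in | std::ios::binary);
    if (!file) {
        return false;
    }

    // Copy the entire stream buffer into a string stream.
    std::ostringstream contents;
    contents << file.rdbuf();
    source = contents.str();
    return true;
}
```

  The resulting string could then be handed to clCreateProgramWithSource by taking source.c_str() as the single source pointer and source.size() as its length.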


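2.4 Creating the kernel and memory objects


  Next, create the kernel object from the program object that was just built. Then allocate the arrays and copy the input data into the storage allocated for the memory objects. The code is as follows: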

	/******** Part 4: Create the kernel and memory objects ********/
	#define ARRAY_SIZE 10

	cl_kernel kernel = 0;
	cl_mem memObjects[3] = {0, 0, 0};

	float a[ARRAY_SIZE];
	float b[ARRAY_SIZE];
	float result[ARRAY_SIZE];

	// 8. Create the kernel
	kernel = clCreateKernel(program, "HelloOpenCL", NULL);  // 1. Create the kernel object
	if (kernel == NULL) {
		fprintf(stderr, "Error in clCreateKernel.\n");
		exit(1);
	}

	// 9. Create memory objects
	for (int i = 0; i < ARRAY_SIZE; i++) {
		a[i] = (float)i;
		b[i] = (float)i;
	}

	memObjects[0] = clCreateBuffer(context, CL_MEM_READ_ONLY |  // 2. Create the memory objects
								   CL_MEM_COPY_HOST_PTR,
								   sizeof(float) * ARRAY_SIZE,
								   a, NULL);
	memObjects[1] = clCreateBuffer(context, CL_MEM_READ_ONLY |
								   CL_MEM_COPY_HOST_PTR,
								   sizeof(float) * ARRAY_SIZE,
								   b, NULL);
	// The output buffer needs no initial data, so no host pointer is
	// passed (the result array is still uninitialized at this point).
	memObjects[2] = clCreateBuffer(context, CL_MEM_READ_WRITE,
								   sizeof(float) * ARRAY_SIZE,
								   NULL, NULL);
	if (memObjects[0] == NULL || memObjects[1] == NULL ||
			memObjects[2] == NULL) {
		fprintf(stderr, "Error in clCreateBuffer.\n");
		exit(1);
	}

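2.5 Executing the kernel


  Once the kernel and memory objects have been created, the kernel can be queued for execution. First set the kernel arguments, then use the command queue to enqueue the kernel for execution on the device. Enqueuing a kernel does not mean that it runs immediately: the command is placed in the command queue and consumed by the device later. The blocking read at the end (CL_TRUE) returns only after the results have been copied back to the host. The code is as follows: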

	/******** Part 5: Execute the kernel ********/
	size_t globalWorkSize[1] = { ARRAY_SIZE };
	size_t localWorkSize[1] = { 1 };

	// 10. Set the kernel arguments
	errNum = clSetKernelArg(kernel, 0, sizeof(cl_mem), &memObjects[0]);  // 1. Set the kernel arguments
	errNum |= clSetKernelArg(kernel, 1, sizeof(cl_mem), &memObjects[1]);
	errNum |= clSetKernelArg(kernel, 2, sizeof(cl_mem), &memObjects[2]);
	if (errNum != CL_SUCCESS) {
		fprintf(stderr, "Error in clSetKernelArg.\n");
		exit(1);
	}

	// 11. Queue the kernel up for execution across the array
	errNum = clEnqueueNDRangeKernel(commandQueue, kernel, 1, NULL,  // 2. Enqueue the kernel for execution
									globalWorkSize, localWorkSize,
									0, NULL, NULL);
	if (errNum != CL_SUCCESS) {
		fprintf(stderr, "Error in clEnqueueNDRangeKernel (error %d).\n", errNum);
		exit(1);
	}

	// 12. Read the output buffer back to the Host
	errNum = clEnqueueReadBuffer(commandQueue, memObjects[2],  // 3. Read the results back to the host
								 CL_TRUE, 0,
								 ARRAY_SIZE * sizeof(float), result,
								 0, NULL, NULL);
	if (errNum != CL_SUCCESS) {
		fprintf(stderr, "Error in clEnqueueReadBuffer (error %d).\n", errNum);
		exit(1);
	}
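
  To make the roles of globalWorkSize and localWorkSize more concrete, the sketch below emulates a one-dimensional NDRange sequentially on the CPU. It is an illustration only, not how a device actually schedules work: all function names are hypothetical, and the loops simply show the index arithmetic, in which a work-item's global ID equals group index times local size plus local index. In OpenCL 1.x the global size must be a multiple of the local size, so the sketch also shows rounding the global size up and guarding the padded work-items with a bounds check.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Rounds globalSize up to the next multiple of localSize.
std::size_t roundUpToMultiple(std::size_t globalSize, std::size_t localSize)
{
    assert(localSize > 0);
    std::size_t remainder = globalSize % localSize;
    return remainder == 0 ? globalSize : globalSize + (localSize - remainder);
}

// CPU stand-in for the body of the HelloOpenCL kernel; gid plays the
// role of get_global_id(0).
void helloKernelBody(std::size_t gid,
                     const std::vector<float>& a,
                     const std::vector<float>& b,
                     std::vector<float>& result)
{
    result[gid] = a[gid] * b[gid];
}

// Sequentially visits every work-item of a 1D NDRange, group by group.
void emulateNDRange1D(std::size_t globalSize, std::size_t localSize,
                      const std::vector<float>& a,
                      const std::vector<float>& b,
                      std::vector<float>& result)
{
    assert(localSize > 0 && globalSize % localSize == 0);
    std::size_t numGroups = globalSize / localSize;
    for (std::size_t group = 0; group < numGroups; ++group) {
        for (std::size_t local = 0; local < localSize; ++local) {
            std::size_t gid = group * localSize + local;
            // Padded work-items beyond the data must do nothing.
            if (gid < result.size()) {
                helloKernelBody(gid, a, b, result);
            }
        }
    }
}
```

  With 10 elements and a local size of 4, for example, the global size is rounded up to 12 and the last two work-items are skipped by the bounds check. The example program in this article avoids the issue by using a local size of 1.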


2.6 Testing the results

  The purpose of this example program is to multiply the corresponding elements of two arrays. The test code is as follows:

	/******** Part 6: Test the results ********/
	printf("\nTest: a * b = c\n\n");

	printf("Input numbers:\n");
	for (int i = 0; i < ARRAY_SIZE; i++)
		printf("a[%d] = %f, b[%d] = %f\n", i, a[i], i, b[i]);

	printf("\nOutput numbers:\n");
	for (int i = 0; i < ARRAY_SIZE; i++)
		printf("a[%d] * b[%d] = %f\n", i, i, result[i]);
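
  Printing the values relies on checking them by eye. As a complement, the device results can be compared against a CPU reference. The following is a minimal sketch under stated assumptions: countMismatches is a hypothetical helper, and it uses std::vector rather than the fixed-size arrays of the example program.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: counts the elements where result[i] differs from
// the CPU reference a[i] * b[i] by more than the given tolerance.
std::size_t countMismatches(const std::vector<float>& a,
                            const std::vector<float>& b,
                            const std::vector<float>& result,
                            float tolerance)
{
    assert(a.size() == b.size() && a.size() == result.size());
    std::size_t mismatches = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        float expected = a[i] * b[i];
        if (std::fabs(result[i] - expected) > tolerance) {
            ++mismatches;
        }
    }
    return mismatches;
}
```

  A small non-zero tolerance is a reasonable choice when the kernel is built with relaxed math options, since device and host floating-point results may then differ slightly.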

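  The final test output is shown in the figure below. The first four lines of the output print platform information, which the code above does not cover; see the complete code in the appendix.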



III. Appendix: Complete Example Program


1. Kernel code (HelloOpenCL.cl)


// OpenCL Kernel Function
__kernel void HelloOpenCL(__global const float* a,
                          __global const float* b,
                          __global float* result)
{
    // get index into global data array
    int iGID = get_global_id(0);
    
    // elements operation
    result[iGID] = a[iGID] * b[iGID];
}

2. Complete host program (main.cpp)


#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <iostream>

#ifdef __APPLE__
#include <OpenCL/cl.h>
#else
#include <CL/cl.h>
#endif

char* oclLoadProgSource(const char* cFilename, const char* cPreamble, size_t* szFinalLength);

int main() 
{
	cl_int errNum;


	/******** Part 1: Select an OpenCL platform and create a context ********/
	cl_uint numPlatforms = 0;
	cl_platform_id *platformIds = NULL;
	cl_context context = 0;

	// 1. Select an OpenCL platform to run on.
	errNum = clGetPlatformIDs(0, NULL, &numPlatforms);  // 1. Get the number of OpenCL platforms
	if (errNum != CL_SUCCESS || numPlatforms == 0) {
		fprintf(stderr, "Failed to find any OpenCL platforms (error %d).\n", errNum);
		exit(1);
	}
	printf("Platform Numbers: %u\n", numPlatforms);

	platformIds = (cl_platform_id *)malloc(
		sizeof(cl_platform_id) * numPlatforms);
	if (platformIds == NULL) {
		fprintf(stderr, "Failed to allocate memory for the platform IDs.\n");
		exit(1);
	}

	errNum = clGetPlatformIDs(numPlatforms, platformIds, NULL);  // 2. Get the IDs of all OpenCL platforms
	if (errNum != CL_SUCCESS) {
		fprintf(stderr, "Failed to get the OpenCL platform IDs (error %d).\n", errNum);
		exit(1);
	}

	//------------------ Print platform information (start) ------------------/
	// Size in bytes of the string returned by each query
	size_t ext_size = 0;

	// Platform name
	errNum = clGetPlatformInfo(platformIds[0],
							   CL_PLATFORM_NAME,
							   0, NULL, &ext_size);
	if (errNum < 0) {
		fprintf(stderr, "Couldn't read CL_PLATFORM_NAME (error %d).\n", errNum);
		exit(1);
	}
	char *name = (char*)malloc(ext_size);
	clGetPlatformInfo(platformIds[0], CL_PLATFORM_NAME,
		ext_size, name, NULL);
	printf("Platform Name: %s\n", name);

	// Platform vendor
	errNum = clGetPlatformInfo(platformIds[0],
							   CL_PLATFORM_VENDOR,
							   0, NULL, &ext_size);
	if (errNum < 0) {
		fprintf(stderr, "Couldn't read CL_PLATFORM_VENDOR (error %d).\n", errNum);
		exit(1);
	}
	char *vendor = (char*)malloc(ext_size);
	clGetPlatformInfo(platformIds[0], CL_PLATFORM_VENDOR,
		ext_size, vendor, NULL);
	printf("Platform Vendor: %s\n", vendor);

	// Highest OpenCL version supported by the platform
	errNum = clGetPlatformInfo(platformIds[0],
							   CL_PLATFORM_VERSION,
							   0, NULL, &ext_size);
	if (errNum < 0) {
		fprintf(stderr, "Couldn't read CL_PLATFORM_VERSION (error %d).\n", errNum);
		exit(1);
	}
	char *version = (char*)malloc(ext_size);
	clGetPlatformInfo(platformIds[0], CL_PLATFORM_VERSION,
		ext_size, version, NULL);
	printf("Platform Version: %s\n", version);

	// Profile: only two values are possible, full profile or embedded profile
	errNum = clGetPlatformInfo(platformIds[0],
							   CL_PLATFORM_PROFILE,
							   0, NULL, &ext_size);
	if (errNum < 0) {
		fprintf(stderr, "Couldn't read CL_PLATFORM_PROFILE (error %d).\n", errNum);
		exit(1);
	}
	char *profile = (char*)malloc(ext_size);
	clGetPlatformInfo(platformIds[0], CL_PLATFORM_PROFILE,
		ext_size, profile, NULL);
	printf("Platform Full Profile or Embedded Profile?: %s\n", profile);
	//------------------ Print platform information (end) ------------------/

	// 2. Create an OpenCL context on the platform.
	cl_context_properties contextProperties[] = {
		CL_CONTEXT_PLATFORM,
		(cl_context_properties)platformIds[0],  // 3. Select the first OpenCL platform
		0
	};
	context = clCreateContextFromType(contextProperties,  // 4. Try to create a context for a GPU device
									  CL_DEVICE_TYPE_GPU,
									  NULL, NULL, &errNum);
	if (errNum != CL_SUCCESS) {
		fprintf(stderr, "Could not create GPU context (error %d), trying CPU...\n", errNum);

		context = clCreateContextFromType(contextProperties,  // 5. Fall back to a context for a CPU device
			CL_DEVICE_TYPE_CPU, NULL, NULL, &errNum);
		if (errNum != CL_SUCCESS) {
			fprintf(stderr, "Failed to create an OpenCL GPU or CPU context (error %d).\n", errNum);
			exit(1);
		}
	}


	/******** Part 2: Select a device and create a command queue ********/
	cl_device_id *devices;
	cl_device_id device = 0;
	cl_command_queue commandQueue = NULL;
	size_t deviceBufferSize = 0;

	// 3. Get the size of the device buffer.
	errNum = clGetContextInfo(context, CL_CONTEXT_DEVICES, 0, NULL,  // 1. Query the buffer size needed for all device IDs in the context
							   &deviceBufferSize);
	if (errNum != CL_SUCCESS) {
		fprintf(stderr, "Failed to get context information (error %d).\n", errNum);
		exit(1);
	}
	if (deviceBufferSize == 0) {  // size_t is unsigned, so test for zero
		fprintf(stderr, "No devices available.\n");
		exit(1);
	}

	devices = new cl_device_id[deviceBufferSize/sizeof(cl_device_id)];
	errNum = clGetContextInfo(context, CL_CONTEXT_DEVICES,  // 2. Get all devices available in the context
							  deviceBufferSize, devices, NULL);
	if (errNum != CL_SUCCESS) {
		fprintf(stderr, "Failed to get device IDs (error %d).\n", errNum);
		exit(1);
	}

	// 4. Choose the first device
	commandQueue = clCreateCommandQueue(context,  // 3. Create a command queue on the first device
										devices[0], 0, NULL);
	if (commandQueue == NULL) {
		fprintf(stderr, "Failed to create commandQueue for device 0.\n");
		exit(1);
	}

	device = devices[0];
	delete [] devices;


	/******** Part 3: Read the OpenCL C source, then create and build the program object ********/
	cl_program program;
	size_t szKernelLength;			// Byte size of kernel code
	char* cSourceCL = NULL;         // Buffer to hold source for compilation

	// 5. Read the OpenCL kernel in from source file
	cSourceCL = oclLoadProgSource(  // 1. Read the source of HelloOpenCL.cl from an absolute path
		"C:/Users/xxx/Desktop/OpenCL/HelloOpenCL.cl", "",
		&szKernelLength);
	if (cSourceCL == NULL) {
		fprintf(stderr, "Error in oclLoadProgSource.\n");
		exit(1);
	}

	// 6. Create the program
	program = clCreateProgramWithSource(context, 1,  // 2. Create the program object from the source code
										(const char **)&cSourceCL,
										&szKernelLength, &errNum);
	if (errNum != CL_SUCCESS) {
		fprintf(stderr, "Error in clCreateProgramWithSource (error %d).\n", errNum);
		exit(1);
	}

	// 7. Build the program with the fast-relaxed-math optimization option
	const char* flags = "-cl-fast-relaxed-math";
	errNum = clBuildProgram(program, 0, NULL, flags, NULL, NULL);  // 3. Compile the kernel source
	if (errNum != CL_SUCCESS) {
		// Print the compiler output so that kernel syntax errors are visible.
		char buildLog[4096] = "";
		clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG,
							  sizeof(buildLog), buildLog, NULL);
		fprintf(stderr, "Error in clBuildProgram (error %d):\n%s\n", errNum, buildLog);
		exit(1);
	}


	/******** Part 4: Create the kernel and memory objects ********/
	#define ARRAY_SIZE 10

	cl_kernel kernel = 0;
	cl_mem memObjects[3] = {0, 0, 0};

	float a[ARRAY_SIZE];
	float b[ARRAY_SIZE];
	float result[ARRAY_SIZE];

	// 8. Create the kernel
	kernel = clCreateKernel(program, "HelloOpenCL", NULL);  // 1. Create the kernel object
	if (kernel == NULL) {
		fprintf(stderr, "Error in clCreateKernel.\n");
		exit(1);
	}

	// 9. Create memory objects
	for (int i = 0; i < ARRAY_SIZE; i++) {
		a[i] = (float)i;
		b[i] = (float)i;
	}

	memObjects[0] = clCreateBuffer(context, CL_MEM_READ_ONLY |  // 2. Create the memory objects
								   CL_MEM_COPY_HOST_PTR,
								   sizeof(float) * ARRAY_SIZE,
								   a, NULL);
	memObjects[1] = clCreateBuffer(context, CL_MEM_READ_ONLY |
								   CL_MEM_COPY_HOST_PTR,
								   sizeof(float) * ARRAY_SIZE,
								   b, NULL);
	// The output buffer needs no initial data, so no host pointer is
	// passed (the result array is still uninitialized at this point).
	memObjects[2] = clCreateBuffer(context, CL_MEM_READ_WRITE,
								   sizeof(float) * ARRAY_SIZE,
								   NULL, NULL);
	if (memObjects[0] == NULL || memObjects[1] == NULL ||
			memObjects[2] == NULL) {
		fprintf(stderr, "Error in clCreateBuffer.\n");
		exit(1);
	}


	/******** Part 5: Execute the kernel ********/
	size_t globalWorkSize[1] = { ARRAY_SIZE };
	size_t localWorkSize[1] = { 1 };

	// 10. Set the kernel arguments
	errNum = clSetKernelArg(kernel, 0, sizeof(cl_mem), &memObjects[0]);  // 1. Set the kernel arguments
	errNum |= clSetKernelArg(kernel, 1, sizeof(cl_mem), &memObjects[1]);
	errNum |= clSetKernelArg(kernel, 2, sizeof(cl_mem), &memObjects[2]);
	if (errNum != CL_SUCCESS) {
		fprintf(stderr, "Error in clSetKernelArg.\n");
		exit(1);
	}

	// 11. Queue the kernel up for execution across the array
	errNum = clEnqueueNDRangeKernel(commandQueue, kernel, 1, NULL,  // 2. Enqueue the kernel for execution
									globalWorkSize, localWorkSize,
									0, NULL, NULL);
	if (errNum != CL_SUCCESS) {
		fprintf(stderr, "Error in clEnqueueNDRangeKernel (error %d).\n", errNum);
		exit(1);
	}

	// 12. Read the output buffer back to the Host
	errNum = clEnqueueReadBuffer(commandQueue, memObjects[2],  // 3. Read the results back to the host
								 CL_TRUE, 0,
								 ARRAY_SIZE * sizeof(float), result,
								 0, NULL, NULL);
	if (errNum != CL_SUCCESS) {
		fprintf(stderr, "Error in clEnqueueReadBuffer (error %d).\n", errNum);
		exit(1);
	}


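	/******** Part 6: Test the results ********/
	printf("\nTest: a * b = c\n\n");

	printf("Input numbers:\n");
	for (int i = 0; i < ARRAY_SIZE; i++)
		printf("a[%d] = %f, b[%d] = %f\n", i, a[i], i, b[i]);

	printf("\nOutput numbers:\n");
	for (int i = 0; i < ARRAY_SIZE; i++)
		printf("a[%d] * b[%d] = %f\n", i, i, result[i]);

	// Release the OpenCL objects and the host memory before exiting.
	clReleaseMemObject(memObjects[0]);
	clReleaseMemObject(memObjects[1]);
	clReleaseMemObject(memObjects[2]);
	clReleaseKernel(kernel);
	clReleaseProgram(program);
	clReleaseCommandQueue(commandQueue);
	clReleaseContext(context);
	free(cSourceCL);
	free(platformIds);
	free(name);
	free(vendor);
	free(version);
	free(profile);

	// Keep the console window open until Enter is pressed
	// (replaces the original busy loop).
	getchar();

	return 0;
}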


//////////////////////////////////////////////////////////////////////////////
//! Loads a Program file and prepends the cPreamble to the code.
//!
//! @return the source string if succeeded, 0 otherwise
//! @param cFilename        program filename
//! @param cPreamble        code that is prepended to the loaded file, typically a set of #defines or a header
//! @param szFinalLength    returned length of the code string
//////////////////////////////////////////////////////////////////////////////
char* oclLoadProgSource(const char* cFilename, const char* cPreamble, size_t* szFinalLength)
{
    // locals 
    FILE* pFileStream = NULL;
    size_t szSourceLength;

    // open the OpenCL source code file
    #ifdef _WIN32   // Windows version
        if(fopen_s(&pFileStream, cFilename, "rb") != 0) 
        {       
            return NULL;
        }
    #else           // Linux version
        pFileStream = fopen(cFilename, "rb");
        if(pFileStream == 0) 
        {       
            return NULL;
        }
    #endif

    size_t szPreambleLength = strlen(cPreamble);

    // get the length of the source code
    fseek(pFileStream, 0, SEEK_END); 
    szSourceLength = ftell(pFileStream);
    fseek(pFileStream, 0, SEEK_SET); 

    // allocate a buffer for the source code string and read it in
    char* cSourceString = (char *)malloc(szSourceLength + szPreambleLength + 1); 
    memcpy(cSourceString, cPreamble, szPreambleLength);
    if (fread((cSourceString) + szPreambleLength, szSourceLength, 1, pFileStream) != 1)
    {
        fclose(pFileStream);
        free(cSourceString);
        return 0;
    }

    // close the file and return the total length of the combined (preamble + source) string
    fclose(pFileStream);
    if(szFinalLength != 0)
    {
        *szFinalLength = szSourceLength + szPreambleLength;
    }
    cSourceString[szSourceLength + szPreambleLength] = '\0';

    return cSourceString;
}



