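References: "Learning OpenCL Through Detailed Program Annotations: Environment Configuration and an Introductory Program"; "Configuring an NVIDIA OpenCL Development Environment in VS2010"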

I. Setting Up the Development Environment

1. Download and Install the CUDA SDK

　　Download page: https://developer.nvidia.com/cuda-downloads

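　　With the default installation path, the toolkit is installed in C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.5 (v5.5 is the version used here; substitute the version you installed). This directory contains the include and lib folders, which are the directories we need to configure in Visual C++ 2008.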

2. Configure Visual C++ 2008

　　a. Open Visual C++ 2008 and create a new empty project;

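　　b. Right-click the "Source Files" folder on the left side of the window and choose "Add" --> "New Item" to create an empty "main.cpp" file. (This step makes the "C/C++" entry appear under "Configuration Properties" in the project's Property Pages, so that the paths can be configured.)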

　　c. Right-click the project and choose "Properties";

　　d. Configure the property pages:

　　　　(a). "Configuration Properties" --> "C/C++" --> "General": in "Additional Include Directories" on the right, add "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.5\include", as shown in the figure below.

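　　　　(b). "Configuration Properties" --> "Linker" --> "General": in "Additional Library Directories" on the right, add "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.5\lib\Win32". If you build for a 64-bit (x64) target, use "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.5\lib\x64" instead, as shown in the figure below.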

　　　　(c). "Configuration Properties" --> "Linker" --> "Input": in "Additional Dependencies" on the right, add OpenCL.lib, as shown in the figure below.

II. Introductory Program Example

1. OpenCL Basic Concepts and Framework

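　　OpenCL supports a wide range of applications. Whatever the application, a program targeting a heterogeneous platform must carry out the following steps: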

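　　　　a. Discover the components that make up the heterogeneous system;

　　　　b. Probe the characteristics of these components so that the software can adapt to the specific features of different hardware units;

　　　　c. Create the blocks of instructions (kernels) that will run on the platform;

　　　　d. Set up and manage the memory objects involved in the computation;

　　　　e. Execute the kernels in the right order on the right components of the system;

　　　　f. Collect the final results.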

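　　These steps are carried out through a set of OpenCL APIs together with a programming environment for kernels. We break the problem down into the following models: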

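　　> Platform model: a description of the heterogeneous system;

　　> Execution model: an abstract representation of how instruction streams execute on the heterogeneous platform;

　　> Memory model: the set of memory regions in OpenCL and how they interact during an OpenCL computation;

　　> Programming model: the high-level description a programmer uses when designing algorithms to implement an application.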
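　　The sketch below makes the execution model more concrete by emulating a one-dimensional NDRange on the host. Every work-item has a global ID. For a fixed work-group size, its group ID and local ID follow from integer division. `WorkItemIds` and `mapWorkItems` are hypothetical helpers for illustration only, not OpenCL API. On a device, the same values are returned by `get_global_id(0)`, `get_group_id(0)` and `get_local_id(0)`, assuming a zero global offset.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical record of the IDs one work-item would see in a 1-D NDRange
// (zero global offset assumed).
struct WorkItemIds {
    std::size_t globalId;  // corresponds to get_global_id(0)
    std::size_t groupId;   // corresponds to get_group_id(0)
    std::size_t localId;   // corresponds to get_local_id(0)
};

// Enumerate all work-items of a 1-D NDRange. In OpenCL 1.x the global size
// must be a multiple of the work-group (local) size, which the assert checks.
std::vector<WorkItemIds> mapWorkItems(std::size_t globalSize, std::size_t localSize) {
    assert(localSize > 0 && globalSize % localSize == 0);
    std::vector<WorkItemIds> items;
    items.reserve(globalSize);
    for (std::size_t gid = 0; gid < globalSize; ++gid) {
        WorkItemIds ids = {gid, gid / localSize, gid % localSize};
        items.push_back(ids);
    }
    return items;
}
```

With a global size of 10 and a work-group size of 5, work-item 7 falls in group 1 with local ID 2. The example program in section 2 uses a work-group size of 1, so each of its work-items forms its own group.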

　　The OpenCL framework can be divided into the following parts:

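　　> Host programming
　　　　Platform API
　　　　　　- Query compute devices
　　　　　　- Create contexts
　　　　Runtime API
　　　　　　- Create memory objects associated with a context
　　　　　　- Compile and create kernel program objects
　　　　　　- Issue commands to command queues
　　　　　　- Synchronize commands
　　　　　　- Release OpenCL resources
　　> Kernels
　　　　Programming language
　　　　　　- C code with some restrictions and extensions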
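　　The host program must release every OpenCL object it creates, using clReleaseMemObject, clReleaseKernel, clReleaseProgram, clReleaseCommandQueue and clReleaseContext. In C++, one way to make that automatic is an RAII owner: a std::unique_ptr whose deleter is the release function. The sketch below shows the pattern with a hypothetical stand-in handle type and release function, so it compiles without the OpenCL headers. With real OpenCL you would substitute cl_context and clReleaseContext, which have the same shape: an opaque pointer handle goes in, a status code comes out.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>

// Hypothetical stand-in for cl_context, which cl.h defines as a pointer
// to an opaque struct.
struct FakeContextObject { int id; };
typedef FakeContextObject* FakeContext;

int g_releaseCount = 0;  // counts calls, so the effect of RAII is observable

// Hypothetical stand-in for clReleaseContext: takes the handle, returns a status.
int fakeReleaseContext(FakeContext ctx) {
    delete ctx;
    ++g_releaseCount;
    return 0;  // 0 plays the role of CL_SUCCESS
}

// Owner type: the release function runs automatically when the owner goes
// out of scope, including on early returns.
typedef std::unique_ptr<std::remove_pointer<FakeContext>::type,
                        decltype(&fakeReleaseContext)> ContextOwner;

ContextOwner makeContext(int id) {
    return ContextOwner(new FakeContextObject{id}, &fakeReleaseContext);
}
```

Note that this only helps when the function returns normally. The example program exits with exit(1) on errors, which does not run destructors of local objects.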

　　The basic workflow of an application in the OpenCL framework is shown in the figure below.

2. The First OpenCL Program

　　Our example program performs the following operations:

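　　　　a. Create an OpenCL context on the first available platform;

　　　　b. Create a command queue on the first available device;

　　　　c. Load a kernel file (HelloWorld.cl) and build it into a program object;

　　　　d. Create a kernel object for the kernel function HelloOpenCL();

　　　　e. Create memory objects for the kernel arguments;

　　　　f. Enqueue the kernel for execution;

　　　　g. Read the kernel's results back into the result buffer.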

2.1 Selecting a Platform and Creating a Context

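　　A system can have several OpenCL implementations, so the first step in an OpenCL program is to select a platform. Once a platform has been selected, a context is created on it. The code is as follows: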

	/******** Part 1: select an OpenCL platform and create a context ********/
	cl_uint numPlatforms;
	cl_platform_id *platformIds;
	cl_context context = 0;

	// 1. Select an OpenCL platform to run on.
	errNum = clGetPlatformIDs(0, NULL, &numPlatforms);						// 1. Get the number of OpenCL platforms
	if (errNum != CL_SUCCESS || numPlatforms == 0) {
		perror("Failed to find any OpenCL platforms.");
		exit(1);
	}
	printf("Platform Numbers: %u\n", numPlatforms);

	platformIds = (cl_platform_id *)malloc(
		sizeof(cl_platform_id) * numPlatforms);

	errNum = clGetPlatformIDs(numPlatforms, platformIds, NULL);				// 2. Get the IDs of all OpenCL platforms
	if (errNum != CL_SUCCESS) {
		perror("Failed to find any OpenCL platforms.");
		exit(1);
	}

	// 2. Create an OpenCL context on the platform.
	cl_context_properties contextProperties[] = {
		CL_CONTEXT_PLATFORM,
		(cl_context_properties)platformIds[0],								// 3. Select the first OpenCL platform
		0
	};
	context = clCreateContextFromType(contextProperties, 					// 4. Try to create a context for a GPU device
									  CL_DEVICE_TYPE_GPU, 
									  NULL, NULL, &errNum);
	if (errNum != CL_SUCCESS) {
		perror("Could not create GPU context, trying CPU...");

		context = clCreateContextFromType(contextProperties, 				// 5. Fall back to a context for a CPU device
			CL_DEVICE_TYPE_CPU, NULL, NULL, &errNum);
		if (errNum != CL_SUCCESS) {
			perror("Failed to create an OpenCL GPU or CPU context.");
			exit(1);
		}
	}
	free(platformIds);	// the platform IDs are no longer needed once the context exists
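　　The platform query above uses a two-call pattern. The first clGetPlatformIDs call passes NULL to learn how many platforms exist, and the second fills a buffer of that size. The sketch below isolates the pattern. The hypothetical `fakeGetPlatformIDs` mimics the calling convention but returns three made-up IDs, so the sketch runs without an OpenCL runtime. A std::vector takes the place of the malloc/free pair.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for clGetPlatformIDs with the same calling convention:
// a count query (NULL output buffer) followed by a fill call.
int fakeGetPlatformIDs(unsigned numEntries, int* platforms, unsigned* numPlatforms) {
    const int available[] = {101, 202, 303};  // made-up platform IDs
    const unsigned count = 3;
    if (numPlatforms != nullptr) *numPlatforms = count;
    if (platforms != nullptr) {
        if (numEntries == 0) return -30;  // the real API returns CL_INVALID_VALUE (-30) here
        for (unsigned i = 0; i < numEntries && i < count; ++i) platforms[i] = available[i];
    }
    return 0;  // 0 plays the role of CL_SUCCESS
}

// Two-call pattern: ask for the count, size the buffer, then ask again to fill it.
std::vector<int> getAllPlatforms() {
    unsigned count = 0;
    if (fakeGetPlatformIDs(0, nullptr, &count) != 0 || count == 0) return {};
    std::vector<int> ids(count);
    if (fakeGetPlatformIDs(count, ids.data(), nullptr) != 0) return {};
    return ids;
}
```

In real host code, `fakeGetPlatformIDs` would be replaced by clGetPlatformIDs and `int` by cl_platform_id.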

2.2 Selecting a Device and Creating a Command Queue

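　　After selecting a platform and creating a context, the next step is to select a device and create a command queue on it. The code is as follows: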

	/******** Part 2: select a device and create a command queue ********/
	cl_device_id *devices;
	cl_device_id device = 0;
	cl_command_queue commandQueue = NULL;
	size_t deviceBufferSize = 0;

	// 3. Get the size of the device buffer.
	errNum = clGetContextInfo(context, CL_CONTEXT_DEVICES, 0, NULL,			// 1. Query the buffer size (in bytes) needed to hold the IDs of all devices in the context
							   &deviceBufferSize);
	if (errNum != CL_SUCCESS) {
		perror("Failed to get context information.");
		exit(1);
	}
	if (deviceBufferSize == 0) {
		perror("No devices available.");
		exit(1);
	}

	devices = new cl_device_id[deviceBufferSize/sizeof(cl_device_id)];
	errNum = clGetContextInfo(context, CL_CONTEXT_DEVICES, 					// 2. Get all devices available in the context
							  deviceBufferSize, devices, NULL);
	if (errNum != CL_SUCCESS) {
		perror("Failed to get device ID.");
		exit(1);
	}

	// 4. Choose the first device
	commandQueue = clCreateCommandQueue(context,							// 3. Create a command queue on the first device
										devices[0], 0, NULL);
	if (commandQueue == NULL) {
		perror("Failed to create commandQueue for device 0.");
		exit(1);
	}

	device = devices[0];
	delete [] devices;
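　　clGetPlatformIDs reports a count, but clGetContextInfo reports the size of CL_CONTEXT_DEVICES in bytes. The number of devices is therefore that size divided by sizeof(cl_device_id). The sketch below shows the arithmetic. It also shows why a size_t should start at 0 and be tested with == 0. size_t is unsigned, so -1 turns into SIZE_MAX, and a test such as <= 0 can only ever catch 0. `DeviceHandle` and `deviceCountFromBytes` are hypothetical names. DeviceHandle stands in for cl_device_id, which cl.h also defines as a pointer to an opaque struct.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for cl_device_id (a pointer to an opaque struct).
typedef struct FakeDevice* DeviceHandle;

// Convert a byte count from a size query into a number of device handles.
// Returns 0 if the byte count is not a whole number of handles.
std::size_t deviceCountFromBytes(std::size_t bytes) {
    if (bytes % sizeof(DeviceHandle) != 0) return 0;
    return bytes / sizeof(DeviceHandle);
}

// size_t is unsigned: initialising it with -1 yields SIZE_MAX, not a negative value.
bool minusOneIsSizeMax() {
    std::size_t s = static_cast<std::size_t>(-1);
    return s == SIZE_MAX;
}
```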

2.3 Creating and Building the Program Object

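　　The next step is to load the OpenCL kernel source code from the file HelloWorld.cl and create a program object from it. The program object is loaded with the kernel source and then compiled so that it can run on the devices associated with the context. The OpenCL kernel code in HelloWorld.cl is as follows: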

 // OpenCL Kernel Function
__kernel void HelloOpenCL(__global const float* a, 
				  	__global const float* b, 
				  	__global float* result)
{
    // get index into global data array
    int iGID = get_global_id(0);
    
    // elements operation
    result[iGID] = a[iGID] * b[iGID];
}
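　　Each work-item executes this function body once, and get_global_id(0) tells it which element to process. The sketch below runs the same body serially on the host, with the loop variable playing the role of the global ID. `helloOpenCLWorkItem` and `helloOpenCLReference` are hypothetical host-side names. Unlike this loop, OpenCL guarantees no ordering between work-items. That is harmless here because every work-item writes a different element.

```cpp
#include <cassert>
#include <cstddef>

// Body of HelloOpenCL for the single work-item whose global ID is gid.
static void helloOpenCLWorkItem(std::size_t gid, const float* a, const float* b, float* result) {
    result[gid] = a[gid] * b[gid];
}

// Serial host-side reference: one "work-item" per element, i.e. a global
// work size of n.
void helloOpenCLReference(const float* a, const float* b, float* result, std::size_t n) {
    for (std::size_t gid = 0; gid < n; ++gid) {
        helloOpenCLWorkItem(gid, a, b, result);
    }
}
```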

The code that creates and builds the program object is as follows:

	/******** Part 3: read the OpenCL C source, create and build the program object ********/
	cl_program program;
	size_t szKernelLength;			// Byte size of kernel code
	char* cSourceCL = NULL;         // Buffer to hold source for compilation

	// 5. Read the OpenCL kernel in from source file
	cSourceCL = oclLoadProgSource(										// 1. Read the source of HelloWorld.cl from an absolute path (adjust it to where your file is)
		"C:/Users/xxx/Desktop/OpenCL/HelloWorld.cl", "", 
		&szKernelLength);
	if (cSourceCL == NULL){
        perror("Error in oclLoadProgSource\n");
        exit(1);
    }

	// 6. Create the program
    program = clCreateProgramWithSource(context, 1, 					// 2. Create the program object from the source code
										(const char **)&cSourceCL, 
										&szKernelLength, &errNum);
    if (errNum != CL_SUCCESS) {
        perror("Error in clCreateProgramWithSource\n");
        exit(1);
    }

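	// 7. Build the program with -cl-fast-relaxed-math (which also enables 'mad' optimizations)
    const char* flags = "-cl-fast-relaxed-math";
    errNum = clBuildProgram(program, 0, NULL, flags, NULL, NULL);		// 3. Compile the kernel source
    if (errNum != CL_SUCCESS) {
        // Print the compiler's build log for the device to see why the build failed
        size_t logSize = 0;
        clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG, 0, NULL, &logSize);
        char* buildLog = (char*)malloc(logSize + 1);
        clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG, logSize, buildLog, NULL);
        buildLog[logSize] = '\0';
        fprintf(stderr, "Error in clBuildProgram:\n%s\n", buildLog);
        free(buildLog);
        exit(1);
    }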
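　　clBuildProgram takes its options as a single space-separated string. Besides flags such as -cl-fast-relaxed-math, the -D name=value option defines a preprocessor macro for the kernel source. This is an alternative to prepending #define lines as a preamble. The helper `makeBuildOptions` below is a hypothetical sketch that assembles such a string. Its result would be passed via `c_str()` as the fourth argument of clBuildProgram.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Assemble a clBuildProgram options string: each (name, value) pair becomes
// "-D name=value", followed by the extra flags, all separated by spaces.
std::string makeBuildOptions(const std::vector<std::pair<std::string, std::string> >& defines,
                             const std::vector<std::string>& flags) {
    std::string opts;
    for (std::size_t i = 0; i < defines.size(); ++i) {
        if (!opts.empty()) opts += ' ';
        opts += "-D " + defines[i].first + "=" + defines[i].second;
    }
    for (std::size_t i = 0; i < flags.size(); ++i) {
        if (!opts.empty()) opts += ' ';
        opts += flags[i];
    }
    return opts;
}
```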

The function that reads the kernel source comes from NVIDIA's official sample programs. Its code is as follows:

//////////////////////////////////////////////////////////////////////////////
//! Loads a Program file and prepends the cPreamble to the code.
//!
//! @return the source string if succeeded, 0 otherwise
//! @param cFilename        program filename
//! @param cPreamble        code that is prepended to the loaded file, typically a set of #defines or a header
//! @param szFinalLength    returned length of the code string
//////////////////////////////////////////////////////////////////////////////
char* oclLoadProgSource(const char* cFilename, const char* cPreamble, size_t* szFinalLength)
{
    // locals 
    FILE* pFileStream = NULL;
    size_t szSourceLength;

    // open the OpenCL source code file
    #ifdef _WIN32   // Windows version
        if(fopen_s(&pFileStream, cFilename, "rb") != 0) 
        {       
            return NULL;
        }
    #else           // Linux version
        pFileStream = fopen(cFilename, "rb");
        if(pFileStream == 0) 
        {       
            return NULL;
        }
    #endif

    size_t szPreambleLength = strlen(cPreamble);

    // get the length of the source code
    fseek(pFileStream, 0, SEEK_END); 
    szSourceLength = ftell(pFileStream);
    fseek(pFileStream, 0, SEEK_SET); 

    // allocate a buffer for the source code string and read it in
    char* cSourceString = (char *)malloc(szSourceLength + szPreambleLength + 1); 
    memcpy(cSourceString, cPreamble, szPreambleLength);
    if (fread((cSourceString) + szPreambleLength, szSourceLength, 1, pFileStream) != 1)
    {
        fclose(pFileStream);
        free(cSourceString);
        return 0;
    }

    // close the file and return the total length of the combined (preamble + source) string
    fclose(pFileStream);
    if(szFinalLength != 0)
    {
        *szFinalLength = szSourceLength + szPreambleLength;
    }
    cSourceString[szSourceLength + szPreambleLength] = '\0';

    return cSourceString;
}
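　　The NVIDIA helper reads the whole file, prepends cPreamble (typically #define lines or a header) and reports the combined length. If you prefer standard C++ streams to fopen_s and malloc, a roughly equivalent sketch follows. `loadProgramSource` is a hypothetical name, not part of the NVIDIA samples. The returned std::string owns its memory, so no free() is needed. To use it, store `src.c_str()` in a `const char*` and `src.size()` in a `size_t`, then pass their addresses to clCreateProgramWithSource in place of cSourceCL and szKernelLength.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>

// Read an OpenCL C source file in binary mode and prepend a preamble.
// Returns false if the file cannot be opened; on success, `out` holds
// preamble + file contents.
bool loadProgramSource(const std::string& filename, const std::string& preamble, std::string& out) {
    std::ifstream file(filename, std::ios::in | std::ios::binary);
    if (!file) return false;
    std::ostringstream contents;
    contents << file.rdbuf();  // copy the whole file into the string stream
    out = preamble + contents.str();
    return true;
}
```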

2.4 Creating the Kernel and Memory Objects

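　　Next, a kernel object is created from the built program object. Host arrays are also allocated, and their data is copied into the storage allocated for the memory objects. The code is as follows: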

	/******** Part 4: create the kernel and memory objects ********/
	#define ARRAY_SIZE 10

	cl_kernel kernel = 0;
	cl_mem memObjects[3] = {0, 0, 0};

	float a[ARRAY_SIZE];
	float b[ARRAY_SIZE];
	float result[ARRAY_SIZE];

	// 8. Create the kernel
    kernel = clCreateKernel(program, "HelloOpenCL", NULL);			// 1. Create the kernel object
	if (kernel == NULL) {
		perror("Error in clCreateKernel.\n");
        exit(1);
	}

	// 9. Create memory objects
	for (int i = 0; i < ARRAY_SIZE; i++) {
		a[i] = (float)i;
		b[i] = (float)i;
	}

	memObjects[0] = clCreateBuffer(context, CL_MEM_READ_ONLY |		// 2. Create the memory objects
								   CL_MEM_COPY_HOST_PTR,
								   sizeof(float) * ARRAY_SIZE,
								   a, NULL);
	memObjects[1] = clCreateBuffer(context, CL_MEM_READ_ONLY |
								   CL_MEM_COPY_HOST_PTR,
								   sizeof(float) * ARRAY_SIZE,
								   b, NULL);
	memObjects[2] = clCreateBuffer(context, CL_MEM_WRITE_ONLY,		// output buffer: written by the kernel, so no (uninitialized) host data is copied
								   sizeof(float) * ARRAY_SIZE,
								   NULL, NULL);
	if (memObjects[0] == NULL || memObjects[1] == NULL || 
			memObjects[2] == NULL) {
		perror("Error in clCreateBuffer.\n");
        exit(1);
	}

2.5 Executing the Kernel

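　　Once the kernel and memory objects have been created, the kernel can be queued for execution. First the kernel arguments are set, then the kernel is enqueued on the command queue for execution on the device. Enqueuing a kernel does not mean that it runs immediately: the kernel execution is placed in the command queue and consumed by the device later. The code is as follows: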

	/******** Part 5: execute the kernel ********/
	size_t globalWorkSize[1] = { ARRAY_SIZE };
	size_t localWorkSize[1] = { 1 };

	// 10. Set the kernel arguments
	errNum = clSetKernelArg(kernel, 0, sizeof(cl_mem), &memObjects[0]);		// 1. Set the kernel arguments
    errNum |= clSetKernelArg(kernel, 1, sizeof(cl_mem), &memObjects[1]);
    errNum |= clSetKernelArg(kernel, 2, sizeof(cl_mem), &memObjects[2]);
	if (errNum != CL_SUCCESS) {
		perror("Error in clSetKernelArg.\n");
        exit(1);
	}

	// 11. Queue the kernel up for execution across the array
	errNum = clEnqueueNDRangeKernel(commandQueue, kernel, 1, NULL,			// 2. Enqueue the kernel for execution
									globalWorkSize, localWorkSize,
									0, NULL, NULL);
	if (errNum != CL_SUCCESS) {
		perror("Error in clEnqueueNDRangeKernel.\n");
        exit(1);
	}

	// 12. Read the output buffer back to the Host
	errNum = clEnqueueReadBuffer(commandQueue, memObjects[2],				// 3. Blocking read (CL_TRUE) of the results back to the host
								 CL_TRUE, 0,
								 ARRAY_SIZE * sizeof(float), result,
								 0, NULL, NULL);
	if (errNum != CL_SUCCESS) {
		perror("Error in clEnqueueReadBuffer.\n");
        exit(1);
	}
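　　As noted above, enqueuing only places a command in the queue. The toy model below illustrates the in-order, deferred behaviour. `enqueue` just records work. A blocking operation, comparable to clEnqueueReadBuffer with CL_TRUE or to clFinish, first drains everything queued before it. `ToyQueue` is a hypothetical class for illustration only. A real command queue is executed asynchronously by the OpenCL runtime and device, which this single-threaded sketch does not model.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>

// Toy, single-threaded model of an in-order command queue.
class ToyQueue {
public:
    // Like clEnqueue*: record the command without running it.
    void enqueue(std::function<void()> command) { pending_.push(std::move(command)); }

    // Like clFinish: run all pending commands in submission order.
    void finish() {
        while (!pending_.empty()) {
            pending_.front()();
            pending_.pop();
        }
    }

    // Like a blocking read: all earlier commands complete before `read` runs.
    template <typename F>
    void blockingCall(F&& read) {
        finish();
        read();
    }

    std::size_t pendingCount() const { return pending_.size(); }

private:
    std::queue<std::function<void()> > pending_;
};
```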

2.6 Testing the Results

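　　The purpose of this example program is to multiply the corresponding elements of two arrays. The test case and its results are as follows: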

	/******** Part 6: check the results ********/
	printf("\nTest: a * b = c\n\n");

	printf("Input numbers:\n");
	for (int i = 0; i < ARRAY_SIZE; i++)
		printf("a[%d] = %f, b[%d] = %f\n", i, a[i], i, b[i]);

	printf("\nOutput numbers:\n");
	for (int i = 0; i < ARRAY_SIZE; i++)
		printf("a[%d] * b[%d] = %f\n", i, i, result[i]);
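　　Printing the arrays lets you check the output by eye. The check can also be automated by comparing each element with a product computed on the host. The sketch below counts mismatches using a small relative tolerance, the usual habit when comparing floating-point results from different hardware. `countMismatches` is a hypothetical helper. It could be called with a, b, result and ARRAY_SIZE after the read-back, with the program exiting with an error if the count is nonzero.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdio>

// Count elements where result[i] differs from a[i] * b[i] by more than
// relTol relative to max(1, |expected|); each mismatch is printed.
std::size_t countMismatches(const float* a, const float* b, const float* result,
                            std::size_t n, float relTol = 1e-6f) {
    std::size_t mismatches = 0;
    for (std::size_t i = 0; i < n; ++i) {
        float expected = a[i] * b[i];
        float magnitude = std::fabs(expected);
        float scale = magnitude > 1.0f ? magnitude : 1.0f;
        if (std::fabs(result[i] - expected) > relTol * scale) {
            std::printf("Mismatch at %lu: got %f, expected %f\n",
                        (unsigned long)i, result[i], expected);
            ++mismatches;
        }
    }
    return mismatches;
}
```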

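　　The final test output is shown in the figure below. The first four lines of output print platform information, which the code above does not cover; see the complete code in the appendix.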

III. Appendix: Complete Example Program

1. Kernel Code (HelloWorld.cl)

 // OpenCL Kernel Function
__kernel void HelloOpenCL(__global const float* a, 
						  __global const float* b, 
						  __global float* result)
{
    // get index into global data array
    int iGID = get_global_id(0);
    
    // elements operation
    result[iGID] = a[iGID] * b[iGID];
}

2. Complete Host Program (main.cpp)

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <iostream>

#ifdef __APPLE__
#include <OpenCL/cl.h>
#else
#include <CL/cl.h>
#endif

char* oclLoadProgSource(const char* cFilename, const char* cPreamble, size_t* szFinalLength);

int main() 
{
	cl_int errNum;


	/******** Part 1: select an OpenCL platform and create a context ********/
	cl_uint numPlatforms;
	cl_platform_id *platformIds;
	cl_context context = 0;

	// 1. Select an OpenCL platform to run on.
	errNum = clGetPlatformIDs(0, NULL, &numPlatforms);						// 1. Get the number of OpenCL platforms
	if (errNum != CL_SUCCESS || numPlatforms == 0) {
		perror("Failed to find any OpenCL platforms.");
		exit(1);
	}
	printf("Platform Numbers: %u\n", numPlatforms);

	platformIds = (cl_platform_id *)malloc(
		sizeof(cl_platform_id) * numPlatforms);

	errNum = clGetPlatformIDs(numPlatforms, platformIds, NULL);				// 2. Get the IDs of all OpenCL platforms
	if (errNum != CL_SUCCESS) {
		perror("Failed to find any OpenCL platforms.");
		exit(1);
	}

	//------------------ Print platform information (start) ------------------/
	// Extension data				
	size_t ext_size = 0;   

	// Platform name
	errNum = clGetPlatformInfo(platformIds[0], 			
							   CL_PLATFORM_NAME, 
							   0, NULL, &ext_size);	
	if(errNum < 0) {
		perror("Couldn't read CL_PLATFORM_NAME.");			
		exit(1);
	}	
	char *name = (char*)malloc(ext_size);
	clGetPlatformInfo(platformIds[0], CL_PLATFORM_NAME, 	
		ext_size, name, NULL);				
	printf("Platform Name: %s\n", name);

	// Platform vendor
	errNum = clGetPlatformInfo(platformIds[0], 			
							   CL_PLATFORM_VENDOR, 
							   0, NULL, &ext_size);	
	if(errNum < 0) {
		perror("Couldn't read CL_PLATFORM_VENDOR.");			
		exit(1);
	}	
	char *vendor = (char*)malloc(ext_size);
	clGetPlatformInfo(platformIds[0], CL_PLATFORM_VENDOR, 	
		ext_size, vendor, NULL);				
	printf("Platform Vendor: %s\n", vendor);

	// OpenCL version supported by the platform
	errNum = clGetPlatformInfo(platformIds[0], 			
							   CL_PLATFORM_VERSION, 
							   0, NULL, &ext_size);	
	if(errNum < 0) {
		perror("Couldn't read CL_PLATFORM_VERSION.");			
		exit(1);
	}	
	char *version = (char*)malloc(ext_size);
	clGetPlatformInfo(platformIds[0], CL_PLATFORM_VERSION, 	
		ext_size, version, NULL);				
	printf("Platform Version: %s\n", version);

	// Only two possible values: FULL_PROFILE and EMBEDDED_PROFILE
	errNum = clGetPlatformInfo(platformIds[0], 			
							   CL_PLATFORM_PROFILE, 
							   0, NULL, &ext_size);	
	if(errNum < 0) {
		perror("Couldn't read CL_PLATFORM_PROFILE.");			
		exit(1);
	}	
	char *profile = (char*)malloc(ext_size);
	clGetPlatformInfo(platformIds[0], CL_PLATFORM_PROFILE, 	
		ext_size, profile, NULL);				
	printf("Platform Full Profile or Embedded Profile?: %s\n", profile);
	//------------------ Print platform information (end) ------------------/

	// 2. Create an OpenCL context on the platform.
	cl_context_properties contextProperties[] = {
		CL_CONTEXT_PLATFORM,
		(cl_context_properties)platformIds[0],								// 3. Select the first OpenCL platform
		0
	};
	context = clCreateContextFromType(contextProperties, 					// 4. Try to create a context for a GPU device
									  CL_DEVICE_TYPE_GPU, 
									  NULL, NULL, &errNum);
	if (errNum != CL_SUCCESS) {
		perror("Could not create GPU context, trying CPU...");

		context = clCreateContextFromType(contextProperties, 				// 5. Fall back to a context for a CPU device
			CL_DEVICE_TYPE_CPU, NULL, NULL, &errNum);
		if (errNum != CL_SUCCESS) {
			perror("Failed to create an OpenCL GPU or CPU context.");
			exit(1);
		}
	}
	free(platformIds);	// the platform IDs are no longer needed once the context exists


	/******** Part 2: select a device and create a command queue ********/
	cl_device_id *devices;
	cl_device_id device = 0;
	cl_command_queue commandQueue = NULL;
	size_t deviceBufferSize = 0;

	// 3. Get the size of the device buffer.
	errNum = clGetContextInfo(context, CL_CONTEXT_DEVICES, 0, NULL,			// 1. Query the buffer size (in bytes) needed to hold the IDs of all devices in the context
							   &deviceBufferSize);
	if (errNum != CL_SUCCESS) {
		perror("Failed to get context information.");
		exit(1);
	}
	if (deviceBufferSize == 0) {
		perror("No devices available.");
		exit(1);
	}

	devices = new cl_device_id[deviceBufferSize/sizeof(cl_device_id)];
	errNum = clGetContextInfo(context, CL_CONTEXT_DEVICES, 					// 2. Get all devices available in the context
							  deviceBufferSize, devices, NULL);
	if (errNum != CL_SUCCESS) {
		perror("Failed to get device ID.");
		exit(1);
	}

	// 4. Choose the first device
	commandQueue = clCreateCommandQueue(context,							// 3. Create a command queue on the first device
										devices[0], 0, NULL);
	if (commandQueue == NULL) {
		perror("Failed to create commandQueue for device 0.");
		exit(1);
	}

	device = devices[0];
	delete [] devices;


	/******** Part 3: read the OpenCL C source, create and build the program object ********/
	cl_program program;
	size_t szKernelLength;			// Byte size of kernel code
	char* cSourceCL = NULL;         // Buffer to hold source for compilation

	// 5. Read the OpenCL kernel in from source file
	cSourceCL = oclLoadProgSource(										// 1. Read the source of HelloWorld.cl from an absolute path (adjust it to where your file is)
		"C:/Users/xxx/Desktop/OpenCL/HelloWorld.cl", "", 
		&szKernelLength);
	if (cSourceCL == NULL){
        perror("Error in oclLoadProgSource\n");
        exit(1);
    }

	// 6. Create the program
    program = clCreateProgramWithSource(context, 1, 					// 2. Create the program object from the source code
										(const char **)&cSourceCL, 
										&szKernelLength, &errNum);
    if (errNum != CL_SUCCESS) {
        perror("Error in clCreateProgramWithSource\n");
        exit(1);
    }

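	// 7. Build the program with -cl-fast-relaxed-math (which also enables 'mad' optimizations)
    const char* flags = "-cl-fast-relaxed-math";
    errNum = clBuildProgram(program, 0, NULL, flags, NULL, NULL);		// 3. Compile the kernel source
    if (errNum != CL_SUCCESS) {
        // Print the compiler's build log for the device to see why the build failed
        size_t logSize = 0;
        clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG, 0, NULL, &logSize);
        char* buildLog = (char*)malloc(logSize + 1);
        clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG, logSize, buildLog, NULL);
        buildLog[logSize] = '\0';
        fprintf(stderr, "Error in clBuildProgram:\n%s\n", buildLog);
        free(buildLog);
        exit(1);
    }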


	/******** Part 4: create the kernel and memory objects ********/
	#define ARRAY_SIZE 10

	cl_kernel kernel = 0;
	cl_mem memObjects[3] = {0, 0, 0};

	float a[ARRAY_SIZE];
	float b[ARRAY_SIZE];
	float result[ARRAY_SIZE];

	// 8. Create the kernel
    kernel = clCreateKernel(program, "HelloOpenCL", NULL);			// 1. Create the kernel object
	if (kernel == NULL) {
		perror("Error in clCreateKernel.\n");
        exit(1);
	}

	// 9. Create memory objects
	for (int i = 0; i < ARRAY_SIZE; i++) {
		a[i] = (float)i;
		b[i] = (float)i;
	}

	memObjects[0] = clCreateBuffer(context, CL_MEM_READ_ONLY |		// 2. Create the memory objects
								   CL_MEM_COPY_HOST_PTR,
								   sizeof(float) * ARRAY_SIZE,
								   a, NULL);
	memObjects[1] = clCreateBuffer(context, CL_MEM_READ_ONLY |
								   CL_MEM_COPY_HOST_PTR,
								   sizeof(float) * ARRAY_SIZE,
								   b, NULL);
	memObjects[2] = clCreateBuffer(context, CL_MEM_WRITE_ONLY,		// output buffer: written by the kernel, so no (uninitialized) host data is copied
								   sizeof(float) * ARRAY_SIZE,
								   NULL, NULL);
	if (memObjects[0] == NULL || memObjects[1] == NULL || 
			memObjects[2] == NULL) {
		perror("Error in clCreateBuffer.\n");
        exit(1);
	}


	/******** Part 5: execute the kernel ********/
	size_t globalWorkSize[1] = { ARRAY_SIZE };
	size_t localWorkSize[1] = { 1 };

	// 10. Set the kernel arguments
	errNum = clSetKernelArg(kernel, 0, sizeof(cl_mem), &memObjects[0]);		// 1. Set the kernel arguments
    errNum |= clSetKernelArg(kernel, 1, sizeof(cl_mem), &memObjects[1]);
    errNum |= clSetKernelArg(kernel, 2, sizeof(cl_mem), &memObjects[2]);
	if (errNum != CL_SUCCESS) {
		perror("Error in clSetKernelArg.\n");
        exit(1);
	}

	// 11. Queue the kernel up for execution across the array
	errNum = clEnqueueNDRangeKernel(commandQueue, kernel, 1, NULL,			// 2. Enqueue the kernel for execution
									globalWorkSize, localWorkSize,
									0, NULL, NULL);
	if (errNum != CL_SUCCESS) {
		perror("Error in clEnqueueNDRangeKernel.\n");
        exit(1);
	}

	// 12. Read the output buffer back to the Host
	errNum = clEnqueueReadBuffer(commandQueue, memObjects[2],				// 3. Blocking read (CL_TRUE) of the results back to the host
								 CL_TRUE, 0,
								 ARRAY_SIZE * sizeof(float), result,
								 0, NULL, NULL);
	if (errNum != CL_SUCCESS) {
		perror("Error in clEnqueueReadBuffer.\n");
        exit(1);
	}


	/******** Part 6: check the results ********/
	printf("\nTest: a * b = c\n\n");

	printf("Input numbers:\n");
	for (int i = 0; i < ARRAY_SIZE; i++)
		printf("a[%d] = %f, b[%d] = %f\n", i, a[i], i, b[i]);

	printf("\nOutput numbers:\n");
	for (int i = 0; i < ARRAY_SIZE; i++)
		printf("a[%d] * b[%d] = %f\n", i, i, result[i]);


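	// Release OpenCL objects and host-side allocations
	clReleaseMemObject(memObjects[0]);
	clReleaseMemObject(memObjects[1]);
	clReleaseMemObject(memObjects[2]);
	clReleaseKernel(kernel);
	clReleaseProgram(program);
	clReleaseCommandQueue(commandQueue);
	clReleaseContext(context);
	free(cSourceCL);
	free(name);
	free(vendor);
	free(version);
	free(profile);

	// Keep the console window open until Enter is pressed
	getchar();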

	return 0;
}


//////////////////////////////////////////////////////////////////////////////
//! Loads a Program file and prepends the cPreamble to the code.
//!
//! @return the source string if succeeded, 0 otherwise
//! @param cFilename        program filename
//! @param cPreamble        code that is prepended to the loaded file, typically a set of #defines or a header
//! @param szFinalLength    returned length of the code string
//////////////////////////////////////////////////////////////////////////////
char* oclLoadProgSource(const char* cFilename, const char* cPreamble, size_t* szFinalLength)
{
    // locals 
    FILE* pFileStream = NULL;
    size_t szSourceLength;

    // open the OpenCL source code file
    #ifdef _WIN32   // Windows version
        if(fopen_s(&pFileStream, cFilename, "rb") != 0) 
        {       
            return NULL;
        }
    #else           // Linux version
        pFileStream = fopen(cFilename, "rb");
        if(pFileStream == 0) 
        {       
            return NULL;
        }
    #endif

    size_t szPreambleLength = strlen(cPreamble);

    // get the length of the source code
    fseek(pFileStream, 0, SEEK_END); 
    szSourceLength = ftell(pFileStream);
    fseek(pFileStream, 0, SEEK_SET); 

    // allocate a buffer for the source code string and read it in
    char* cSourceString = (char *)malloc(szSourceLength + szPreambleLength + 1); 
    memcpy(cSourceString, cPreamble, szPreambleLength);
    if (fread((cSourceString) + szPreambleLength, szSourceLength, 1, pFileStream) != 1)
    {
        fclose(pFileStream);
        free(cSourceString);
        return 0;
    }

    // close the file and return the total length of the combined (preamble + source) string
    fclose(pFileStream);
    if(szFinalLength != 0)
    {
        *szFinalLength = szSourceLength + szPreambleLength;
    }
    cSourceString[szSourceLength + szPreambleLength] = '\0';

    return cSourceString;
}

Johnson Lu


Setting Up an OpenCL Development Environment on CUDA, with an Introductory Program Example
