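References: "Learning OpenCL with Detailed Program Comments (1): Environment Setup and an Introductory Program" and "Configuring an NVIDIA OpenCL Development Environment in VS2010".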
I. Setting Up the Development Environment
1. Download and install the CUDA Toolkit (CUDA SDK)
Download page: https://developer.nvidia.com/cuda-downloads
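With the default installation path, the toolkit is installed in C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.5. This directory contains the include and lib folders. These are the directories we need to configure in Visual C++ 2008.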
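2. Configure Visual C++ 2008
a. Open Visual C++ 2008 and create a new empty project;
b. Right-click the "Source Files" folder on the left, choose "Add" --> "New Item", and create an empty "main.cpp" file. (This step makes the "C/C++" node appear under "Configuration Properties" in the project's "Property Pages", so that the paths can be configured.)
c. Right-click the project and choose "Properties";
d. Configure the property pages: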
(a). "Configuration Properties" --> "C/C++" --> "General": add "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.5\include" to "Additional Include Directories", as shown in the figure below.
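(b). "Configuration Properties" --> "Linker" --> "General": add "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.5\lib\Win32" to "Additional Library Directories". If you build a 64-bit (x64) target, use "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.5\lib\x64" instead. What matters is the project's target platform, not whether Windows itself is 64-bit. See the figure below.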
(c). "Configuration Properties" --> "Linker" --> "Input": add OpenCL.lib to "Additional Dependencies", as shown in the figure below.
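MSVC can also take the library from step (c) from the source code itself, using #pragma comment(lib, ...). The sketch below assumes the MSVC toolchain. Other compilers skip the guarded block, and the library directory from step (b) is still required.

```cpp
// Alternative to listing OpenCL.lib under "Additional Dependencies" (MSVC only).
// The linker still searches the "Additional Library Directories" from step (b).
#ifdef _MSC_VER
#pragma comment(lib, "OpenCL.lib")
#endif
```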
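II. An Introductory Example Program
1. Basic OpenCL Concepts and Framework
OpenCL supports a wide range of applications. Any program that targets a heterogeneous platform must perform the following steps:
a. Discover the components that make up the heterogeneous system;
b. Probe the characteristics of these components, so the software can adapt to the specific features of different hardware units;
c. Create the blocks of instructions (kernels) that will run on the platform;
d. Set up and manage the memory objects involved in the computation;
e. Execute the kernels in the right order on the right components of the system;
f. Collect the final results.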
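These steps are carried out through a set of OpenCL APIs, together with a programming environment for writing kernels. We break the problem down into the following models:
> Platform model: a description of the heterogeneous system;
> Execution model: an abstract representation of how instruction streams execute on the heterogeneous platform;
> Memory model: the collection of memory regions in OpenCL, and how they interact during an OpenCL computation;
> Programming model: the high-level abstractions a programmer uses when designing algorithms to implement an application.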
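In the execution model, a kernel launch covers an index space (NDRange) that is split into work-groups. With a zero global offset, each work-item's global ID is group ID * local size + local ID. The sketch below enumerates the IDs of a 1-D NDRange on the host. simulateNDRange1D is a hypothetical helper. It assumes the OpenCL 1.x rule that the global size must be a multiple of the local size. On a real device, work-items run concurrently and in no guaranteed order.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side enumeration of a 1-D NDRange (zero global offset assumed).
// Returns the global ID of every work-item, visited group by group.
std::vector<std::size_t> simulateNDRange1D(std::size_t globalSize,
                                           std::size_t localSize)
{
    // OpenCL 1.x: the global size must divide evenly into work-groups.
    assert(localSize > 0 && globalSize % localSize == 0);
    std::vector<std::size_t> ids;
    std::size_t numGroups = globalSize / localSize;
    for (std::size_t group = 0; group < numGroups; ++group) {
        for (std::size_t local = 0; local < localSize; ++local) {
            // The value a kernel would obtain from get_global_id(0).
            ids.push_back(group * localSize + local);
        }
    }
    return ids;
}
```

For example, simulateNDRange1D(8, 4) describes two work-groups of four work-items each, covering global IDs 0 through 7.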
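The OpenCL framework can be divided into the following parts:
> Host programming
  Platform API
  - Query compute devices
  - Create contexts
  Runtime API
  - Create memory objects associated with a context
  - Compile and create kernel program objects
  - Issue commands to command queues
  - Synchronize commands
  - Release OpenCL resources
> Kernels
  Programming language
  - C code with some restrictions and extensions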
The basic workflow of an application in the OpenCL framework is shown in the figure below.
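2. The First OpenCL Program
Our example program performs the following operations:
a. Create an OpenCL context on the first available platform;
b. Create a command queue on the first available device;
c. Load a kernel file (HelloWorld.cl) and build it into a program object;
d. Create a kernel object for the kernel function HelloOpenCL();
e. Create memory objects for the kernel arguments;
f. Enqueue the kernel for execution;
g. Read the kernel's results back into the host result buffer.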
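2.1 Select a Platform and Create a Context
A system can have several OpenCL implementations (platforms) installed, so the first step in an OpenCL program is to select a platform. Once a platform is selected, create a context on it. The code is as follows: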
/******** Part 1: select an OpenCL platform and create a context ********/
cl_int errNum;
cl_uint numPlatforms;
cl_platform_id *platformIds;
cl_context context = 0;
// 1. Select an OpenCL platform to run on.
errNum = clGetPlatformIDs(0, NULL, &numPlatforms);             // 1. get the number of OpenCL platforms
if (errNum != CL_SUCCESS || numPlatforms == 0) {
    fprintf(stderr, "Failed to find any OpenCL platforms.\n");
    exit(1);
}
printf("Platform Numbers: %u\n", numPlatforms);
platformIds = (cl_platform_id *)malloc(sizeof(cl_platform_id) * numPlatforms);
errNum = clGetPlatformIDs(numPlatforms, platformIds, NULL);   // 2. get the IDs of all OpenCL platforms
if (errNum != CL_SUCCESS) {
    fprintf(stderr, "Failed to find any OpenCL platforms.\n");
    exit(1);
}
// 2. Create an OpenCL context on the platform.
cl_context_properties contextProperties[] = {
    CL_CONTEXT_PLATFORM,
    (cl_context_properties)platformIds[0],                      // 3. use the first OpenCL platform
    0
};
context = clCreateContextFromType(contextProperties,           // 4. try to create a context for a GPU device
                                  CL_DEVICE_TYPE_GPU,
                                  NULL, NULL, &errNum);
if (errNum != CL_SUCCESS) {
    fprintf(stderr, "Could not create GPU context, trying CPU...\n");
    context = clCreateContextFromType(contextProperties,       // 5. fall back to a CPU device
                                      CL_DEVICE_TYPE_CPU,
                                      NULL, NULL, &errNum);
    if (errNum != CL_SUCCESS) {
        fprintf(stderr, "Failed to create an OpenCL GPU or CPU context.\n");
        exit(1);
    }
}
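The GPU-then-CPU fallback works only if the second attempt asks for a different device type. The sketch below shows that preference order without OpenCL and uses C++11. tryCreate is a hypothetical callback that stands in for clCreateContextFromType. The plain strings stand in for CL_DEVICE_TYPE_GPU and CL_DEVICE_TYPE_CPU.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for clCreateContextFromType: returns true when a
// context for the requested device type could be created.
typedef std::function<bool(const std::string&)> TryCreateFn;

// Tries device types in order of preference and returns the first one that
// succeeds, or an empty string if none does. Each attempt must request a
// *different* device type, otherwise the fallback is a no-op.
std::string createWithFallback(const TryCreateFn& tryCreate)
{
    assert(static_cast<bool>(tryCreate));
    const std::vector<std::string> preference = {"GPU", "CPU"};
    for (const std::string& type : preference) {
        if (tryCreate(type)) {
            return type;
        }
    }
    return "";
}
```

On a machine with only a CPU implementation, the callback fails for "GPU" and succeeds for "CPU", so the second entry is chosen.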
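2.2 Select a Device and Create a Command Queue
After selecting a platform and creating a context, the next step is to select a device and create a command queue on it. The code is as follows: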
/******** Part 2: select a device and create a command queue ********/
cl_device_id *devices;
cl_device_id device = 0;
cl_command_queue commandQueue = NULL;
size_t deviceBufferSize = 0;
// 3. Get the size of the device buffer.
errNum = clGetContextInfo(context, CL_CONTEXT_DEVICES, 0, NULL,    // 1. query the buffer size (in bytes) needed for the IDs of all devices in the context
                          &deviceBufferSize);
if (errNum != CL_SUCCESS) {
    fprintf(stderr, "Failed to get context information.\n");
    exit(1);
}
if (deviceBufferSize == 0) {
    fprintf(stderr, "No devices available.\n");
    exit(1);
}
devices = new cl_device_id[deviceBufferSize / sizeof(cl_device_id)];
errNum = clGetContextInfo(context, CL_CONTEXT_DEVICES,               // 2. get all devices in the context
                          deviceBufferSize, devices, NULL);
if (errNum != CL_SUCCESS) {
    fprintf(stderr, "Failed to get device ID.\n");
    exit(1);
}
// 4. Choose the first device
commandQueue = clCreateCommandQueue(context,                          // 3. create a command queue on the first device
                                    devices[0], 0, &errNum);
if (errNum != CL_SUCCESS || commandQueue == NULL) {
    fprintf(stderr, "Failed to create commandQueue for device 0.\n");
    exit(1);
}
device = devices[0];
delete [] devices;
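clGetPlatformIDs reports a count, but clGetContextInfo with CL_CONTEXT_DEVICES reports a size in bytes. That is why the code divides deviceBufferSize by sizeof(cl_device_id). The sketch below reproduces the same two-call pattern without OpenCL. fakeGetContextDevices and FakeDeviceId are hypothetical stand-ins for the real call and for cl_device_id.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for cl_device_id.
typedef int FakeDeviceId;

// Hypothetical device list belonging to a fake context.
static const FakeDeviceId kContextDevices[] = {101, 202};

// Hypothetical stand-in for clGetContextInfo(ctx, CL_CONTEXT_DEVICES, ...).
// Like the real call, it reports the required size in BYTES via sizeRet and
// copies the IDs when `value` is non-NULL and large enough.
// Returns 0 on success, or -1 (an error code) if the buffer is too small.
int fakeGetContextDevices(std::size_t valueSize, void* value, std::size_t* sizeRet)
{
    if (sizeRet != NULL) {
        *sizeRet = sizeof(kContextDevices);
    }
    if (value != NULL) {
        if (valueSize < sizeof(kContextDevices)) {
            return -1;
        }
        std::memcpy(value, kContextDevices, sizeof(kContextDevices));
    }
    return 0;
}

// Two-call pattern: ask for the size in bytes, convert it to an element
// count, then call again with a buffer of exactly that size.
std::vector<FakeDeviceId> queryContextDevices()
{
    std::size_t bytes = 0;
    if (fakeGetContextDevices(0, NULL, &bytes) != 0 || bytes == 0) {
        return std::vector<FakeDeviceId>();
    }
    assert(bytes % sizeof(FakeDeviceId) == 0);
    std::vector<FakeDeviceId> ids(bytes / sizeof(FakeDeviceId));
    if (fakeGetContextDevices(bytes, &ids[0], NULL) != 0) {
        return std::vector<FakeDeviceId>();
    }
    return ids;
}
```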
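2.3 Create and Build the Program Object
Next, load the OpenCL kernel source code from HelloWorld.cl and create a program object from it. The program object is created from the kernel source and then compiled, so that it can execute on the devices associated with the context. The OpenCL kernel code in HelloWorld.cl is: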
// OpenCL Kernel Function
__kernel void HelloOpenCL(__global const float* a,
                          __global const float* b,
                          __global float* result)
{
    // get index into global data array
    int iGID = get_global_id(0);
    // elements operation
    result[iGID] = a[iGID] * b[iGID];
}
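Each work-item computes exactly one element. The kernel is therefore equivalent to a plain loop in which the loop index plays the role of get_global_id(0). A host-side reference like the following is handy for checking device results. helloOpenCLReference is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side reference for the HelloOpenCL kernel: each loop iteration
// stands in for one work-item whose get_global_id(0) equals gid.
std::vector<float> helloOpenCLReference(const std::vector<float>& a,
                                        const std::vector<float>& b)
{
    assert(a.size() == b.size());
    std::vector<float> result(a.size());
    for (std::size_t gid = 0; gid < a.size(); ++gid) {
        result[gid] = a[gid] * b[gid];
    }
    return result;
}
```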
The code that creates and builds the program object is:
/******** Part 3: read the OpenCL C source, create and build the program object ********/
cl_program program;
size_t szKernelLength;      // Byte size of kernel code
char* cSourceCL = NULL;     // Buffer to hold source for compilation
// 5. Read the OpenCL kernel in from source file
cSourceCL = oclLoadProgSource(                                   // 1. read HelloWorld.cl from an absolute path
    "C:/Users/xxx/Desktop/OpenCL/HelloWorld.cl", "",
    &szKernelLength);
if (cSourceCL == NULL) {
    fprintf(stderr, "Error in oclLoadProgSource\n");
    exit(1);
}
// 6. Create the program
program = clCreateProgramWithSource(context, 1,                  // 2. create the program object from the source
                                    (const char **)&cSourceCL,
                                    &szKernelLength, &errNum);
if (errNum != CL_SUCCESS) {
    fprintf(stderr, "Error in clCreateProgramWithSource\n");
    exit(1);
}
// 7. Build the program with -cl-fast-relaxed-math (which also enables 'mad' optimizations)
const char* flags = "-cl-fast-relaxed-math";
errNum = clBuildProgram(program, 0, NULL, flags, NULL, NULL);    // 3. compile the kernel source
if (errNum != CL_SUCCESS) {
    // Print the compiler log for the selected device to see why the build failed.
    char buildLog[16384] = "";
    clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG,
                          sizeof(buildLog), buildLog, NULL);
    fprintf(stderr, "Error in clBuildProgram:\n%s\n", buildLog);
    exit(1);
}
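The error checks above report which call failed but not the status code it returned. A small lookup helper makes the messages more informative. clErrorName and formatClError are hypothetical helpers, and only a few codes are listed. The numeric values match the corresponding CL_* macros in CL/cl.h for OpenCL 1.x.

```cpp
#include <cassert>
#include <string>

// Name of a (partial) set of OpenCL status codes; values as in CL/cl.h.
const char* clErrorName(int err)
{
    switch (err) {
    case 0:   return "CL_SUCCESS";
    case -1:  return "CL_DEVICE_NOT_FOUND";
    case -5:  return "CL_OUT_OF_RESOURCES";
    case -6:  return "CL_OUT_OF_HOST_MEMORY";
    case -11: return "CL_BUILD_PROGRAM_FAILURE";
    case -30: return "CL_INVALID_VALUE";
    case -34: return "CL_INVALID_CONTEXT";
    case -44: return "CL_INVALID_PROGRAM";
    case -46: return "CL_INVALID_KERNEL_NAME";
    case -54: return "CL_INVALID_WORK_GROUP_SIZE";
    default:  return "UNKNOWN_CL_ERROR";
    }
}

// Builds "<what> failed: <name> (<code>)", e.g. for fprintf(stderr, ...).
std::string formatClError(const char* what, int err)
{
    assert(what != nullptr);
    return std::string(what) + " failed: " + clErrorName(err) +
           " (" + std::to_string(err) + ")";
}
```

For example, fprintf(stderr, "%s\n", formatClError("clBuildProgram", errNum).c_str()) could replace a fixed message in any of the checks.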
The function that reads the kernel source code is taken from NVIDIA's official OpenCL sample programs:
//////////////////////////////////////////////////////////////////////////////
//! Loads a Program file and prepends the cPreamble to the code.
//!
//! @return the source string if succeeded, 0 otherwise
//! @param cFilename program filename
//! @param cPreamble code that is prepended to the loaded file, typically a set of #defines or a header
//! @param szFinalLength returned length of the code string
//////////////////////////////////////////////////////////////////////////////
char* oclLoadProgSource(const char* cFilename, const char* cPreamble, size_t* szFinalLength)
{
    // locals
    FILE* pFileStream = NULL;
    size_t szSourceLength;

    // open the OpenCL source code file
#ifdef _WIN32   // Windows version
    if (fopen_s(&pFileStream, cFilename, "rb") != 0)
    {
        return NULL;
    }
#else           // Linux version
    pFileStream = fopen(cFilename, "rb");
    if (pFileStream == 0)
    {
        return NULL;
    }
#endif

    size_t szPreambleLength = strlen(cPreamble);

    // get the length of the source code (ftell returns -1 on failure)
    fseek(pFileStream, 0, SEEK_END);
    long lSourceLength = ftell(pFileStream);
    fseek(pFileStream, 0, SEEK_SET);
    if (lSourceLength < 0)
    {
        fclose(pFileStream);
        return NULL;
    }
    szSourceLength = (size_t)lSourceLength;

    // allocate a buffer for the source code string and read it in
    char* cSourceString = (char *)malloc(szSourceLength + szPreambleLength + 1);
    if (cSourceString == NULL)
    {
        fclose(pFileStream);
        return NULL;
    }
    memcpy(cSourceString, cPreamble, szPreambleLength);
    if (fread((cSourceString) + szPreambleLength, szSourceLength, 1, pFileStream) != 1)
    {
        fclose(pFileStream);
        free(cSourceString);
        return 0;
    }

    // close the file and return the total length of the combined (preamble + source) string
    fclose(pFileStream);
    if (szFinalLength != 0)
    {
        *szFinalLength = szSourceLength + szPreambleLength;
    }
    cSourceString[szSourceLength + szPreambleLength] = '\0';

    return cSourceString;
}
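In C++, the same loading step can be written with std::ifstream, which avoids the manual fseek/ftell/malloc bookkeeping. This sketch assumes the caller passes result.c_str() and result.size() to clCreateProgramWithSource. loadProgramSource is a hypothetical name.

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <string>

// Reads the whole file in binary mode and prepends `preamble`.
// Returns false if the file cannot be opened or a read error occurs;
// `result` is only modified on success.
bool loadProgramSource(const std::string& path, const std::string& preamble,
                       std::string& result)
{
    assert(!path.empty());
    std::ifstream in(path.c_str(), std::ios::in | std::ios::binary);
    if (!in) {
        return false;                      // could not open the file
    }
    std::string body((std::istreambuf_iterator<char>(in)),
                     std::istreambuf_iterator<char>());
    if (in.bad()) {
        return false;                      // low-level read error
    }
    result = preamble + body;
    return true;
}
```

Unlike oclLoadProgSource, this version treats an empty file as success and returns just the preamble.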
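2.4 Create the Kernel and Memory Objects
Next, create a kernel object from the built program object. Then allocate the host arrays and copy the input data into the storage allocated for the input memory objects. The code is as follows: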
/******** Part 4: create the kernel and memory objects ********/
#define ARRAY_SIZE 10
cl_kernel kernel = 0;
cl_mem memObjects[3] = {0, 0, 0};
float a[ARRAY_SIZE];
float b[ARRAY_SIZE];
float result[ARRAY_SIZE];
// 8. Create the kernel
kernel = clCreateKernel(program, "HelloOpenCL", &errNum);        // 1. create the kernel object
if (errNum != CL_SUCCESS || kernel == NULL) {
    fprintf(stderr, "Error in clCreateKernel.\n");
    exit(1);
}
// 9. Create memory objects
for (int i = 0; i < ARRAY_SIZE; i++) {
    a[i] = (float)i;
    b[i] = (float)i;
}
memObjects[0] = clCreateBuffer(context, CL_MEM_READ_ONLY |       // 2. create the memory objects
                               CL_MEM_COPY_HOST_PTR,
                               sizeof(float) * ARRAY_SIZE,
                               a, NULL);
memObjects[1] = clCreateBuffer(context, CL_MEM_READ_ONLY |
                               CL_MEM_COPY_HOST_PTR,
                               sizeof(float) * ARRAY_SIZE,
                               b, NULL);
// The output buffer is only written by the kernel, so no host data is copied in.
memObjects[2] = clCreateBuffer(context, CL_MEM_WRITE_ONLY,
                               sizeof(float) * ARRAY_SIZE,
                               NULL, NULL);
if (memObjects[0] == NULL || memObjects[1] == NULL ||
    memObjects[2] == NULL) {
    fprintf(stderr, "Error in clCreateBuffer.\n");
    exit(1);
}
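2.5 Execute the Kernel
Once the kernel and memory objects exist, the kernel can be queued for execution. First set the kernel arguments, then use the command queue to enqueue the kernel for execution on the device. Enqueuing a kernel does not mean it runs immediately. The kernel execution is placed in the command queue, and the device consumes it later. The code is as follows: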
/******** Part 5: execute the kernel ********/
size_t globalWorkSize[1] = { ARRAY_SIZE };
size_t localWorkSize[1] = { 1 };   // a work-group size of 1 is valid but inefficient; passing NULL lets the implementation choose
// 10. Set the kernel arguments
errNum = clSetKernelArg(kernel, 0, sizeof(cl_mem), &memObjects[0]);   // 1. set the kernel arguments
errNum |= clSetKernelArg(kernel, 1, sizeof(cl_mem), &memObjects[1]);
errNum |= clSetKernelArg(kernel, 2, sizeof(cl_mem), &memObjects[2]);
if (errNum != CL_SUCCESS) {
    fprintf(stderr, "Error in clSetKernelArg.\n");
    exit(1);
}
// 11. Queue the kernel up for execution across the array
errNum = clEnqueueNDRangeKernel(commandQueue, kernel, 1, NULL,        // 2. enqueue the kernel for execution
                                globalWorkSize, localWorkSize,
                                0, NULL, NULL);
if (errNum != CL_SUCCESS) {
    fprintf(stderr, "Error in clEnqueueNDRangeKernel.\n");
    exit(1);
}
// 12. Read the output buffer back to the Host
// CL_TRUE makes this a blocking read: it returns only after the kernel has finished and the data has been copied.
errNum = clEnqueueReadBuffer(commandQueue, memObjects[2],              // 3. read the results back to the host
                             CL_TRUE, 0,
                             ARRAY_SIZE * sizeof(float), result,
                             0, NULL, NULL);
if (errNum != CL_SUCCESS) {
    fprintf(stderr, "Error in clEnqueueReadBuffer.\n");
    exit(1);
}
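The asynchronous behavior described above can be modeled with a toy in-order queue. Enqueuing only records a command, and nothing runs until the queue is drained. This loosely mirrors clFinish or the blocking clEnqueueReadBuffer call. ToyCommandQueue is a hypothetical illustration that uses C++11; it does not reflect how an OpenCL runtime is implemented. Real implementations may start executing commands earlier.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>

// Toy in-order command queue: a model of deferred execution only.
class ToyCommandQueue {
public:
    // Records a command; it does NOT run yet (compare clEnqueueNDRangeKernel).
    void enqueue(const std::function<void()>& cmd)
    {
        assert(static_cast<bool>(cmd));
        pending_.push_back(cmd);
    }

    std::size_t pending() const { return pending_.size(); }

    // Runs every recorded command in FIFO order, then returns
    // (loosely like clFinish or a blocking read).
    void finish()
    {
        while (!pending_.empty()) {
            std::function<void()> cmd = pending_.front();
            pending_.pop_front();
            cmd();
        }
    }

private:
    std::deque<std::function<void()> > pending_;
};
```

Because the queue is in-order, a later command can safely depend on the result of an earlier one. The test below chains two commands this way.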
2.6 Checking the Results
The purpose of this example is to multiply the corresponding elements of two arrays. The test code is:
/******** Part 6: test results ********/
printf("\nTest: a * b = c\n\n");
printf("Input numbers:\n");
for (int i = 0; i < ARRAY_SIZE; i++)
    printf("a[%d] = %f, b[%d] = %f\n", i, a[i], i, b[i]);
printf("\nOutput numbers:\n");
for (int i = 0; i < ARRAY_SIZE; i++)
    printf("a[%d] * b[%d] = %f\n", i, i, result[i]);
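The final output is shown in the figure below. The platform-information lines near the top of the output (name, vendor, version and profile) are printed by code not covered above. See the complete program in the appendix.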
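The output can also be compared against the host-computed products instead of checked by eye. Below is a minimal sketch. countMismatches is a hypothetical helper, and its mixed absolute/relative tolerance is an assumption to adjust to your precision needs.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdio>

// Compares device output with a[i] * b[i] computed on the host, prints each
// mismatch and returns how many there were. The tolerance is relative for
// |expected| > 1 and absolute otherwise.
std::size_t countMismatches(const float* a, const float* b,
                            const float* result, std::size_t n,
                            float relTol = 1e-6f)
{
    assert(a != NULL && b != NULL && result != NULL);
    std::size_t bad = 0;
    for (std::size_t i = 0; i < n; ++i) {
        float expected = a[i] * b[i];
        float diff = std::fabs(result[i] - expected);
        float scale = std::fabs(expected) > 1.0f ? std::fabs(expected) : 1.0f;
        if (diff > relTol * scale) {
            std::printf("mismatch at %u: got %f, expected %f\n",
                        (unsigned)i, result[i], expected);
            ++bad;
        }
    }
    return bad;
}
```

In this example it would be called as countMismatches(a, b, result, ARRAY_SIZE) after clEnqueueReadBuffer returns.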
III. Appendix: The Complete Example Program
1. Kernel code (HelloWorld.cl)
// OpenCL Kernel Function
__kernel void HelloOpenCL(__global const float* a,
                          __global const float* b,
                          __global float* result)
{
    // get index into global data array
    int iGID = get_global_id(0);
    // elements operation
    result[iGID] = a[iGID] * b[iGID];
}
2. Complete host program (main.cpp)
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <iostream>
#ifdef __APPLE__
#include <OpenCL/cl.h>
#else
#include <CL/cl.h>
#endif

char* oclLoadProgSource(const char* cFilename, const char* cPreamble, size_t* szFinalLength);

int main()
{
    cl_int errNum;
    /******** Part 1: select an OpenCL platform and create a context ********/
    cl_uint numPlatforms;
    cl_platform_id *platformIds;
    cl_context context = 0;
    // 1. Select an OpenCL platform to run on.
    errNum = clGetPlatformIDs(0, NULL, &numPlatforms);             // 1. get the number of OpenCL platforms
    if (errNum != CL_SUCCESS || numPlatforms == 0) {
        fprintf(stderr, "Failed to find any OpenCL platforms.\n");
        exit(1);
    }
    printf("Platform Numbers: %u\n", numPlatforms);
    platformIds = (cl_platform_id *)malloc(sizeof(cl_platform_id) * numPlatforms);
    errNum = clGetPlatformIDs(numPlatforms, platformIds, NULL);   // 2. get the IDs of all OpenCL platforms
    if (errNum != CL_SUCCESS) {
        fprintf(stderr, "Failed to find any OpenCL platforms.\n");
        exit(1);
    }
    //------------------ Print platform information (start) ------------------/
    // Size in bytes of each queried info string
    size_t ext_size = 0;
    // Platform name
    errNum = clGetPlatformInfo(platformIds[0],
                               CL_PLATFORM_NAME,
                               0, NULL, &ext_size);
    if (errNum < 0) {
        fprintf(stderr, "Couldn't read CL_PLATFORM_NAME.\n");
        exit(1);
    }
    char *name = (char*)malloc(ext_size);
    clGetPlatformInfo(platformIds[0], CL_PLATFORM_NAME,
                      ext_size, name, NULL);
    printf("Platform Name: %s\n", name);
    // Vendor
    errNum = clGetPlatformInfo(platformIds[0],
                               CL_PLATFORM_VENDOR,
                               0, NULL, &ext_size);
    if (errNum < 0) {
        fprintf(stderr, "Couldn't read CL_PLATFORM_VENDOR.\n");
        exit(1);
    }
    char *vendor = (char*)malloc(ext_size);
    clGetPlatformInfo(platformIds[0], CL_PLATFORM_VENDOR,
                      ext_size, vendor, NULL);
    printf("Platform Vendor: %s\n", vendor);
    // OpenCL version supported by the platform
    errNum = clGetPlatformInfo(platformIds[0],
                               CL_PLATFORM_VERSION,
                               0, NULL, &ext_size);
    if (errNum < 0) {
        fprintf(stderr, "Couldn't read CL_PLATFORM_VERSION.\n");
        exit(1);
    }
    char *version = (char*)malloc(ext_size);
    clGetPlatformInfo(platformIds[0], CL_PLATFORM_VERSION,
                      ext_size, version, NULL);
    printf("Platform Version: %s\n", version);
    // Only two possible values: FULL_PROFILE or EMBEDDED_PROFILE
    errNum = clGetPlatformInfo(platformIds[0],
                               CL_PLATFORM_PROFILE,
                               0, NULL, &ext_size);
    if (errNum < 0) {
        fprintf(stderr, "Couldn't read CL_PLATFORM_PROFILE.\n");
        exit(1);
    }
    char *profile = (char*)malloc(ext_size);
    clGetPlatformInfo(platformIds[0], CL_PLATFORM_PROFILE,
                      ext_size, profile, NULL);
    printf("Platform Full Profile or Embedded Profile?: %s\n", profile);
    //------------------ Print platform information (end) ------------------/
    // 2. Create an OpenCL context on the platform.
    cl_context_properties contextProperties[] = {
        CL_CONTEXT_PLATFORM,
        (cl_context_properties)platformIds[0],                      // 3. use the first OpenCL platform
        0
    };
    context = clCreateContextFromType(contextProperties,           // 4. try to create a context for a GPU device
                                      CL_DEVICE_TYPE_GPU,
                                      NULL, NULL, &errNum);
    if (errNum != CL_SUCCESS) {
        fprintf(stderr, "Could not create GPU context, trying CPU...\n");
        context = clCreateContextFromType(contextProperties,       // 5. fall back to a CPU device
                                          CL_DEVICE_TYPE_CPU,
                                          NULL, NULL, &errNum);
        if (errNum != CL_SUCCESS) {
            fprintf(stderr, "Failed to create an OpenCL GPU or CPU context.\n");
            exit(1);
        }
    }
    /******** Part 2: select a device and create a command queue ********/
    cl_device_id *devices;
    cl_device_id device = 0;
    cl_command_queue commandQueue = NULL;
    size_t deviceBufferSize = 0;
    // 3. Get the size of the device buffer.
    errNum = clGetContextInfo(context, CL_CONTEXT_DEVICES, 0, NULL,    // 1. query the buffer size (in bytes) needed for the IDs of all devices in the context
                              &deviceBufferSize);
    if (errNum != CL_SUCCESS) {
        fprintf(stderr, "Failed to get context information.\n");
        exit(1);
    }
    if (deviceBufferSize == 0) {
        fprintf(stderr, "No devices available.\n");
        exit(1);
    }
    devices = new cl_device_id[deviceBufferSize / sizeof(cl_device_id)];
    errNum = clGetContextInfo(context, CL_CONTEXT_DEVICES,               // 2. get all devices in the context
                              deviceBufferSize, devices, NULL);
    if (errNum != CL_SUCCESS) {
        fprintf(stderr, "Failed to get device ID.\n");
        exit(1);
    }
    // 4. Choose the first device
    commandQueue = clCreateCommandQueue(context,                          // 3. create a command queue on the first device
                                        devices[0], 0, &errNum);
    if (errNum != CL_SUCCESS || commandQueue == NULL) {
        fprintf(stderr, "Failed to create commandQueue for device 0.\n");
        exit(1);
    }
    device = devices[0];
    delete [] devices;
    /******** Part 3: read the OpenCL C source, create and build the program object ********/
    cl_program program;
    size_t szKernelLength;      // Byte size of kernel code
    char* cSourceCL = NULL;     // Buffer to hold source for compilation
    // 5. Read the OpenCL kernel in from source file
    cSourceCL = oclLoadProgSource(                                   // 1. read HelloWorld.cl from an absolute path
        "C:/Users/xxx/Desktop/OpenCL/HelloWorld.cl", "",
        &szKernelLength);
    if (cSourceCL == NULL) {
        fprintf(stderr, "Error in oclLoadProgSource\n");
        exit(1);
    }
    // 6. Create the program
    program = clCreateProgramWithSource(context, 1,                  // 2. create the program object from the source
                                        (const char **)&cSourceCL,
                                        &szKernelLength, &errNum);
    if (errNum != CL_SUCCESS) {
        fprintf(stderr, "Error in clCreateProgramWithSource\n");
        exit(1);
    }
    // 7. Build the program with -cl-fast-relaxed-math (which also enables 'mad' optimizations)
    const char* flags = "-cl-fast-relaxed-math";
    errNum = clBuildProgram(program, 0, NULL, flags, NULL, NULL);    // 3. compile the kernel source
    if (errNum != CL_SUCCESS) {
        // Print the compiler log for the selected device to see why the build failed.
        char buildLog[16384] = "";
        clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG,
                              sizeof(buildLog), buildLog, NULL);
        fprintf(stderr, "Error in clBuildProgram:\n%s\n", buildLog);
        exit(1);
    }
    /******** Part 4: create the kernel and memory objects ********/
#define ARRAY_SIZE 10
    cl_kernel kernel = 0;
    cl_mem memObjects[3] = {0, 0, 0};
    float a[ARRAY_SIZE];
    float b[ARRAY_SIZE];
    float result[ARRAY_SIZE];
    // 8. Create the kernel
    kernel = clCreateKernel(program, "HelloOpenCL", &errNum);        // 1. create the kernel object
    if (errNum != CL_SUCCESS || kernel == NULL) {
        fprintf(stderr, "Error in clCreateKernel.\n");
        exit(1);
    }
    // 9. Create memory objects
    for (int i = 0; i < ARRAY_SIZE; i++) {
        a[i] = (float)i;
        b[i] = (float)i;
    }
    memObjects[0] = clCreateBuffer(context, CL_MEM_READ_ONLY |       // 2. create the memory objects
                                   CL_MEM_COPY_HOST_PTR,
                                   sizeof(float) * ARRAY_SIZE,
                                   a, NULL);
    memObjects[1] = clCreateBuffer(context, CL_MEM_READ_ONLY |
                                   CL_MEM_COPY_HOST_PTR,
                                   sizeof(float) * ARRAY_SIZE,
                                   b, NULL);
    // The output buffer is only written by the kernel, so no host data is copied in.
    memObjects[2] = clCreateBuffer(context, CL_MEM_WRITE_ONLY,
                                   sizeof(float) * ARRAY_SIZE,
                                   NULL, NULL);
    if (memObjects[0] == NULL || memObjects[1] == NULL ||
        memObjects[2] == NULL) {
        fprintf(stderr, "Error in clCreateBuffer.\n");
        exit(1);
    }
    /******** Part 5: execute the kernel ********/
    size_t globalWorkSize[1] = { ARRAY_SIZE };
    size_t localWorkSize[1] = { 1 };   // a work-group size of 1 is valid but inefficient; passing NULL lets the implementation choose
    // 10. Set the kernel arguments
    errNum = clSetKernelArg(kernel, 0, sizeof(cl_mem), &memObjects[0]);   // 1. set the kernel arguments
    errNum |= clSetKernelArg(kernel, 1, sizeof(cl_mem), &memObjects[1]);
    errNum |= clSetKernelArg(kernel, 2, sizeof(cl_mem), &memObjects[2]);
    if (errNum != CL_SUCCESS) {
        fprintf(stderr, "Error in clSetKernelArg.\n");
        exit(1);
    }
    // 11. Queue the kernel up for execution across the array
    errNum = clEnqueueNDRangeKernel(commandQueue, kernel, 1, NULL,        // 2. enqueue the kernel for execution
                                    globalWorkSize, localWorkSize,
                                    0, NULL, NULL);
    if (errNum != CL_SUCCESS) {
        fprintf(stderr, "Error in clEnqueueNDRangeKernel.\n");
        exit(1);
    }
    // 12. Read the output buffer back to the Host
    // CL_TRUE makes this a blocking read: it returns only after the kernel has finished and the data has been copied.
    errNum = clEnqueueReadBuffer(commandQueue, memObjects[2],              // 3. read the results back to the host
                                 CL_TRUE, 0,
                                 ARRAY_SIZE * sizeof(float), result,
                                 0, NULL, NULL);
    if (errNum != CL_SUCCESS) {
        fprintf(stderr, "Error in clEnqueueReadBuffer.\n");
        exit(1);
    }
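    /******** Part 6: test results ********/
    printf("\nTest: a * b = c\n\n");
    printf("Input numbers:\n");
    for (int i = 0; i < ARRAY_SIZE; i++)
        printf("a[%d] = %f, b[%d] = %f\n", i, a[i], i, b[i]);
    printf("\nOutput numbers:\n");
    for (int i = 0; i < ARRAY_SIZE; i++)
        printf("a[%d] * b[%d] = %f\n", i, i, result[i]);

    // Release OpenCL resources and host memory.
    for (int i = 0; i < 3; i++)
        clReleaseMemObject(memObjects[i]);
    clReleaseKernel(kernel);
    clReleaseProgram(program);
    clReleaseCommandQueue(commandQueue);
    clReleaseContext(context);
    free(cSourceCL);
    free(name);
    free(vendor);
    free(version);
    free(profile);
    free(platformIds);

    // Wait for Enter so the console window stays open when started from Visual Studio.
    getchar();
    return 0;
}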
//////////////////////////////////////////////////////////////////////////////
//! Loads a Program file and prepends the cPreamble to the code.
//!
//! @return the source string if succeeded, 0 otherwise
//! @param cFilename program filename
//! @param cPreamble code that is prepended to the loaded file, typically a set of #defines or a header
//! @param szFinalLength returned length of the code string
//////////////////////////////////////////////////////////////////////////////
char* oclLoadProgSource(const char* cFilename, const char* cPreamble, size_t* szFinalLength)
{
    // locals
    FILE* pFileStream = NULL;
    size_t szSourceLength;

    // open the OpenCL source code file
#ifdef _WIN32   // Windows version
    if (fopen_s(&pFileStream, cFilename, "rb") != 0)
    {
        return NULL;
    }
#else           // Linux version
    pFileStream = fopen(cFilename, "rb");
    if (pFileStream == 0)
    {
        return NULL;
    }
#endif

    size_t szPreambleLength = strlen(cPreamble);

    // get the length of the source code (ftell returns -1 on failure)
    fseek(pFileStream, 0, SEEK_END);
    long lSourceLength = ftell(pFileStream);
    fseek(pFileStream, 0, SEEK_SET);
    if (lSourceLength < 0)
    {
        fclose(pFileStream);
        return NULL;
    }
    szSourceLength = (size_t)lSourceLength;

    // allocate a buffer for the source code string and read it in
    char* cSourceString = (char *)malloc(szSourceLength + szPreambleLength + 1);
    if (cSourceString == NULL)
    {
        fclose(pFileStream);
        return NULL;
    }
    memcpy(cSourceString, cPreamble, szPreambleLength);
    if (fread((cSourceString) + szPreambleLength, szSourceLength, 1, pFileStream) != 1)
    {
        fclose(pFileStream);
        free(cSourceString);
        return 0;
    }

    // close the file and return the total length of the combined (preamble + source) string
    fclose(pFileStream);
    if (szFinalLength != 0)
    {
        *szFinalLength = szSourceLength + szPreambleLength;
    }
    cSourceString[szSourceLength + szPreambleLength] = '\0';

    return cSourceString;
}