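Mixed C++ and CUDA programming: calling CUDA from C++ in VS2015

1 Create a new C++ project

2 Right-click the project and add a CUDA C/C++ file

3 Add the following libraries (only cudart_static.lib is CUDA-specific; the rest are the libraries a Visual Studio project links by default)

Right-click the project -> Properties -> Linker -> Input -> Additional Dependencies: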

cudart_static.lib
kernel32.lib
user32.lib
gdi32.lib
winspool.lib
comdlg32.lib
advapi32.lib
shell32.lib
ole32.lib
oleaut32.lib
uuid.lib
odbc32.lib
odbccp32.lib
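
As an alternative to typing the CUDA library into the dialog, Visual C++ also accepts a link request written in the source file itself. This is a minimal sketch, not part of the original steps. It assumes the file is built with Visual C++ and that the directory containing cudart_static.lib is already on the linker's library search path.

```cpp
// Main.cpp (sketch): ask the MSVC linker for the CUDA runtime from source.
// #pragma comment(lib, ...) is specific to Visual C++, hence the guard.
#ifdef _MSC_VER
#pragma comment(lib, "cudart_static.lib")
#endif
```

The effect is the same as listing the library under Additional Dependencies, so only one of the two is needed.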

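4 The key step: add the build customization

Right-click the project -> Build Dependencies -> Build Customizations, and tick the CUDA entry in the dialog.

5 Contents of cuda.cu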

#include <stdio.h>
#include <stdlib.h>	// malloc, free, rand, exit
#include <math.h>	// fabs


#include <cuda_runtime.h>
#include <device_launch_parameters.h>

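// Each thread adds one pair of elements; threads whose index
// falls past the end of the arrays do nothing.
__global__ void
vectorAdd(const float *A, const float *B, float *C, int numElements)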
{
	int i = blockDim.x * blockIdx.x + threadIdx.x;

	if (i < numElements)
	{
		C[i] = A[i] + B[i];
	}
}


int mainCuda(void)
{
	// Error code to check return values for CUDA calls
	cudaError_t err = cudaSuccess;

	// Print the vector length to be used, and compute its size
	int numElements = 50000;
	size_t size = numElements * sizeof(float);
	printf("[Vector addition of %d elements]\n", numElements);

	// Allocate the host input vector A
	float *h_A = (float *)malloc(size);

	// Allocate the host input vector B
	float *h_B = (float *)malloc(size);

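	// Allocate the host output vector C
	float *h_C = (float *)malloc(size);

	// Verify that the host allocations succeeded
	if (h_A == NULL || h_B == NULL || h_C == NULL)
	{
		fprintf(stderr, "Failed to allocate host vectors!\n");
		exit(EXIT_FAILURE);
	}
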
	// Initialize the host input vectors
	for (int i = 0; i < numElements; ++i)
	{
		h_A[i] = rand() / (float)RAND_MAX;
		h_B[i] = rand() / (float)RAND_MAX;
	}

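	// Allocate the device input vector A
	float *d_A = NULL;
	err = cudaMalloc((void **)&d_A, size);
	if (err != cudaSuccess)
	{
		fprintf(stderr, "Failed to allocate device vector A (%s)!\n", cudaGetErrorString(err));
		exit(EXIT_FAILURE);
	}

	// Allocate the device input vector B
	float *d_B = NULL;
	err = cudaMalloc((void **)&d_B, size);
	if (err != cudaSuccess)
	{
		fprintf(stderr, "Failed to allocate device vector B (%s)!\n", cudaGetErrorString(err));
		exit(EXIT_FAILURE);
	}

	// Allocate the device output vector C
	float *d_C = NULL;
	err = cudaMalloc((void **)&d_C, size);
	if (err != cudaSuccess)
	{
		fprintf(stderr, "Failed to allocate device vector C (%s)!\n", cudaGetErrorString(err));
		exit(EXIT_FAILURE);
	}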


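	// Copy the host input vectors A and B from host memory to the
	// device input vectors in device memory
	printf("Copy input data from the host memory to the CUDA device\n");
	err = cudaMemcpy(d_A, h_A, size, cudaMemcpyHostToDevice);
	if (err != cudaSuccess)
	{
		fprintf(stderr, "Failed to copy vector A from host to device (%s)!\n", cudaGetErrorString(err));
		exit(EXIT_FAILURE);
	}

	err = cudaMemcpy(d_B, h_B, size, cudaMemcpyHostToDevice);
	if (err != cudaSuccess)
	{
		fprintf(stderr, "Failed to copy vector B from host to device (%s)!\n", cudaGetErrorString(err));
		exit(EXIT_FAILURE);
	}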

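	// Launch the Vector Add CUDA Kernel.
	// Round the block count up so every element gets a thread.
	int threadsPerBlock = 256;
	int blocksPerGrid = (numElements + threadsPerBlock - 1) / threadsPerBlock;
	printf("CUDA kernel launch with %d blocks of %d threads\n", blocksPerGrid, threadsPerBlock);
	vectorAdd<<<blocksPerGrid, threadsPerBlock>>>(d_A, d_B, d_C, numElements);
	// A kernel launch returns no status, so query the last error explicitly
	err = cudaGetLastError();
	if (err != cudaSuccess)
	{
		fprintf(stderr, "Failed to launch vectorAdd kernel (%s)!\n", cudaGetErrorString(err));
		exit(EXIT_FAILURE);
	}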

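	// Copy the device result vector in device memory to the host result vector
	// in host memory.
	printf("Copy output data from the CUDA device to the host memory\n");
	err = cudaMemcpy(h_C, d_C, size, cudaMemcpyDeviceToHost);
	if (err != cudaSuccess)
	{
		fprintf(stderr, "Failed to copy vector C from device to host (%s)!\n", cudaGetErrorString(err));
		exit(EXIT_FAILURE);
	}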

	// Verify that the result vector is correct
	for (int i = 0; i < numElements; ++i)
	{
		if (fabs(h_A[i] + h_B[i] - h_C[i]) > 1e-5)
		{
			fprintf(stderr, "Result verification failed at element %d!\n", i);
			exit(EXIT_FAILURE);
		}
	}

	printf("Test PASSED\n");

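	// Free device global memory
	err = cudaFree(d_A);
	if (err != cudaSuccess)
	{
		fprintf(stderr, "Failed to free device vector A (%s)!\n", cudaGetErrorString(err));
		exit(EXIT_FAILURE);
	}

	err = cudaFree(d_B);
	if (err != cudaSuccess)
	{
		fprintf(stderr, "Failed to free device vector B (%s)!\n", cudaGetErrorString(err));
		exit(EXIT_FAILURE);
	}

	err = cudaFree(d_C);
	if (err != cudaSuccess)
	{
		fprintf(stderr, "Failed to free device vector C (%s)!\n", cudaGetErrorString(err));
		exit(EXIT_FAILURE);
	}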


	// Free host memory
	free(h_A);
	free(h_B);
	free(h_C);

	printf("Done\n");
	return 0;
}
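
To see why the kernel needs the bounds check on i, the launch arithmetic from mainCuda can be replayed on the CPU. The following is a host-only sketch and not part of the original project. LaunchStats and emulateLaunch are hypothetical names, and the nested loops merely stand in for the blocks and threads that the GPU would run in parallel.

```cpp
// Host-only sketch: replays the index arithmetic of a 1D kernel launch.
struct LaunchStats {
	int blocksPerGrid;  // number of blocks the launch would use
	int totalThreads;   // threads created across all blocks
	int activeThreads;  // threads whose index passes the bounds check
};

LaunchStats emulateLaunch(int numElements, int threadsPerBlock)
{
	LaunchStats s;
	// Ceiling division, exactly as in mainCuda
	s.blocksPerGrid = (numElements + threadsPerBlock - 1) / threadsPerBlock;
	s.totalThreads = 0;
	s.activeThreads = 0;

	for (int block = 0; block < s.blocksPerGrid; ++block)
	{
		for (int thread = 0; thread < threadsPerBlock; ++thread)
		{
			// Same formula as blockDim.x * blockIdx.x + threadIdx.x
			int i = threadsPerBlock * block + thread;
			++s.totalThreads;
			if (i < numElements)
			{
				++s.activeThreads;
			}
		}
	}
	return s;
}
```

With the values used in mainCuda (50000 elements, 256 threads per block) this gives 196 blocks and 50176 threads, of which 50000 pass the check. The remaining 176 threads of the last block would index past the end of the arrays, which is what the guard in vectorAdd prevents.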

6 Contents of Main.cpp

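#include <stdlib.h>	// system

// Defined in cuda.cu; nvcc compiles that file and the linker joins it with this one
extern int mainCuda(void);

int main()
{
	int status = mainCuda();
	system("pause");	// keep the console window open (Windows only)
	return status;
}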
