Tags: CUDAExample
This sample illustrates the usage of CUDA events for both GPU timing and overlapping CPU and GPU execution. Events are inserted into a stream of CUDA calls. Since CUDA stream calls are asynchronous, the CPU can perform computations while the GPU is executing (including DMA memcopies between the host and device). The CPU can query CUDA events to determine whether the GPU has completed its tasks.
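The sample mainly shows that the GPU and the CPU can execute at the same time: while the GPU is working, the CPU keeps working too. In the example, the GPU performs an addition while the CPU increments a counter; the program then prints the time the GPU took and the CPU's count.
Querying the GPU hardware information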
int devID;
cudaDeviceProp deviceProps; // struct variable that holds the device properties
// This will pick the best possible CUDA capable device
devID = findCudaDevice(argc, (const char **)argv); // find a suitable device; only one is selected
// get device name
checkCudaErrors(cudaGetDeviceProperties(&deviceProps, devID));
printf("CUDA device [%s]\n", deviceProps.name);
Output:
CUDA device [GeForce GTX 980]
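The function findCudaDevice() comes from the header file helper_cuda.h. It selects the device: the one the user specified, if any; otherwise the one with the highest Gflops (billions of floating-point operations per second). It returns the device ID.
Main code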
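The fallback behaviour described above can be sketched with plain CUDA runtime calls. This is a simplified illustration, not the code in helper_cuda.h: the function name pickFastestDevice is hypothetical, and the score (multiprocessor count times core clock) is an assumed stand-in for the Gflops estimate the helper computes.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: returns the ID of the device with the highest
// (SM count x core clock) score, or -1 if no usable CUDA device exists.
// The score is only a rough proxy for peak throughput (assumption).
static int pickFastestDevice()
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0)
        return -1;

    int best = -1;
    long long bestScore = -1;
    for (int dev = 0; dev < count; ++dev)
    {
        int smCount = 0, clockKHz = 0;
        if (cudaDeviceGetAttribute(&smCount, cudaDevAttrMultiProcessorCount, dev) != cudaSuccess ||
            cudaDeviceGetAttribute(&clockKHz, cudaDevAttrClockRate, dev) != cudaSuccess)
            continue; // skip devices whose attributes cannot be read

        long long score = (long long)smCount * clockKHz;
        if (score > bestScore)
        {
            bestScore = score;
            best = dev;
        }
    }
    return best;
}

int main()
{
    int devID = pickFastestDevice();
    if (devID < 0)
    {
        printf("no CUDA device found\n");
        return 1;
    }

    cudaDeviceProp deviceProps;
    if (cudaSetDevice(devID) != cudaSuccess ||
        cudaGetDeviceProperties(&deviceProps, devID) != cudaSuccess)
    {
        printf("could not select device %d\n", devID);
        return 1;
    }
    printf("CUDA device [%s]\n", deviceProps.name);
    return 0;
}
```

On a machine with a single GPU this simply selects device 0; the comparison only matters when several devices are installed.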
// create cuda event handles
cudaEvent_t start, stop;
checkCudaErrors(cudaEventCreate(&start));
checkCudaErrors(cudaEventCreate(&stop));
StopWatchInterface *timer = NULL;
sdkCreateTimer(&timer);
sdkResetTimer(&timer);
checkCudaErrors(cudaDeviceSynchronize());
float gpu_time = 0.0f;
// asynchronously issue work to the GPU (all to stream 0)
sdkStartTimer(&timer);
cudaEventRecord(start, 0);
cudaMemcpyAsync(d_a, a, nbytes, cudaMemcpyHostToDevice, 0);
increment_kernel<<<blocks, threads, 0, 0>>>(d_a, value); // GPU kernel that performs the addition in parallel
cudaMemcpyAsync(a, d_a, nbytes, cudaMemcpyDeviceToHost, 0);
cudaEventRecord(stop, 0);
sdkStopTimer(&timer);
// have CPU do some work while waiting for stage 1 to finish
unsigned long int counter=0;
while (cudaEventQuery(stop) == cudaErrorNotReady) // the CPU keeps counting until the GPU has finished
{
counter++;
}
checkCudaErrors(cudaEventElapsedTime(&gpu_time, start, stop));
// print the cpu and gpu times
printf("time spent executing by the GPU: %.2f\n", gpu_time);
printf("time spent by CPU in CUDA calls: %.2f\n", sdkGetTimerValue(&timer));
printf("CPU executed %lu iterations while waiting for GPU to finish\n", counter);
Output:
CUDA device [GeForce GTX 980]
time spent executing by the GPU: 12.40
time spent by CPU in CUDA calls: 0.04
CPU executed 2439 iterations while waiting for GPU to finish
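The results show that the CPU and GPU can work at the same time. Both timers report milliseconds: issuing the CUDA calls cost the CPU only 0.04 ms, after which it completed 2439 loop iterations while the GPU spent 12.40 ms on the two copies and the kernel.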
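The main code above leaves out the kernel definition and the memory allocations. A self-contained sketch of the same pattern might look as follows. The array size, the increment value, the block size, the CHECK macro, and the body of increment_kernel are assumptions made for illustration, and the SDK timer is omitted so that only the CUDA runtime is needed. The host buffer is allocated with cudaMallocHost (pinned memory), since with pageable memory the cudaMemcpyAsync calls may not run asynchronously with respect to the host.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Minimal error check (hypothetical macro, stands in for checkCudaErrors).
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s failed: %s\n", #call,                 \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Assumed kernel body: each thread adds inc_value to one element.
__global__ void increment_kernel(int *g_data, int inc_value)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    g_data[idx] += inc_value;
}

int main()
{
    const int n = 16 * 1024 * 1024;          // assumed size, a multiple of the block size
    const size_t nbytes = n * sizeof(int);
    const int value = 26;                    // assumed increment

    int *a = NULL, *d_a = NULL;
    CHECK(cudaMallocHost((void **)&a, nbytes)); // pinned host buffer
    CHECK(cudaMalloc((void **)&d_a, nbytes));
    for (int i = 0; i < n; ++i) a[i] = 0;

    dim3 threads(512);
    dim3 blocks(n / threads.x);

    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));

    // Queue copy-in, kernel, and copy-out on stream 0; these calls return
    // to the host before the GPU has finished the work.
    CHECK(cudaEventRecord(start, 0));
    CHECK(cudaMemcpyAsync(d_a, a, nbytes, cudaMemcpyHostToDevice, 0));
    increment_kernel<<<blocks, threads, 0, 0>>>(d_a, value);
    CHECK(cudaGetLastError());
    CHECK(cudaMemcpyAsync(a, d_a, nbytes, cudaMemcpyDeviceToHost, 0));
    CHECK(cudaEventRecord(stop, 0));

    // The CPU counts until the stop event has been reached on the GPU.
    unsigned long counter = 0;
    while (cudaEventQuery(stop) == cudaErrorNotReady)
        counter++;

    float gpu_time = 0.0f;                   // milliseconds
    CHECK(cudaEventElapsedTime(&gpu_time, start, stop));

    // Every element started at 0, so each should now equal value.
    bool ok = true;
    for (int i = 0; i < n; ++i)
        if (a[i] != value) { ok = false; break; }

    printf("time spent executing by the GPU: %.2f\n", gpu_time);
    printf("CPU executed %lu iterations while waiting for GPU to finish\n", counter);
    printf("result %s\n", ok ? "correct" : "incorrect");

    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    CHECK(cudaFreeHost(a));
    CHECK(cudaFree(d_a));
    return ok ? 0 : 1;
}
```

Assuming the file is saved as async_sketch.cu (a placeholder name), it can be built with nvcc async_sketch.cu -o async_sketch. The timing and the iteration count depend on the hardware, so they will differ from the figures shown above.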