CUDAExample-0-asyncAPI

Tags: CUDAExample


This sample illustrates the usage of CUDA events for both GPU timing and overlapping CPU and GPU execution. Events are inserted into a stream of CUDA calls. Since CUDA stream calls are asynchronous, the CPU can perform computations while the GPU is executing (including DMA memcopies between the host and device). The CPU can query CUDA events to determine whether the GPU has completed its tasks.

This sample mainly demonstrates that the GPU and CPU can execute at the same time: while the GPU is working, the CPU keeps working too. In the example, an addition is computed on the GPU while the CPU counts iterations in a loop; the program then prints the GPU execution time and the CPU's iteration count.

Querying the GPU hardware information

    int devID;
    cudaDeviceProp deviceProps; // struct that holds the device's hardware information

    // This will pick the best possible CUDA capable device
    devID = findCudaDevice(argc, (const char **)argv);   // pick one suitable device

    // get device name
    checkCudaErrors(cudaGetDeviceProperties(&deviceProps, devID));
    printf("CUDA device [%s]\n", deviceProps.name);

Output:
CUDA device [GeForce GTX 980]

The function findCudaDevice() comes from the helper header helper_cuda.h. It selects the device the user specified on the command line; if none was specified, it picks the device with the highest Gflops (giga floating-point operations per second) rating and returns its device ID.
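A minimal sketch of that selection logic, assuming only the CUDA runtime API. The real helper_cuda.h additionally honors a `-device=N` command-line flag, checks the compute mode, and weights the estimate by the number of cores per multiprocessor; this sketch approximates throughput as multiprocessor count times clock rate.

```cuda
#include <cuda_runtime.h>

// Sketch of max-Gflops device selection (simplified from helper_cuda.h).
int gpuGetMaxGflopsDeviceIdSketch()
{
    int deviceCount = 0;
    cudaGetDeviceCount(&deviceCount);

    int bestDevice = 0;
    unsigned long long bestPerf = 0;

    for (int dev = 0; dev < deviceCount; ++dev)
    {
        cudaDeviceProp props;
        cudaGetDeviceProperties(&props, dev);

        // Rough throughput estimate: SM count * clock rate (kHz).
        unsigned long long perf =
            (unsigned long long)props.multiProcessorCount * props.clockRate;

        if (perf > bestPerf)
        {
            bestPerf = perf;
            bestDevice = dev;
        }
    }
    return bestDevice;
}
```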

Main code

    // create cuda event handles
    cudaEvent_t start, stop;
    checkCudaErrors(cudaEventCreate(&start));
    checkCudaErrors(cudaEventCreate(&stop));

    StopWatchInterface *timer = NULL;
    sdkCreateTimer(&timer);
    sdkResetTimer(&timer);

    checkCudaErrors(cudaDeviceSynchronize());
    float gpu_time = 0.0f;

    // asynchronously issue work to the GPU (all to stream 0)
    sdkStartTimer(&timer);
    cudaEventRecord(start, 0);
    cudaMemcpyAsync(d_a, a, nbytes, cudaMemcpyHostToDevice, 0);
    increment_kernel<<<blocks, threads, 0, 0>>>(d_a, value); // GPU kernel: parallel addition
    cudaMemcpyAsync(a, d_a, nbytes, cudaMemcpyDeviceToHost, 0);
    cudaEventRecord(stop, 0);
    sdkStopTimer(&timer);

    // have CPU do some work while waiting for stage 1 to finish
    unsigned long int counter = 0;

    while (cudaEventQuery(stop) == cudaErrorNotReady) // the CPU counts until the GPU finishes
    {
        counter++;
    }

    checkCudaErrors(cudaEventElapsedTime(&gpu_time, start, stop));

    // print the cpu and gpu times
    printf("time spent executing by the GPU: %.2f\n", gpu_time);
    printf("time spent by CPU in CUDA calls: %.2f\n", sdkGetTimerValue(&timer));
    printf("CPU executed %lu iterations while waiting for GPU to finish\n", counter);
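The kernel and the host buffers used above are defined elsewhere in the sample; a sketch of those missing pieces, with names matching the snippet above (the setup values are assumptions, not taken from this excerpt):

```cuda
#include <cuda_runtime.h>

// Each thread adds inc_value to one element of the array.
__global__ void increment_kernel(int *g_data, int inc_value)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    g_data[idx] = g_data[idx] + inc_value;
}

// Host-side setup assumed by the timing code above (a sketch):
//   int n = 16 * 1024 * 1024;
//   int nbytes = n * sizeof(int);
//   cudaMallocHost(&a, nbytes);  // pinned (page-locked) host memory
//   cudaMalloc(&d_a, nbytes);
//   dim3 threads(512, 1);
//   dim3 blocks(n / threads.x, 1);
```

Note that the host buffer must be allocated with cudaMallocHost: cudaMemcpyAsync only overlaps with host execution when the host memory is page-locked; with ordinary pageable memory the copy falls back to a synchronous path.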

Output
CUDA device [GeForce GTX 980]
time spent executing by the GPU: 12.40
time spent by CPU in CUDA calls: 0.04
CPU executed 2439 iterations while waiting for GPU to finish

The results show that the CPU and GPU can indeed work at the same time.


End
