CUDAExample-0-asyncAPI

Original

2020-02-25 23:01

Tags: CUDAExample

This sample illustrates the use of CUDA events for both GPU timing and overlapping CPU and GPU execution. Events are inserted into a stream of CUDA calls. Because CUDA stream calls are asynchronous, the CPU can perform computations while the GPU is executing (including DMA memory copies between the host and device). The CPU can query CUDA events to determine whether the GPU has completed its tasks.

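The sample mainly demonstrates that the GPU and CPU can run at the same time: while the GPU is working, the CPU keeps working too. The example first launches an addition on the GPU, has the CPU count loop iterations in the meantime, and then prints the time the GPU took and the CPU's count.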

Querying the GPU hardware information

    int devID;
    cudaDeviceProp deviceProps; // struct variable that holds the hardware information

    // This will pick the best possible CUDA capable device
    devID = findCudaDevice(argc, (const char **)argv);   // find a suitable device; only one is selected

    // get device name
    checkCudaErrors(cudaGetDeviceProperties(&deviceProps, devID));
    printf("CUDA device [%s]\n", deviceProps.name);

Output:
CUDA device [GeForce GTX 980]

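The function findCudaDevice() comes from the header helper_cuda.h. It selects the hardware: a device specified by the user takes priority, and if the user did not specify one, it picks the device with the highest Gflops (billions of floating-point operations per second). It returns the device ID.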
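The fallback rule described above can be sketched without helper_cuda.h. This is only a rough approximation, not the helper's actual code: the score here is the multiprocessor count times the clock rate and ignores the cores per multiprocessor. The name pickFastestDevice is hypothetical, and user-specified device handling is left out.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not part of helper_cuda.h): return the ID of the device
// with the largest (multiprocessor count x clock rate) score, or -1 if no
// usable device is found. This is a simplified stand-in for a Gflops estimate.
int pickFastestDevice()
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        return -1;
    }

    int best = -1;
    long long bestScore = -1;
    for (int dev = 0; dev < count; ++dev) {
        int sms = 0;
        int clockKHz = 0;
        if (cudaDeviceGetAttribute(&sms, cudaDevAttrMultiProcessorCount, dev) != cudaSuccess ||
            cudaDeviceGetAttribute(&clockKHz, cudaDevAttrClockRate, dev) != cudaSuccess) {
            continue; // skip devices whose attributes cannot be read
        }
        long long score = (long long)sms * clockKHz;
        if (score > bestScore) {
            bestScore = score;
            best = dev;
        }
    }
    return best;
}

int main()
{
    int devID = pickFastestDevice();
    if (devID < 0) {
        printf("No CUDA capable device found\n");
        return 1;
    }
    if (cudaSetDevice(devID) != cudaSuccess) {
        printf("Could not select device %d\n", devID);
        return 1;
    }

    cudaDeviceProp props;
    if (cudaGetDeviceProperties(&props, devID) != cudaSuccess) {
        printf("Could not read properties of device %d\n", devID);
        return 1;
    }
    printf("CUDA device [%s]\n", props.name);
    // A non-zero value means the device can copy memory while a kernel runs.
    printf("async copy engines: %d\n", props.asyncEngineCount);
    return 0;
}
```

On a machine with a single GPU the function simply returns device 0, so the choice only matters on multi-GPU systems.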

Main code

    // create cuda event handles
    cudaEvent_t start, stop;
    checkCudaErrors(cudaEventCreate(&start));
    checkCudaErrors(cudaEventCreate(&stop));

    StopWatchInterface *timer = NULL;
    sdkCreateTimer(&timer);
    sdkResetTimer(&timer);

    checkCudaErrors(cudaDeviceSynchronize());
    float gpu_time = 0.0f;

    // asynchronously issue work to the GPU (all to stream 0)
    sdkStartTimer(&timer);
    cudaEventRecord(start, 0);
    cudaMemcpyAsync(d_a, a, nbytes, cudaMemcpyHostToDevice, 0);
    increment_kernel<<<blocks, threads, 0, 0>>>(d_a, value); // kernel that performs the addition in parallel on the GPU
    cudaMemcpyAsync(a, d_a, nbytes, cudaMemcpyDeviceToHost, 0);
    cudaEventRecord(stop, 0);
    sdkStopTimer(&timer);

    // have CPU do some work while waiting for stage 1 to finish
    unsigned long int counter=0;

    while (cudaEventQuery(stop) == cudaErrorNotReady) // the CPU keeps counting until the GPU finishes its work
    {
        counter++;
    }

    checkCudaErrors(cudaEventElapsedTime(&gpu_time, start, stop));

    // print the CPU and GPU times
    printf("time spent executing by the GPU: %.2f\n", gpu_time);
    printf("time spent by CPU in CUDA calls: %.2f\n", sdkGetTimerValue(&timer));
    printf("CPU executed %lu iterations while waiting for GPU to finish\n", counter);

Output:
CUDA device [GeForce GTX 980]
time spent executing by the GPU: 12.40
time spent by CPU in CUDA calls: 0.04
CPU executed 2439 iterations while waiting for GPU to finish

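The output shows that the CPU and GPU can work at the same time: the CPU needed only 0.04 ms to issue the CUDA calls, then completed 2439 loop iterations while the GPU spent 12.40 ms on the two copies and the kernel.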
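The excerpt above leaves out the kernel definition and the memory setup. A self-contained sketch of the same pattern might look like the following. The kernel body, the CHECK macro, the function name runAsyncIncrement, and the sizes are assumptions made for illustration, not the sample's code, and the kernel takes an extra length argument so it can bounds-check. The host buffer is allocated with cudaMallocHost because the copies are only truly asynchronous with page-locked (pinned) host memory.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Illustrative error-check macro (stands in for checkCudaErrors from helper_cuda.h).
#define CHECK(call)                                                     \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            fprintf(stderr, "CUDA error %s at %s:%d\n",                 \
                    cudaGetErrorString(err_), __FILE__, __LINE__);      \
            exit(EXIT_FAILURE);                                         \
        }                                                               \
    } while (0)

// Assumed kernel body: each thread adds inc_value to one element.
__global__ void increment_kernel(int *g_data, int inc_value, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) {
        g_data[idx] += inc_value;
    }
}

// Issues copy -> kernel -> copy asynchronously on stream 0 and lets the CPU
// count while polling the stop event. Returns true if every element of the
// result equals `value`.
bool runAsyncIncrement(int n, int value, float *gpu_ms, unsigned long *counter)
{
    size_t nbytes = (size_t)n * sizeof(int);
    int *a = NULL;   // pinned host buffer
    int *d_a = NULL; // device buffer
    CHECK(cudaMallocHost((void **)&a, nbytes));
    memset(a, 0, nbytes);
    CHECK(cudaMalloc((void **)&d_a, nbytes));

    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));

    dim3 threads(512);
    dim3 blocks((n + threads.x - 1) / threads.x);

    // All of these calls return to the CPU before the GPU work completes.
    CHECK(cudaEventRecord(start, 0));
    CHECK(cudaMemcpyAsync(d_a, a, nbytes, cudaMemcpyHostToDevice, 0));
    increment_kernel<<<blocks, threads, 0, 0>>>(d_a, value, n);
    CHECK(cudaGetLastError());
    CHECK(cudaMemcpyAsync(a, d_a, nbytes, cudaMemcpyDeviceToHost, 0));
    CHECK(cudaEventRecord(stop, 0));

    // CPU work that overlaps with the GPU: count until the stop event completes.
    *counter = 0;
    while (cudaEventQuery(stop) == cudaErrorNotReady) {
        ++(*counter);
    }
    CHECK(cudaEventElapsedTime(gpu_ms, start, stop)); // milliseconds

    bool ok = true;
    for (int i = 0; i < n; ++i) {
        if (a[i] != value) { ok = false; break; }
    }

    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    CHECK(cudaFreeHost(a));
    CHECK(cudaFree(d_a));
    return ok;
}

int main()
{
    float gpu_ms = 0.0f;
    unsigned long counter = 0;
    bool ok = runAsyncIncrement(16 * 1024 * 1024, 26, &gpu_ms, &counter);
    printf("time spent executing by the GPU: %.2f ms\n", gpu_ms);
    printf("CPU executed %lu iterations while waiting for GPU to finish\n", counter);
    printf("result %s\n", ok ? "correct" : "incorrect");
    return ok ? 0 : 1;
}
```

The iteration count depends on the hardware and on how long each cudaEventQuery call takes, so it will differ from run to run. When the CPU has nothing useful to do in the meantime, cudaEventSynchronize(stop) can replace the polling loop.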

End
