CUDAExample-0-clock

Original post

2020-06-23 03:47

Tags: CUDAExample

About the sample:

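In this sample, each block finds the minimum of its input with a parallel reduction: a pairwise, merge-style comparison in which nothing is actually sorted. Because blocks run in parallel with no synchronization between them, the sample measures the time taken by each block separately. It also uses dynamically allocated shared memory; for more on shared memory, see my other post, "CUDA kernel shared memory".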

The reduction that finds the minimum is shown in the figure below:

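The array holds 2n values and each block launches n threads (n must not exceed the maximum number of threads per block). Each thread compares a pair of elements and keeps the smaller one. The active range is then halved and the comparison repeated, until a single value remains, which is the minimum. The halving as written assumes n is a power of two. The same scheme finds the maximum if the comparison is reversed. The block-level synchronization call __syncthreads() is essential, because each step reads values that other threads wrote in the previous step.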
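
The halving scheme described above can be emulated on the CPU to check the indexing. The following is a minimal C++ sketch, not code from the sample: blockMin is a hypothetical name, n stands in for blockDim.x, and the inner loop runs sequentially what the GPU threads would do concurrently. It assumes the input has 2n elements with n a power of two.

```cpp
#include <cstddef>
#include <vector>

// CPU emulation of one block's reduction (hypothetical helper).
// Assumes data.size() == 2 * n, where n is a power of two.
float blockMin(const std::vector<float>& data) {
    std::vector<float> shared(data);          // stands in for shared memory
    const std::size_t n = shared.size() / 2;  // emulated thread count
    for (std::size_t d = n; d > 0; d /= 2) {
        // On the GPU these d comparisons run concurrently, one per thread
        // with tid < d, after a __syncthreads() barrier. Reads come from
        // [0, 2d) and writes go to [0, d), each thread touching only its
        // own slot, so running them in sequence gives the same result.
        for (std::size_t tid = 0; tid < d; ++tid) {
            if (shared[tid + d] < shared[tid]) {
                shared[tid] = shared[tid + d];
            }
        }
    }
    return shared[0];
}
```

For example, blockMin({7.f, 3.f, 9.f, 1.f, 8.f, 5.f, 2.f, 6.f}) reduces the buffer to {7, 3, 2, 1}, then {2, 1}, then {1}, and returns 1.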

Kernel implementation

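    // Signature and declarations follow NVIDIA's "clock" sample (timedReduction).
    __global__ static void timedReduction(const float *input, float *output, clock_t *timer)
    {
        // Dynamically sized shared memory; the size is passed at kernel launch.
        extern __shared__ float shared[];

        const int tid = threadIdx.x;
        const int bid = blockIdx.x;

        // Thread 0 records the block's start timestamp.
        if (tid == 0) timer[bid] = clock();

        // Copy input: each thread loads two of the 2 * blockDim.x elements.
        shared[tid] = input[tid];
        shared[tid + blockDim.x] = input[tid + blockDim.x];

        // Perform reduction to find minimum.
        for (int d = blockDim.x; d > 0; d /= 2)
        {
            // Wait until all writes from the previous step are visible.
            __syncthreads();

            if (tid < d)
            {
                float f0 = shared[tid];
                float f1 = shared[tid + d];

                if (f1 < f0)
                {
                    shared[tid] = f1;
                }
            }
        }

        // Write result.
        if (tid == 0) output[bid] = shared[0];

        __syncthreads();

        // Thread 0 records the block's end timestamp.
        if (tid == 0) timer[bid + gridDim.x] = clock();
    }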

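The sample records each block's execution time by storing its start and end timestamps in device memory. The timing function used inside the kernel is clock(), which returns the value of a per-multiprocessor counter that is incremented every clock cycle: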

    if (tid == 0) timer[bid] = clock();              // store the start timestamp
    if (tid == 0) timer[bid + gridDim.x] = clock();  // store the end timestamp

Processing the timing data on the CPU:

    // Print the elapsed clock cycles of every block, starting from block 0.
    clock_t timePerBlock;
    for (int i = 0; i < NUM_BLOCKS; i++)
    {
        timePerBlock = timer[NUM_BLOCKS + i] - timer[i];
        printf("Block %d clocks = %d\n", i, (int)(timePerBlock));
    }

    // Compute the difference between the last block end and the first block start.
    clock_t minStart = timer[0];
    clock_t maxEnd = timer[NUM_BLOCKS];
    for (int i = 1; i < NUM_BLOCKS; i++)
    {
        minStart = timer[i] < minStart ? timer[i] : minStart;
        maxEnd = timer[NUM_BLOCKS + i] > maxEnd ? timer[NUM_BLOCKS + i] : maxEnd;
    }

    // These values are GPU clock cycles, not CLOCKS_PER_SEC ticks. clock() reads
    // a per-multiprocessor counter, so the total is only approximate when blocks
    // ran on different multiprocessors.
    printf("\n minStart clocks = %d\n", (int)(minStart));
    printf(" maxEnd clocks = %d\n", (int)(maxEnd));
    printf("\n Total clocks = %d\n", (int)(maxEnd - minStart));
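
The same bookkeeping can be folded into one small host-side helper. The following is a minimal C++ sketch, not code from the sample: TimingSummary and summarize are hypothetical names, and it assumes the timer array has already been copied back to the host, is non-empty, and uses the layout written by the kernel (starts in the first half, ends in the second half). It also reports the average cycles per block, which only uses differences taken within a single block.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical summary of the per-block timestamps (not part of the sample).
struct TimingSummary {
    long minStart;       // earliest start timestamp
    long maxEnd;         // latest end timestamp
    double avgPerBlock;  // mean of (end - start) over all blocks
};

// timer[b] is block b's start and timer[numBlocks + b] is its end.
// Assumes timer.size() is even and at least 2.
TimingSummary summarize(const std::vector<long>& timer) {
    const std::size_t numBlocks = timer.size() / 2;
    TimingSummary s{timer[0], timer[numBlocks], 0.0};
    long total = 0;
    for (std::size_t b = 0; b < numBlocks; ++b) {
        const long start = timer[b];
        const long end = timer[numBlocks + b];
        s.minStart = std::min(s.minStart, start);
        s.maxEnd = std::max(s.maxEnd, end);
        total += end - start;  // elapsed cycles of block b
    }
    s.avgPerBlock = static_cast<double>(total) / static_cast<double>(numBlocks);
    return s;
}
```

With three blocks whose starts are 100, 120, 90 and whose ends are 400, 420, 390, summarize gives minStart = 90, maxEnd = 420 and an average of 300 cycles per block.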

end
