CUDA Study Notes (6): Kernel Launch and threadIdx

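When I first started learning CUDA, computing idx inside a launched kernel was always a blur to me: I could never keep threadIdx.x, blockIdx.x, blockDim, gridDim, and the rest straight. After reading through various sources, I am writing up a summary here of my own understanding.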
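1. The relationship among Grid, Block, and Thread
Figure: the relationship among Grid, Block, and Thread
As the figure shows, a Grid can contain multiple Blocks, and a Block contains multiple Threads. Both the Grid (of Blocks) and the Block (of Threads) can be laid out in one, two, or three dimensions. Note that threadIdx is unique only within its own Block; across the whole Grid, it is the combination of blockIdx and threadIdx that uniquely identifies a thread.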
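2. Dimensions
When launching a kernel, you need to specify the grid size and the block size:
dim3 gridsize(x,y,z);
dim3 blocksize(x,y,z);
blockDim.x, blockDim.y, and blockDim.z are the number of threads in a Block along the x, y, and z directions. Dim values are counts, so the smallest possible value is 1; Idx values are indices, so they start at 0 and run up to the corresponding Dim minus 1.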
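To make the count-versus-index distinction concrete, here is a small host-side C++ sketch that runs without a GPU. `Dim3` is a hypothetical stand-in for CUDA's `dim3` (like `dim3`, components that are not given default to 1), and the helper names `threads_per_block`, `blocks_per_grid`, and `total_threads` are my own, not part of the CUDA API.

```cpp
// Host-side sketch: Dim3 is a hypothetical stand-in for CUDA's dim3.
struct Dim3 {
    unsigned x, y, z;
    // As with CUDA's dim3, components that are not given default to 1.
    constexpr Dim3(unsigned x_ = 1, unsigned y_ = 1, unsigned z_ = 1)
        : x(x_), y(y_), z(z_) {}
};

// Number of threads in one block: the product of the three extents.
constexpr unsigned threads_per_block(Dim3 block) {
    return block.x * block.y * block.z;
}

// Number of blocks in the grid.
constexpr unsigned blocks_per_grid(Dim3 grid) {
    return grid.x * grid.y * grid.z;
}

// Total number of threads created by Kernel<<<grid, block>>>(...).
constexpr unsigned total_threads(Dim3 grid, Dim3 block) {
    return blocks_per_grid(grid) * threads_per_block(block);
}

// A 2x3 grid of 4x4x2 blocks: 6 blocks of 32 threads each.
static_assert(threads_per_block(Dim3(4, 4, 2)) == 32, "block size");
static_assert(total_threads(Dim3(2, 3), Dim3(4, 4, 2)) == 192, "grid size");
```

In this example threadIdx.x ranges over 0..3 while blockDim.x is 4, which is the "Dim is a count, Idx is an index" rule in action.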
3. 1D, 2D, and 3D modes
3.1 1D mode
Grid 1D, Block 1D (the grid is one-dimensional and each block is one-dimensional)
Index calculation: int idx = blockIdx.x * blockDim.x + threadIdx.x;
Kernel<<< numBlock,threadsPerBlock>>>(argv);
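The 1D formula can be checked on the host with plain C++. In the sketch below the parameter names mirror CUDA's built-in variables, but the functions themselves are hypothetical helpers. The round-up block count and the `if (idx < n)` guard mentioned in the comments are a common convention that I am assuming here; the text above does not prescribe them.

```cpp
// Host-side sketch of the 1D grid / 1D block case (no GPU needed).
constexpr unsigned global_idx_1d(unsigned blockIdx_x, unsigned blockDim_x,
                                 unsigned threadIdx_x) {
    return blockIdx_x * blockDim_x + threadIdx_x;
}

// Assumed convention: round up so that all n elements are covered.
constexpr unsigned num_blocks_for(unsigned n, unsigned threadsPerBlock) {
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

// Thread 5 of block 2, with 256 threads per block, handles element 517.
static_assert(global_idx_1d(2, 256, 5) == 517, "2*256 + 5");
// 1000 elements need 4 blocks of 256. The last block then has 24 threads
// with idx >= 1000, which is why kernels usually guard with `if (idx < n)`.
static_assert(num_blocks_for(1000, 256) == 4, "ceil(1000/256)");
static_assert(num_blocks_for(1000, 256) * 256 - 1000 == 24, "idle threads");
```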

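Grid 1D, Block 2D (the grid is one-dimensional and each block is two-dimensional)
int idx = blockIdx.x * blockDim.x * blockDim.y + threadIdx.y * blockDim.x + threadIdx.x;
dim3 dimBlock(x,y);
Kernel<<< numBlock,dimBlock>>>(argv);
Take this case as an example. Because the grid is one-dimensional, blockIdx.x (numbered from 0) equals the number of Blocks that come before the current one. blockDim.x is the number of threads in a Block along x and blockDim.y is the number along y, so blockDim.x * blockDim.y is the number of threads in one Block, and blockIdx.x * blockDim.x * blockDim.y is the total number of threads in all the complete Blocks that precede the current one. Now look inside the current Block. Because the Block is two-dimensional, threadIdx.y * blockDim.x is the number of threads in the complete rows (along x) before the current thread's row, and threadIdx.x is the number of threads before it within its own row. Adding the three terms gives the number of threads that precede the current thread, which is exactly its global index idx.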
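The three terms can be computed separately for one concrete thread. This is a host-side C++ sketch, not device code; `Vec2` and the function names are hypothetical and only serve to label each term of the formula.

```cpp
// Host-side sketch of the 1D grid / 2D block formula, one term at a time.
struct Vec2 { unsigned x, y; };  // hypothetical stand-in; z is unused here

// Term 1: threads in all complete blocks before the current block.
constexpr unsigned threads_in_earlier_blocks(unsigned blockIdx_x, Vec2 blockDim) {
    return blockIdx_x * blockDim.x * blockDim.y;
}

// Term 2: threads in the complete rows before the current thread's row.
constexpr unsigned threads_in_earlier_rows(Vec2 threadIdx, Vec2 blockDim) {
    return threadIdx.y * blockDim.x;
}

// Term 3 is threadIdx.x itself: threads before this one in its own row.
constexpr unsigned global_idx(unsigned blockIdx_x, Vec2 blockDim, Vec2 threadIdx) {
    return threads_in_earlier_blocks(blockIdx_x, blockDim)
         + threads_in_earlier_rows(threadIdx, blockDim)
         + threadIdx.x;
}

// Block 2, blockDim = (4, 3), threadIdx = (1, 2):
static_assert(threads_in_earlier_blocks(2, Vec2{4, 3}) == 24, "2 blocks * 12");
static_assert(threads_in_earlier_rows(Vec2{1, 2}, Vec2{4, 3}) == 8, "2 rows * 4");
static_assert(global_idx(2, Vec2{4, 3}, Vec2{1, 2}) == 33, "24 + 8 + 1");
```

So 33 threads come before this one, and its index is 33.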

Grid 1D, Block 3D (the grid is one-dimensional and each block is three-dimensional)
int idx = blockIdx.x * blockDim.x * blockDim.y * blockDim.z + threadIdx.z * blockDim.y * blockDim.x + threadIdx.y * blockDim.x + threadIdx.x;
dim3 dimBlock(x,y,z);
Kernel<<< numBlock,dimBlock>>>(argv);
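One way to convince yourself that the in-block part of this formula gives every thread a distinct number is to invert it. The host-side C++ sketch below does that; `Vec3`, `thread_in_block`, and `unflatten` are hypothetical names, and the inverse mapping is my own addition for illustration.

```cpp
// Host-side sketch: flatten a 3D thread index and recover it again.
struct Vec3 { unsigned x, y, z; };  // hypothetical stand-in for CUDA's uint3

// Linear position of a thread inside its block (the threadIdx terms above).
constexpr unsigned thread_in_block(Vec3 threadIdx, Vec3 blockDim) {
    return threadIdx.z * blockDim.y * blockDim.x
         + threadIdx.y * blockDim.x
         + threadIdx.x;
}

// Inverse mapping: recover (x, y, z) from the linear position t.
constexpr Vec3 unflatten(unsigned t, Vec3 blockDim) {
    return Vec3{ t % blockDim.x,
                 (t / blockDim.x) % blockDim.y,
                 t / (blockDim.x * blockDim.y) };
}

// threadIdx = (3, 2, 1) in a 4x3x2 block: 1*12 + 2*4 + 3 = 23.
static_assert(thread_in_block(Vec3{3, 2, 1}, Vec3{4, 3, 2}) == 23, "12 + 8 + 3");
static_assert(unflatten(23, Vec3{4, 3, 2}).x == 3, "x recovered");
static_assert(unflatten(23, Vec3{4, 3, 2}).y == 2, "y recovered");
static_assert(unflatten(23, Vec3{4, 3, 2}).z == 1, "z recovered");
```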

Grid 2D, Block 1D (the grid is two-dimensional and each block is one-dimensional)
int blockId = blockIdx.y * gridDim.x + blockIdx.x;
int Idx = blockId * blockDim.x + threadIdx.x;
dim3 dimGrid(x,y);
Kernel<<< dimGrid,threadsPerBlock>>>(argv);
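With a 2D grid the blocks are first flattened row by row into blockId, and that number is then used exactly like blockIdx.x in the 1D case. A host-side C++ sketch with concrete numbers follows; `Vec2` and the function names are hypothetical.

```cpp
// Host-side sketch of the 2D grid / 1D block case (no GPU needed).
struct Vec2 { unsigned x, y; };  // hypothetical stand-in; z is unused here

// Blocks are numbered row by row: a full row holds gridDim.x blocks.
constexpr unsigned block_id(Vec2 blockIdx, Vec2 gridDim) {
    return blockIdx.y * gridDim.x + blockIdx.x;
}

constexpr unsigned global_idx(Vec2 blockIdx, Vec2 gridDim,
                              unsigned blockDim_x, unsigned threadIdx_x) {
    return block_id(blockIdx, gridDim) * blockDim_x + threadIdx_x;
}

// In a 3x2 grid the blocks are numbered 0,1,2 on row y=0 and 3,4,5 on row y=1.
static_assert(block_id(Vec2{0, 0}, Vec2{3, 2}) == 0, "first block");
static_assert(block_id(Vec2{2, 0}, Vec2{3, 2}) == 2, "end of row 0");
static_assert(block_id(Vec2{0, 1}, Vec2{3, 2}) == 3, "start of row 1");
static_assert(block_id(Vec2{2, 1}, Vec2{3, 2}) == 5, "last block");
// Thread 7 of the last block, with 128 threads per block.
static_assert(global_idx(Vec2{2, 1}, Vec2{3, 2}, 128, 7) == 647, "5*128 + 7");
```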

Grid 2D, Block 2D
int blockId = blockIdx.y * gridDim.x + blockIdx.x;
int Idx = blockId * (blockDim.x * blockDim.y) + (threadIdx.y * blockDim.x) + threadIdx.x;
dim3 dimGrid(x1,y1),dimBlock(x2,y2);
Kernel<<< dimGrid,dimBlock>>>(argv);

Grid 2D, Block 3D
int blockId = blockIdx.y * gridDim.x + blockIdx.x;
int Idx = blockId * (blockDim.x * blockDim.y * blockDim.z) + (threadIdx.z * (blockDim.x * blockDim.y)) + (threadIdx.y * blockDim.x)+ threadIdx.x;
dim3 dimGrid(x1,y1),dimBlock(x2,y2,z2);
Kernel<<< dimGrid,dimBlock>>>(argv);

Grid 3D, Block 1D
int blockId = blockIdx.x+ blockIdx.y * gridDim.x+ gridDim.x * gridDim.y * blockIdx.z;
int Idx = blockId * blockDim.x + threadIdx.x;
dim3 dimGrid(x,y,z);
Kernel<<< dimGrid,threadsPerBlock>>>(argv);

Grid 3D, Block 2D
int blockId = blockIdx.x+ blockIdx.y * gridDim.x + gridDim.x * gridDim.y * blockIdx.z;
int Idx = blockId * (blockDim.x * blockDim.y)+ (threadIdx.y * blockDim.x) + threadIdx.x;
dim3 dimGrid(x1,y1,z1),dimBlock(x2,y2);
Kernel<<< dimGrid,dimBlock>>>(argv);

Grid 3D, Block 3D
int blockId = blockIdx.x+ blockIdx.y * gridDim.x+ gridDim.x * gridDim.y * blockIdx.z;
int Idx = blockId * (blockDim.x * blockDim.y * blockDim.z) + (threadIdx.z * (blockDim.x * blockDim.y)) + (threadIdx.y * blockDim.x)+ threadIdx.x;
dim3 dimGrid(x1,y1,z1),dimBlock(x2,y2,z2);
Kernel<<< dimGrid,dimBlock>>>(argv);
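All nine combinations are really one formula: the 3D/3D case, with unused extents equal to 1 and the matching indices equal to 0. The host-side C++ sketch below shows this. `Dim3` and `Idx3` are hypothetical stand-ins for CUDA's `dim3` and for the `uint3` type of blockIdx/threadIdx; the default of 0 for `Idx3` components is my own convenience, not CUDA behavior.

```cpp
// Host-side sketch (no GPU needed) of the general index formula.
struct Dim3 {  // sizes: components default to 1, as in CUDA's dim3
    unsigned x, y, z;
    constexpr Dim3(unsigned x_ = 1, unsigned y_ = 1, unsigned z_ = 1)
        : x(x_), y(y_), z(z_) {}
};
struct Idx3 {  // indices: defaulting to 0 is an assumption of this sketch
    unsigned x, y, z;
    constexpr Idx3(unsigned x_ = 0, unsigned y_ = 0, unsigned z_ = 0)
        : x(x_), y(y_), z(z_) {}
};

// Row-major flattening of a 3D index i within a 3D extent d.
constexpr unsigned flatten(Idx3 i, Dim3 d) {
    return i.x + i.y * d.x + i.z * d.x * d.y;
}

constexpr unsigned volume(Dim3 d) { return d.x * d.y * d.z; }

// blockId * (threads per block) + (position of the thread in its block).
constexpr unsigned global_idx(Idx3 blockIdx, Dim3 gridDim,
                              Idx3 threadIdx, Dim3 blockDim) {
    return flatten(blockIdx, gridDim) * volume(blockDim)
         + flatten(threadIdx, blockDim);
}

// 3D/3D: block (1,0,1) of a 2x2x2 grid is block 5; thread (3,2,1) of a
// 4x3x2 block is thread 23; 5*24 + 23 = 143.
static_assert(global_idx(Idx3(1, 0, 1), Dim3(2, 2, 2),
                         Idx3(3, 2, 1), Dim3(4, 3, 2)) == 143, "3D/3D");
// The very last thread gets index (total threads - 1) = 8*24 - 1.
static_assert(global_idx(Idx3(1, 1, 1), Dim3(2, 2, 2),
                         Idx3(3, 2, 1), Dim3(4, 3, 2)) == 191, "last thread");
// 1D/1D collapses to blockIdx.x * blockDim.x + threadIdx.x.
static_assert(global_idx(Idx3(2), Dim3(10), Idx3(5), Dim3(256)) == 517, "1D/1D");
```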
