Thread Organization

CUDA built-in variables

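dim3 gridDim;               // number of blocks along each dimension of the grid
uint3 blockIdx;             // index of the block within the grid

dim3 blockDim;              // number of threads along each dimension of a block
uint3 threadIdx;            // index of the thread within its block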

Launching a kernel function

dim3 gridDims, blockDims;                   // correspond to gridDim and blockDim on the device side
kernelFunc <<<gridDims,blockDims>>> (args);
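The launch above needs gridDims large enough to cover the whole problem. A minimal host-side sketch of the usual ceiling-division sizing follows. The names Dim3, divUp and makeGrid2D are hypothetical helpers introduced here for illustration (Dim3 stands in for CUDA's dim3 so the snippet compiles as plain C++), and the 2D problem shape is an assumption.

```cpp
#include <cassert>

// Hypothetical stand-in for CUDA's dim3 so the sketch compiles without nvcc.
struct Dim3 {
    unsigned int x, y, z;
};

// Ceiling division: the smallest number of blocks of size `block`
// that together cover `n` elements.
unsigned int divUp(unsigned int n, unsigned int block) {
    assert(block > 0);
    return (n + block - 1) / block;
}

// Choose a 2D grid that covers a width x height problem for a given block shape.
// The last block in each dimension may be only partly used, so the kernel
// still needs its own bounds check.
Dim3 makeGrid2D(unsigned int width, unsigned int height, Dim3 block) {
    return Dim3{divUp(width, block.x), divUp(height, block.y), 1};
}
```

For example, a 1920 x 1080 image with 16 x 16 blocks gives a 120 x 68 grid; the last row of blocks is only half used because 1080 is not a multiple of 16.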

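From the hardware's point of view, threads are contiguous within a block, and the dimensions run from lowest to highest in the order threadIdx.x, threadIdx.y, threadIdx.z (warps are formed from consecutive threads in this order). Blocks are numbered the same way by blockIdx.x, blockIdx.y, blockIdx.z, but because different blocks may execute on different SMs, adjacency between blocks is not guaranteed. The absolute index of a thread can be computed as follows: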

// Thread counts inside one block
uint numThreadsPerBlockLine = blockDim.x;
uint numThreadsPerBlockPlane = numThreadsPerBlockLine * blockDim.y;
uint numThreadsPerBlock = numThreadsPerBlockPlane * blockDim.z;

// Thread counts across one line / one plane of blocks in the grid
uint numThreadsPerGridLine = numThreadsPerBlock * gridDim.x;
uint numThreadsPerGridPlane = numThreadsPerGridLine * gridDim.y;

// Block-major index: all threads of one block receive consecutive ids
uint id = numThreadsPerGridPlane * blockIdx.z
        + numThreadsPerGridLine * blockIdx.y
        + numThreadsPerBlock * blockIdx.x
        + numThreadsPerBlockPlane * threadIdx.z
        + numThreadsPerBlockLine * threadIdx.y
        + threadIdx.x;


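// Alternative: treat the whole grid as a single 3D array of threads.
// The ids are numbered row by row across the grid, so they differ from
// the block-major ids, but each thread still gets a unique id.
uint ix = blockIdx.x * blockDim.x + threadIdx.x;
uint iy = blockIdx.y * blockDim.y + threadIdx.y;
uint iz = blockIdx.z * blockDim.z + threadIdx.z;
uint threadsPerLine = gridDim.x * blockDim.x;
uint threadsPerColumn = gridDim.y * blockDim.y;
uint threadsPerPlane = threadsPerLine * threadsPerColumn;

uint id = threadsPerPlane * iz
        + threadsPerLine * iy
        + ix;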
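To see how the two index formulas relate, they can be evaluated on the host for every thread of a small launch. The following is only a sketch in plain C++: Vec3 is a hypothetical stand-in for dim3/uint3, the function names are made up for illustration, and the parameters g, b, bi, ti play the roles of gridDim, blockDim, blockIdx and threadIdx.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for CUDA's dim3 / uint3.
struct Vec3 {
    unsigned int x, y, z;
};

// Block-major id: threads of the same block receive consecutive ids.
unsigned int blockMajorId(Vec3 g, Vec3 b, Vec3 bi, Vec3 ti) {
    assert(ti.x < b.x && ti.y < b.y && ti.z < b.z);
    unsigned int perBlockLine = b.x;
    unsigned int perBlockPlane = perBlockLine * b.y;
    unsigned int perBlock = perBlockPlane * b.z;
    unsigned int perGridLine = perBlock * g.x;
    unsigned int perGridPlane = perGridLine * g.y;
    return perGridPlane * bi.z + perGridLine * bi.y + perBlock * bi.x
         + perBlockPlane * ti.z + perBlockLine * ti.y + ti.x;
}

// Grid-wide row-major id: the grid is treated as one 3D array of threads.
unsigned int rowMajorId(Vec3 g, Vec3 b, Vec3 bi, Vec3 ti) {
    assert(ti.x < b.x && ti.y < b.y && ti.z < b.z);
    unsigned int ix = bi.x * b.x + ti.x;
    unsigned int iy = bi.y * b.y + ti.y;
    unsigned int iz = bi.z * b.z + ti.z;
    unsigned int perLine = g.x * b.x;
    unsigned int perColumn = g.y * b.y;
    return perLine * perColumn * iz + perLine * iy + ix;
}

using IdFn = unsigned int (*)(Vec3, Vec3, Vec3, Vec3);

// Returns true if f assigns every id in [0, total threads) exactly once.
bool assignsEachIdOnce(IdFn f, Vec3 g, Vec3 b) {
    unsigned int total = g.x * g.y * g.z * b.x * b.y * b.z;
    std::vector<bool> seen(total, false);
    for (unsigned int bz = 0; bz < g.z; ++bz)
    for (unsigned int by = 0; by < g.y; ++by)
    for (unsigned int bx = 0; bx < g.x; ++bx)
    for (unsigned int tz = 0; tz < b.z; ++tz)
    for (unsigned int ty = 0; ty < b.y; ++ty)
    for (unsigned int tx = 0; tx < b.x; ++tx) {
        unsigned int id = f(g, b, Vec3{bx, by, bz}, Vec3{tx, ty, tz});
        if (id >= total || seen[id]) return false;
        seen[id] = true;
    }
    return true;
}
```

With a 2 x 2 x 1 grid of 4 x 2 x 1 blocks, the first thread of block (1,0,0) gets id 8 from blockMajorId but id 4 from rowMajorId. Both functions still number all 32 threads without gaps or duplicates, so either is usable as a unique thread id; they differ only in which threads end up adjacent.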

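When 2D thread and block indices are converted to a 2D memory index, the purpose is usually image processing, so the pixels handled by the threads of one block should be spatially local, that is, they should form a 2D tile of the image. The usual solution is to cache the block's pixels in shared memory, or to read the image through texture memory. With either approach, the 2D memory index must not be computed with the block-major absolute thread index (the first form above), because that index maps the threads of a block to one contiguous run of memory, whereas a 2D tile of a row-major image spans several rows and is therefore not contiguous (contiguous memory conflicts with spatial locality).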
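A small host-side sketch of the index arithmetic this implies is given below. It assumes a tightly packed row-major image with one element per pixel; Vec2 and the function names are hypothetical and only mirror blockDim, blockIdx and threadIdx in two dimensions.

```cpp
#include <cassert>

// Hypothetical 2D stand-in for CUDA's dim3 / uint3.
struct Vec2 {
    unsigned int x, y;
};

// Global pixel coordinates of the thread at (blockIdx, threadIdx).
Vec2 pixelCoord(Vec2 blockDim, Vec2 blockIdx, Vec2 threadIdx) {
    return Vec2{blockIdx.x * blockDim.x + threadIdx.x,
                blockIdx.y * blockDim.y + threadIdx.y};
}

// The grid is rounded up to whole blocks, so threads in the edge blocks
// can fall outside the image and must be skipped.
bool inBounds(Vec2 p, unsigned int width, unsigned int height) {
    return p.x < width && p.y < height;
}

// Offset of the pixel in global memory (row-major, assumed pitch == width).
unsigned int pixelIndex(Vec2 p, unsigned int width) {
    return p.y * width + p.x;
}

// Offset of the same thread's slot inside a blockDim.y x blockDim.x
// shared-memory tile that caches the block's pixels.
unsigned int tileIndex(Vec2 blockDim, Vec2 threadIdx) {
    return threadIdx.y * blockDim.x + threadIdx.x;
}
```

Within one block, stepping threadIdx.y by one moves the global offset by the full image width but moves the tile offset by only blockDim.x. That is why a block's tile is compact in shared memory even though its rows are far apart in global memory.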
