How to use CUDA shared memory

Original

2020-06-19 12:22

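As you probably know, every CUDA block has access to a region of fast on-chip memory. This is the shared memory, which all threads of that block can read and write. So how do you use it?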
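Before allocating anything, it can help to know how much shared memory a block may use on your GPU. The following is a minimal sketch that reads the limit through cudaGetDeviceProperties. It assumes device 0 is the GPU of interest, and the helper name sharedMemPerBlockBytes is made up for this illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not from the original post): returns the per-block
// shared memory limit of `device` in bytes, or 0 if the query fails.
size_t sharedMemPerBlockBytes(int device)
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess)
        return 0;
    return prop.sharedMemPerBlock;
}

int main()
{
    size_t bytes = sharedMemPerBlockBytes(0);  // assumption: device 0
    if (bytes == 0) {
        printf("could not query device 0\n");
        return 1;
    }
    printf("shared memory per block: %zu bytes\n", bytes);
    return 0;
}
```

The value printed depends on the GPU, so no expected output is given here.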

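First, shared memory can be allocated in two ways, statically or dynamically. These are the Static Shared Memory and Dynamic Shared Memory entries in the figure above.
1. Static allocation
Write __shared__ int smem[5]; inside the kernel. This gives every block its own array of 5 ints (20 B).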

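__global__ void addKernel(int *c, const int *a)
{
	int i = threadIdx.x;
	__shared__ int smem[5];	// statically sized: 5 ints per block
	if (i < 5)	// guard in case the block has more than 5 threads
		smem[i] = a[i];
	__syncthreads();	// wait until every element has been loaded
	if (i == 0)	// thread 0 computes the sum of squares
	{
		c[0] = 0;
		for (int d = 0; d < 5; d++)
		{
			c[0] += smem[d] * smem[d];
		}
	}
	if (i == 1)	// thread 1 computes the sum
	{
		c[1] = 0;
		for (int d = 0; d < 5; d++)
		{
			c[1] += smem[d];
		}
	}
	if (i == 2)	// thread 2 computes the product
	{
		c[2] = 1;
		for (int d = 0; d < 5; d++)
		{
			c[2] *= smem[d];
		}
	}
}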

Launch it with a single block, so 20 B of shared memory are in use:

addKernel<<<1, size, 0, 0>>>(dev_c, dev_a);

Nsight confirms that 20 B of shared memory are used and that the allocation is static.
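The post shows only the kernel and the launch line, so here is a hedged sketch of a complete program around the static version. The host wrapper runStaticDemo, the kernel name staticKernel, the constant N, and the input values 1 to 5 are assumptions for illustration and do not come from the original code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N 5  // array length; the block is launched with exactly N threads

__global__ void staticKernel(int *c, const int *a)
{
    __shared__ int smem[N];          // static allocation: N ints per block
    int i = threadIdx.x;
    if (i < N) smem[i] = a[i];
    __syncthreads();                 // all loads finish before anyone reads
    if (i == 0) {                    // sum of squares
        int s = 0;
        for (int d = 0; d < N; d++) s += smem[d] * smem[d];
        c[0] = s;
    }
    if (i == 1) {                    // sum
        int s = 0;
        for (int d = 0; d < N; d++) s += smem[d];
        c[1] = s;
    }
    if (i == 2) {                    // product
        int p = 1;
        for (int d = 0; d < N; d++) p *= smem[d];
        c[2] = p;
    }
}

// Hypothetical host wrapper: copies in[N] to the device, runs the kernel,
// and copies the three results into out[3]. Returns 0 on success.
int runStaticDemo(const int *in, int *out)
{
    int *dev_a = nullptr, *dev_c = nullptr;
    if (cudaMalloc(&dev_a, N * sizeof(int)) != cudaSuccess) return 1;
    if (cudaMalloc(&dev_c, 3 * sizeof(int)) != cudaSuccess) {
        cudaFree(dev_a);
        return 1;
    }
    cudaMemcpy(dev_a, in, N * sizeof(int), cudaMemcpyHostToDevice);
    staticKernel<<<1, N>>>(dev_c, dev_a);  // no dynamic shared memory needed
    cudaError_t err = cudaMemcpy(out, dev_c, 3 * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(dev_a);
    cudaFree(dev_c);
    return err == cudaSuccess ? 0 : 1;
}

int main()
{
    const int a[N] = {1, 2, 3, 4, 5};  // assumed sample input
    int c[3] = {0, 0, 0};
    if (runStaticDemo(a, c) != 0) {
        printf("CUDA error\n");
        return 1;
    }
    // For the input 1..5 this prints: 55 15 120
    printf("%d %d %d\n", c[0], c[1], c[2]);
    return 0;
}
```

Each thread accumulates in a local variable and writes to global memory once, which is a small design change from the kernel above that avoids repeated global writes.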

2. Dynamic allocation
That's right: declare the array inside the kernel as before, but put extern in front and leave the size empty.

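__global__ void addKernel(int *c, const int *a)
{
	int i = threadIdx.x;
	extern __shared__ int smem[];	// size is supplied at launch time
	smem[i] = a[i];	// in bounds as long as the launch passes blockDim.x * sizeof(int) bytes
	__syncthreads();	// wait until every element has been loaded
	if (i == 0)	// thread 0 computes the sum of squares
	{
		c[0] = 0;
		for (int d = 0; d < 5; d++)	// assumes at least 5 elements were loaded
		{
			c[0] += smem[d] * smem[d];
		}
	}
	if (i == 1)	// thread 1 computes the sum
	{
		c[1] = 0;
		for (int d = 0; d < 5; d++)
		{
			c[1] += smem[d];
		}
	}
	if (i == 2)	// thread 2 computes the product
	{
		c[2] = 1;
		for (int d = 0; d < 5; d++)
		{
			c[2] *= smem[d];
		}
	}
}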

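So where is the size specified?
It turns out to be the third launch parameter of the kernel. When I worked with multiple streams earlier, the fourth parameter was the stream to launch on and I always set the third to 0. Now I finally understand what it means.

addKernel<<<1, size, size * sizeof(int), 0>>>(dev_c, dev_a);	// third parameter: dynamic shared memory per block, in bytes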
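To show the point of the dynamic form, here is a hedged sketch in which the array length is chosen at run time. It differs from the kernel above in one respect: the loops run to blockDim.x instead of a hard-coded 5, so the same kernel works for any block size. The names dynamicKernel and runDynamicDemo and the sample inputs are assumptions for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void dynamicKernel(int *c, const int *a)
{
    extern __shared__ int smem[];    // size comes from the launch configuration
    int i = threadIdx.x;
    int n = blockDim.x;              // one element per thread
    smem[i] = a[i];
    __syncthreads();                 // all loads finish before anyone reads
    if (i == 0) {                    // sum of squares
        int s = 0;
        for (int d = 0; d < n; d++) s += smem[d] * smem[d];
        c[0] = s;
    }
    if (i == 1) {                    // sum
        int s = 0;
        for (int d = 0; d < n; d++) s += smem[d];
        c[1] = s;
    }
    if (i == 2) {                    // product
        int p = 1;
        for (int d = 0; d < n; d++) p *= smem[d];
        c[2] = p;
    }
}

// Hypothetical host wrapper: runs the kernel on in[n] with n threads and
// n * sizeof(int) bytes of dynamic shared memory. Assumes 3 <= n <= 1024 so
// that threads 0, 1 and 2 all exist. Returns 0 on success.
int runDynamicDemo(const int *in, int n, int *out)
{
    if (n < 3 || n > 1024) return 1;
    int *dev_a = nullptr, *dev_c = nullptr;
    if (cudaMalloc(&dev_a, n * sizeof(int)) != cudaSuccess) return 1;
    if (cudaMalloc(&dev_c, 3 * sizeof(int)) != cudaSuccess) {
        cudaFree(dev_a);
        return 1;
    }
    cudaMemcpy(dev_a, in, n * sizeof(int), cudaMemcpyHostToDevice);
    // Third launch parameter: bytes of dynamic shared memory per block.
    dynamicKernel<<<1, n, n * sizeof(int), 0>>>(dev_c, dev_a);
    cudaError_t err = cudaMemcpy(out, dev_c, 3 * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(dev_a);
    cudaFree(dev_c);
    return err == cudaSuccess ? 0 : 1;
}

int main()
{
    const int a[4] = {1, 2, 3, 4};   // assumed sample input, length 4
    int c[3] = {0, 0, 0};
    if (runDynamicDemo(a, 4, c) != 0) {
        printf("CUDA error\n");
        return 1;
    }
    // For the input 1..4 this prints: 30 10 24
    printf("%d %d %d\n", c[0], c[1], c[2]);
    return 0;
}
```

If the third launch parameter were left at 0 here, smem would have no storage behind it and the kernel's accesses would be out of bounds, which is why the size must match what the kernel indexes.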

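Over the next few days I'm preparing a post on using CUDA streams, and I'll add some of my own study notes to it. I'm young, so I'll just get on with it. Let's go!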
