Explaining and setting embed, stride, and dist for FFT transforms, using cufftPlanMany as an example

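Whenever I needed to run FFTs over a custom data layout, I used to work out the settings each time by writing a throwaway demo. Now that I finally understand them, I am writing them down for future reference.

For example, to Fourier-transform every row or every column of a 2D array, you configure cufftPlanMany for that layout and then run all the transforms as one batch.

The declaration of cufftPlanMany is: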

cufftResult cufftPlanMany(cufftHandle *plan, int rank, int *n, int *inembed,
int istride, int idist, int *onembed, int ostride,int odist, cufftType type, int batch);

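For example, take a 10*5 2D array stored row by row (NY=5 rows of NX=10 elements each) and run an FFT over every row:

NX=10;NY=5;

rank=1: each transform is a one-dimensional Fourier transform.

n: an array with rank entries giving the transform size along each dimension. Here it has a single entry, n[0]=NX, so each 1D transform covers 10 elements.

inembed: the storage dimensions of the input data in memory. The input here is a 2D array, so I declare it with two entries, inembed[0]=NX, inembed[1]=NY. Note that cuFFT reads only rank entries from it, and for a 1D transform the value of inembed[0] does not affect addressing, because idist already carries that information. The pointer must still be non-NULL, since passing NULL makes cuFFT ignore istride and idist.

istride: the distance between two consecutive elements within one transform. Here istride=1.

idist: the distance between the first elements of two consecutive transforms in the batch. Here idist=NX.

batch: the number of independent transforms. Here batch=NY.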
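The roles of istride and idist can be made concrete with a few lines of host-only C. This is a minimal sketch, not cuFFT code: fft1d_offset and print_transform_indices are hypothetical helper names introduced here. They apply the 1D addressing rule from the cuFFT advanced data layout documentation, input[b * idist + x * istride], to list which array elements a given transform of the batch reads.

```c
#include <assert.h>
#include <stdio.h>

/* Offset, in elements, of element x of transform b in a batched 1D layout:
   b * dist + x * stride. Hypothetical helper, not part of cuFFT. */
int fft1d_offset(int b, int x, int stride, int dist) {
    assert(b >= 0 && x >= 0 && stride > 0 && dist > 0);
    return b * dist + x * stride;
}

/* Print the flat array indices read by transform b of length n. */
void print_transform_indices(int b, int n, int stride, int dist) {
    printf("transform %d:", b);
    for (int x = 0; x < n; x++)
        printf(" %d", fft1d_offset(b, x, stride, dist));
    printf("\n");
}
```

With the row settings (n=10, stride=1, dist=10), print_transform_indices(1, 10, 1, 10) lists indices 10 through 19, which is the second row of the array.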

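This example uses a C2C transform, so the input and output have the same format, and the output parameters (onembed, ostride, odist) take the same values as the input ones. The implementation is: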

#include <stdlib.h>
#include <stdio.h>

#include <string.h>
#include <math.h>

#include <cuda_runtime.h>
#include <cufft.h>
#include "device_launch_parameters.h"
#define Ndim 2
#define NX 10
#define NY 5


void testplanmany() {

	int N[2];
	N[0] = NX, N[1] = NY;
	int NXY = N[0] * N[1];
	cufftComplex *input = (cufftComplex*) malloc(NXY * sizeof(cufftComplex));
	cufftComplex *output = (cufftComplex*) malloc(NXY * sizeof(cufftComplex));
	int i;
	for (i = 0; i < NXY; i++) {
		input[i].x = i % 1000;
		input[i].y = 0;
	}
	cufftComplex *d_inputData, *d_outData;
	cudaMalloc((void**) &d_inputData, N[0] * N[1] * sizeof(cufftComplex));
	cudaMalloc((void**) &d_outData, N[0] * N[1] * sizeof(cufftComplex));
	cudaMemcpy(d_inputData, input, N[0] * N[1] * sizeof(cufftComplex), cudaMemcpyHostToDevice);
	cufftHandle plan;
	/*
	cufftMakePlanMany(cufftHandle plan, int rank, int *n, int *inembed,
	int istride, int idist, int *onembed, int ostride,
	int odist, cufftType type, int batch, size_t *workSize);
	 */
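	int rank = 1;                 /* 1D transforms */
	int n[1] = { NX };            /* each transform has NX points (one row) */
	int istride = 1, ostride = 1; /* elements of a row are adjacent in memory */
	int idist = NX, odist = NX;   /* consecutive rows start NX elements apart */
	/* inembed/onembed must be non-NULL, otherwise cuFFT ignores stride and dist */
	int inembed[2] = { NX, NY };
	int onembed[2] = { NX, NY };

	/* batch = NY: one transform per row */
	cufftPlanMany(&plan, rank, n, inembed, istride, idist, onembed, ostride, odist, CUFFT_C2C, NY);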
	cufftExecC2C(plan, d_inputData, d_outData, CUFFT_FORWARD);
	cudaMemcpy(output, d_outData, NXY * sizeof(cufftComplex), cudaMemcpyDeviceToHost);

	for (i = 0; i < NXY; i++) {
		if(i%NX==0)
			printf("\n");
		printf("%f %f \n", output[i].x, output[i].y);
	}

	cufftDestroy(plan);
	free(input);
	free(output);
	cudaFree(d_inputData);
	cudaFree(d_outData);
}

int main() {

	testplanmany();
}





The actual output is:

45.000000 0.000000 
-5.000000 15.388418 
-5.000000 6.881910 
-5.000000 3.632713 
-5.000000 1.624598 
-5.000000 0.000000 
-5.000000 -1.624598 
-5.000000 -3.632713 
-5.000000 -6.881910 
-5.000000 -15.388418 

145.000000 0.000000 
-5.000000 15.388418 
-5.000000 6.881910 
-5.000000 3.632713 
-5.000000 1.624598 
-5.000000 0.000000 
-5.000000 -1.624598 
-5.000000 -3.632713 
-5.000000 -6.881910 
-5.000000 -15.388418 

245.000000 0.000000 
-4.999997 15.388416 
-5.000000 6.881910 
-5.000000 3.632713 
-5.000000 1.624597 
-5.000000 0.000000 
-5.000000 -1.624597 
-5.000000 -3.632713 
-5.000000 -6.881910 
-4.999997 -15.388416 

345.000000 0.000000 
-4.999998 15.388418 
-4.999999 6.881909 
-4.999999 3.632712 
-5.000000 1.624598 
-5.000000 0.000000 
-5.000000 -1.624598 
-4.999999 -3.632712 
-4.999999 -6.881909 
-4.999998 -15.388418 

445.000000 0.000000 
-4.999999 15.388418 
-5.000001 6.881911 
-5.000000 3.632714 
-5.000000 1.624598 
-5.000000 0.000000 
-5.000000 -1.624598 
-5.000000 -3.632714 
-5.000001 -6.881911 
-4.999999 -15.388418 

 

To instead run one FFT over each column of the same data:

NX=10;NY=5;

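rank=1: each transform is a one-dimensional Fourier transform.

n: an array with rank entries giving the transform size along each dimension. Here it has a single entry, n[0]=NY, so each 1D transform covers 5 elements.

inembed: the storage dimensions of the input data in memory, set as before to inembed[0]=NX, inembed[1]=NY.

istride: the distance between two consecutive elements within one transform. Here istride=NX, because consecutive elements of a column are one full row (NX elements) apart.

idist: the distance between the first elements of two consecutive transforms in the batch. Here idist=1, because the first elements of two adjacent columns sit next to each other in memory.

batch: the number of independent transforms. Here batch=NX.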
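Before handing a set of n, stride, dist, and batch values to cuFFT, it can help to check on the host that they describe the array you think they do. The sketch below is my own illustration, not a cuFFT facility: layout_covers_buffer is a hypothetical helper that walks every index b * dist + x * stride and reports whether a dense buffer is covered exactly once. It assumes a densely packed array like the 10*5 example; cuFFT itself also allows padded layouts that would fail this check.

```c
#include <assert.h>
#include <stdlib.h>

/* Returns 1 if `batch` transforms of `n` elements, laid out with the given
   stride and dist, touch every element of a buffer of `total` elements
   exactly once. Returns 0 on an out-of-range or repeated index, and also
   if the scratch allocation fails. Hypothetical helper, not part of cuFFT. */
int layout_covers_buffer(int n, int stride, int dist, int batch, int total) {
    assert(n > 0 && stride > 0 && dist > 0 && batch > 0 && total > 0);
    unsigned char *seen = (unsigned char *) calloc((size_t) total, 1);
    if (seen == NULL)
        return 0;
    int ok = (n * batch == total);
    for (int b = 0; ok && b < batch; b++) {
        for (int x = 0; ok && x < n; x++) {
            int idx = b * dist + x * stride;
            if (idx >= total || seen[idx])
                ok = 0;        /* outside the buffer, or visited twice */
            else
                seen[idx] = 1; /* first visit */
        }
    }
    free(seen);
    return ok;
}
```

For the 50-element array, both the row settings (10, 1, 10, 5) and the column settings (5, 10, 1, 10) pass, while mixing them up, for example column length and batch with row stride and dist (5, 1, 10, 10), runs past the end of the buffer and is rejected.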

The code is:

#include <stdlib.h>
#include <stdio.h>

#include <string.h>
#include <math.h>

#include <cuda_runtime.h>
#include <cufft.h>
#include "device_launch_parameters.h"
#define Ndim 2
#define NX 10
#define NY 5


void testplanmany() {

	int N[2];
	N[0] = NX, N[1] = NY;
	int NXY = N[0] * N[1];
	cufftComplex *input = (cufftComplex*) malloc(NXY * sizeof(cufftComplex));
	cufftComplex *output = (cufftComplex*) malloc(NXY * sizeof(cufftComplex));
	int i;
	for (i = 0; i < NXY; i++) {
		input[i].x = i % 1000;
		input[i].y = 0;
	}
	cufftComplex *d_inputData, *d_outData;
	cudaMalloc((void**) &d_inputData, N[0] * N[1] * sizeof(cufftComplex));
	cudaMalloc((void**) &d_outData, N[0] * N[1] * sizeof(cufftComplex));
	cudaMemcpy(d_inputData, input, N[0] * N[1] * sizeof(cufftComplex), cudaMemcpyHostToDevice);
	cufftHandle plan;
	/*
	cufftMakePlanMany(cufftHandle plan, int rank, int *n, int *inembed,
	int istride, int idist, int *onembed, int ostride,
	int odist, cufftType type, int batch, size_t *workSize);
	 */
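	int rank = 1;                   /* 1D transforms */
	int n[1] = { NY };              /* each transform has NY points (one column) */
	int istride = NX, ostride = NX; /* elements of a column are one row apart */
	int idist = 1, odist = 1;       /* adjacent columns start one element apart */
	/* inembed/onembed must be non-NULL, otherwise cuFFT ignores stride and dist */
	int inembed[2] = { NX, NY };
	int onembed[2] = { NX, NY };

	/* batch = NX: one transform per column */
	cufftPlanMany(&plan, rank, n, inembed, istride, idist, onembed, ostride, odist, CUFFT_C2C, NX);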
	cufftExecC2C(plan, d_inputData, d_outData, CUFFT_FORWARD);
	cudaMemcpy(output, d_outData, NXY * sizeof(cufftComplex), cudaMemcpyDeviceToHost);

	for (i = 0; i < NXY; i++) {
		if(i%NX==0)
			printf("\n");
		printf("%f %f \n", output[i].x, output[i].y);
	}

	cufftDestroy(plan);
	free(input);
	free(output);
	cudaFree(d_inputData);
	cudaFree(d_outData);
}

int main() {

	testplanmany();
}

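The result is shown below. Because ostride=NX and odist=1, the output keeps the same layout as the input, so each group of 10 lines is one frequency bin across the 10 columns: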


100.000000 0.000000 
105.000000 0.000000 
110.000000 0.000000 
115.000000 0.000000 
120.000000 0.000000 
125.000000 0.000000 
130.000000 0.000000 
135.000000 0.000000 
140.000000 0.000000 
145.000000 0.000000 

-25.000000 34.409550 
-25.000000 34.409550 
-25.000000 34.409550 
-25.000000 34.409550 
-25.000000 34.409550 
-25.000000 34.409550 
-25.000000 34.409550 
-25.000000 34.409550 
-25.000000 34.409550 
-25.000000 34.409550 

-25.000002 8.122991 
-25.000002 8.122991 
-24.999998 8.122991 
-24.999998 8.122991 
-24.999998 8.122991 
-25.000000 8.122991 
-25.000000 8.122991 
-25.000000 8.122991 
-25.000000 8.122991 
-25.000000 8.122991 

-25.000002 -8.122991 
-25.000002 -8.122991 
-24.999998 -8.122991 
-24.999998 -8.122991 
-24.999998 -8.122991 
-25.000000 -8.122991 
-25.000000 -8.122991 
-25.000000 -8.122991 
-25.000000 -8.122991 
-25.000000 -8.122991 

-25.000000 -34.409550 
-25.000000 -34.409550 
-25.000000 -34.409550 
-25.000000 -34.409550 
-25.000000 -34.409550 
-25.000000 -34.409550 
-25.000000 -34.409550 
-25.000000 -34.409550 
-25.000000 -34.409550 
-25.000000 -34.409550 
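A cheap way to sanity-check both listings without a GPU is to look only at bin 0. For a forward DFT, bin 0 is the plain sum of the inputs of that transform, so it can be recomputed on the host with the same stride and dist values. The following is a minimal sketch under that observation; dc_bin and fill_ramp are hypothetical helper names of my own, and the ramp 0, 1, ..., 49 mirrors the real parts written by the test program.

```c
#include <assert.h>

/* Sum of the n real inputs of transform b under the given stride/dist.
   For a forward DFT this sum equals the real part of output bin 0.
   Hypothetical helper, not part of cuFFT. */
float dc_bin(const float *data, int n, int stride, int dist, int b) {
    assert(data != 0 && n > 0 && stride > 0 && dist > 0 && b >= 0);
    float sum = 0.0f;
    for (int x = 0; x < n; x++)
        sum += data[b * dist + x * stride];
    return sum;
}

/* Fill buf with 0, 1, ..., count-1, matching the real parts of the test input. */
void fill_ramp(float *buf, int count) {
    for (int i = 0; i < count; i++)
        buf[i] = (float) i;
}
```

On a 50-element ramp, the row settings (n=10, stride=1, dist=10) give 45, 145, 245, 345, 445 for transforms 0 through 4, which are the first lines of the five groups in the row output. The column settings (n=5, stride=10, dist=1) give 100, 105, ..., 145 for transforms 0 through 9, which is the first group of the column output.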

This post also drew on the following resource:

https://rocfft.readthedocs.io/en/latest/real.html#setting-strides
