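Whenever I needed to run FFTs over a custom data layout, I used to just write a throwaway demo. Now that I finally understand how it works, I am writing it down for future reference.
For example, to run a Fourier transform over every row or every column of a 2D array, you configure cufftPlanMany accordingly and process all the transforms as one batch.
The declaration of cufftPlanMany is: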
cufftResult cufftPlanMany(cufftHandle *plan, int rank, int *n, int *inembed,
int istride, int idist, int *onembed, int ostride,int odist, cufftType type, int batch);
Take a 10*5 2D array as an example, stored contiguously as NY=5 rows of NX=10 elements each, where we want one FFT per row:
NX=10;NY=5;
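rank=1: each transform is a one-dimensional Fourier transform;
n[1]: the array n has rank elements and gives the transform size along each dimension. Here n has a single element, n[0]=NX, meaning each 1D transform covers 10 elements;
inembed[2]: describes the storage dimensions of the input data. The input here is a 2D array, so this example sets inembed[0]=NX, inembed[1]=NY. Note that cuFFT documents inembed as an array of rank elements, so with rank=1 only inembed[0] is read, and for a 1D transform the layout is effectively determined by istride and idist. inembed must still be non-NULL, otherwise the advanced layout parameters are ignored;
istride: the distance between two consecutive elements within one transform; here istride=1;
idist: the distance between the first elements of two consecutive transforms; here idist=NX;
batch: the number of independent transforms; here batch=NY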
This example uses a C2C transform, so the input and output have the same format. The implementation is:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <math.h>
#include <cuda_runtime.h>
#include <cufft.h>
#include "device_launch_parameters.h"
#define Ndim 2
#define NX 10
#define NY 5
void testplanmany() {
int N[2];
N[0] = NX, N[1] = NY;
int NXY = N[0] * N[1];
cufftComplex *input = (cufftComplex*) malloc(NXY * sizeof(cufftComplex));
cufftComplex *output = (cufftComplex*) malloc(NXY * sizeof(cufftComplex));
int i;
for (i = 0; i < NXY; i++) {
input[i].x = i % 1000;
input[i].y = 0;
}
cufftComplex *d_inputData, *d_outData;
cudaMalloc((void**) &d_inputData, N[0] * N[1] * sizeof(cufftComplex));
cudaMalloc((void**) &d_outData, N[0] * N[1] * sizeof(cufftComplex));
cudaMemcpy(d_inputData, input, N[0] * N[1] * sizeof(cufftComplex), cudaMemcpyHostToDevice);
cufftHandle plan;
/*
cufftMakePlanMany(cufftHandle plan, int rank, int *n, int *inembed,
int istride, int idist, int *onembed, int ostride,
int odist, cufftType type, int batch, size_t *workSize);
*/
int rank=1;
int n[1];
n[0]=NX;
int istride=1;
int idist = NX;
int ostride=1;
int odist = NX;
int inembed[2];
int onembed[2];
inembed[0]=NX; onembed[0]=NX;
inembed[1] = NY; onembed[1] = NY;
cufftPlanMany(&plan,rank,n,inembed, istride ,idist , onembed, ostride,odist, CUFFT_C2C, NY);
cufftExecC2C(plan, d_inputData, d_outData, CUFFT_FORWARD);
cudaMemcpy(output, d_outData, NXY * sizeof(cufftComplex), cudaMemcpyDeviceToHost);
for (i = 0; i < NXY; i++) {
if(i%NX==0)
printf("\n");
printf("%f %f \n", output[i].x, output[i].y);
}
cufftDestroy(plan);
free(input);
free(output);
cudaFree(d_inputData);
cudaFree(d_outData);
}
int main() {
testplanmany();
}
The actual output is as follows (each group of 10 lines is the spectrum of one row):
45.000000 0.000000
-5.000000 15.388418
-5.000000 6.881910
-5.000000 3.632713
-5.000000 1.624598
-5.000000 0.000000
-5.000000 -1.624598
-5.000000 -3.632713
-5.000000 -6.881910
-5.000000 -15.388418
145.000000 0.000000
-5.000000 15.388418
-5.000000 6.881910
-5.000000 3.632713
-5.000000 1.624598
-5.000000 0.000000
-5.000000 -1.624598
-5.000000 -3.632713
-5.000000 -6.881910
-5.000000 -15.388418
245.000000 0.000000
-4.999997 15.388416
-5.000000 6.881910
-5.000000 3.632713
-5.000000 1.624597
-5.000000 0.000000
-5.000000 -1.624597
-5.000000 -3.632713
-5.000000 -6.881910
-4.999997 -15.388416
345.000000 0.000000
-4.999998 15.388418
-4.999999 6.881909
-4.999999 3.632712
-5.000000 1.624598
-5.000000 0.000000
-5.000000 -1.624598
-4.999999 -3.632712
-4.999999 -6.881909
-4.999998 -15.388418
445.000000 0.000000
-4.999999 15.388418
-5.000001 6.881911
-5.000000 3.632714
-5.000000 1.624598
-5.000000 0.000000
-5.000000 -1.624598
-5.000000 -3.632714
-5.000001 -6.881911
-4.999999 -15.388418
To run one FFT over each column of the same data instead:
NX=10;NY=5;
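rank=1: each transform is a one-dimensional Fourier transform;
n[1]: the array n has rank elements and gives the transform size along each dimension. Here n has a single element, n[0]=NY, meaning each 1D transform covers 5 elements;
inembed[2]: describes the storage dimensions of the input data. This example again sets inembed[0]=NX, inembed[1]=NY (with rank=1, cuFFT only reads inembed[0], but the pointer must be non-NULL);
istride: the distance between two consecutive elements within one transform; here istride=NX, because consecutive elements of a column are one full row apart;
idist: the distance between the first elements of two consecutive transforms; here idist=1, because the first elements of two neighbouring columns are adjacent in memory;
batch: the number of independent transforms; here batch=NX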
The code is:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <math.h>
#include <cuda_runtime.h>
#include <cufft.h>
#include "device_launch_parameters.h"
#define Ndim 2
#define NX 10
#define NY 5
void testplanmany() {
int N[2];
N[0] = NX, N[1] = NY;
int NXY = N[0] * N[1];
cufftComplex *input = (cufftComplex*) malloc(NXY * sizeof(cufftComplex));
cufftComplex *output = (cufftComplex*) malloc(NXY * sizeof(cufftComplex));
int i;
for (i = 0; i < NXY; i++) {
input[i].x = i % 1000;
input[i].y = 0;
}
cufftComplex *d_inputData, *d_outData;
cudaMalloc((void**) &d_inputData, N[0] * N[1] * sizeof(cufftComplex));
cudaMalloc((void**) &d_outData, N[0] * N[1] * sizeof(cufftComplex));
cudaMemcpy(d_inputData, input, N[0] * N[1] * sizeof(cufftComplex), cudaMemcpyHostToDevice);
cufftHandle plan;
/*
cufftMakePlanMany(cufftHandle plan, int rank, int *n, int *inembed,
int istride, int idist, int *onembed, int ostride,
int odist, cufftType type, int batch, size_t *workSize);
*/
int rank=1;
int n[1];
n[0]=NY;
int istride=NX;
int idist = 1;
int ostride=NX;
int odist = 1;
int inembed[2];
int onembed[2];
inembed[0]=NX; onembed[0]=NX;
inembed[1] = NY; onembed[1] = NY;
cufftPlanMany(&plan,rank,n,inembed, istride ,idist , onembed, ostride,odist, CUFFT_C2C, NX);
cufftExecC2C(plan, d_inputData, d_outData, CUFFT_FORWARD);
cudaMemcpy(output, d_outData, NXY * sizeof(cufftComplex), cudaMemcpyDeviceToHost);
for (i = 0; i < NXY; i++) {
if(i%NX==0)
printf("\n");
printf("%f %f \n", output[i].x, output[i].y);
}
cufftDestroy(plan);
free(input);
free(output);
cudaFree(d_inputData);
cudaFree(d_outData);
}
int main() {
testplanmany();
}
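The result is as follows. The output keeps the strided layout of the input (ostride=NX, odist=1), so each group of 10 lines holds one frequency bin for all 10 columns: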
100.000000 0.000000
105.000000 0.000000
110.000000 0.000000
115.000000 0.000000
120.000000 0.000000
125.000000 0.000000
130.000000 0.000000
135.000000 0.000000
140.000000 0.000000
145.000000 0.000000
-25.000000 34.409550
-25.000000 34.409550
-25.000000 34.409550
-25.000000 34.409550
-25.000000 34.409550
-25.000000 34.409550
-25.000000 34.409550
-25.000000 34.409550
-25.000000 34.409550
-25.000000 34.409550
-25.000002 8.122991
-25.000002 8.122991
-24.999998 8.122991
-24.999998 8.122991
-24.999998 8.122991
-25.000000 8.122991
-25.000000 8.122991
-25.000000 8.122991
-25.000000 8.122991
-25.000000 8.122991
-25.000002 -8.122991
-25.000002 -8.122991
-24.999998 -8.122991
-24.999998 -8.122991
-24.999998 -8.122991
-25.000000 -8.122991
-25.000000 -8.122991
-25.000000 -8.122991
-25.000000 -8.122991
-25.000000 -8.122991
-25.000000 -34.409550
-25.000000 -34.409550
-25.000000 -34.409550
-25.000000 -34.409550
-25.000000 -34.409550
-25.000000 -34.409550
-25.000000 -34.409550
-25.000000 -34.409550
-25.000000 -34.409550
-25.000000 -34.409550
This post also drew on the following resource:
https://rocfft.readthedocs.io/en/latest/real.html#setting-strides