FPGA Implementation of the CORDIC Algorithm (Part 1)

References

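[1] liyuanbhu
[2] 碎碎思
[3] 電子發燒友 (elecfans) course. The code in this course is excellent; if you can, buy it together with the development board (make sure it is the bundle that includes the board). I won't go into more detail here.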

Project Overview

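Anyone with a little FPGA signal-processing experience has probably heard of the CORDIC algorithm, which can be used to compute common elementary and transcendental functions. Readers who like to get to the bottom of things will naturally ask why CORDIC can compute these functions, which functions it can compute, and how accurate it is. This article and the ones that follow address these questions. CORDIC has in fact already been covered quite thoroughly by other bloggers on CSDN, and I learned it from their posts myself; links to those blogs and other references are listed in the References section.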

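First, CORDIC stands for Coordinate Rotation Digital Computer, i.e. the coordinate rotation algorithm. Since the algorithm rotates coordinates, a coordinate system has to be fixed first. The systems commonly used with CORDIC are the circular, linear and hyperbolic systems; each has a vectoring mode and a rotation mode, and each mode computes different functions. Once you understand how CORDIC works, you can also use it to compute some other special functions. The functions CORDIC can compute are listed below:
[Figure: functions computable by CORDIC in each coordinate system and mode]
The functions supported by the CORDIC IP in Vivado are shown below:
[Figure: functions supported by the Vivado CORDIC IP]
As you can see, the two lists correspond almost one to one.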

Principle of the CORDIC Vectoring Mode

CORDIC Vectoring Mode Derivation, Step 1

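This part mainly follows the article in reference [1], which readers may want to consult.
Given a point in the plane with Cartesian coordinates (X, Y) = (100, 200), how do we find its polar coordinates (ρ, θ)? A calculator gives (223.61, 63.435°).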
[Figure: the point (X, Y) and its polar coordinates (ρ, θ)]
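To keep the focus on the main idea, we only discuss the case where both X and Y are positive. Some readers may ask what to do if X or Y is negative. In fact, the algorithm does not need both X and Y to be positive, only X. If X is negative, the input must be preprocessed by making X positive; just remember that this changes the starting point of the CORDIC iteration, so the resulting angle has to be corrected afterwards. Once X is positive, θ = atan(y/x), so finding θ is the same as evaluating the atan function. The idea behind CORDIC is very direct: rotate (x, y) by some angle; if the y coordinate becomes 0 after the rotation, the angle rotated is θ. In case you have forgotten the rotation formula, here it is. Let (x, y) be the original point and $(x_1, y_1)$ the point obtained by rotating it clockwise by θ about the origin; then: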
$$
\begin{aligned}
x_1 &= x\cos\theta + y\sin\theta \\
y_1 &= y\cos\theta - x\sin\theta
\end{aligned}
$$
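As a quick sanity check of the rotation formula, here is a minimal Python sketch (`rotate_cw` is a hypothetical helper, and the exact angle comes from `math.atan2`, not from CORDIC): rotating (100, 200) clockwise by exactly θ lands the point on the positive x axis, and the x coordinate that remains is ρ.

```python
import math


def rotate_cw(x, y, theta_deg):
    """Rotate (x, y) clockwise by theta_deg degrees about the origin."""
    t = math.radians(theta_deg)
    x1 = x * math.cos(t) + y * math.sin(t)
    y1 = y * math.cos(t) - x * math.sin(t)
    return x1, y1


x, y = 100.0, 200.0
theta = math.degrees(math.atan2(y, x))  # exact angle, about 63.435 degrees
x1, y1 = rotate_cw(x, y, theta)
print(f"theta = {theta:.3f} deg, x1 = {x1:.2f}, |y1| = {abs(y1):.1e}")
```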
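Our goal is to make y zero. To keep the amount of computation low, the rotations follow a binary search: the first rotation is 45°, and whether it is clockwise or counterclockwise depends on the sign of y.
[Figure: first rotation, 45° clockwise]
After this rotation the y coordinate is 70.71, still greater than 0, so we have not rotated far enough; next we rotate another 22.5°.
[Figure: second rotation, 22.5° clockwise]
The total rotation is now 45 + 22.5 = 67.5°. The y coordinate has become negative, which means θ < 67.5°, so we have to rotate back. Following the binary-search idea, this time we rotate 11.25° counterclockwise.
[Figure: third rotation, 11.25° counterclockwise]
The total is now 45 + 22.5 - 11.25 = 56.25°. This time we rotated back too far (y is positive again), so we rotate clockwise once more, this time by 5.625°.
[Figure: fourth rotation, 5.625° clockwise]
The total rotation is now 45 + 22.5 - 11.25 + 5.625 = 61.875°, and the y coordinate is already close to 0. Since we only want to illustrate the idea, we stop here. Our answer at this point is 61.875° ± 5.625°: binary search fundamentally narrows down an interval, so what we obtain is a range for θ. The distance from the point to the origin falls out at the same time: the x coordinate after the rotations gives ρ ≈ 223.52. Compared with the exact answer, this is already quite good. The rotation process is illustrated below.
[Figure: the sequence of rotations bringing the point toward the x axis]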
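The four hand-computed rotations can be reproduced with a short Python sketch (floating point, exact sin/cos from `math`; the helper name `rotate` is hypothetical). A positive angle means clockwise, a negative one counterclockwise, and the sign of y chooses the direction exactly as described above.

```python
import math


def rotate(x, y, theta_deg):
    """Clockwise rotation for theta_deg > 0, counterclockwise for theta_deg < 0."""
    t = math.radians(theta_deg)
    return (x * math.cos(t) + y * math.sin(t),
            y * math.cos(t) - x * math.sin(t))


x, y, total = 100.0, 200.0, 0.0
step = 45.0
directions = []
for k in range(4):
    d = 1 if y > 0 else -1          # y > 0: not rotated far enough, keep going clockwise
    x, y = rotate(x, y, d * step)
    total += d * step
    directions.append(d)
    print(f"step {k + 1}: rotate {d * step:+8.4f}  total {total:8.4f}  "
          f"x = {x:7.2f}  y = {y:7.2f}")
    step /= 2
```

The running totals follow 45, 67.5, 56.25 and 61.875 degrees, and the final x coordinate is about 223.52, matching the walkthrough.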
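Some readers may ask: the computation uses sin and cos, so how are those values obtained? Simple: we only need sin and cos at a few special angles, so we precompute them and store them in a lookup table. Note that this lookup-table idea shows up constantly in FPGA design.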
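The `sine` and `cosine` arrays in the MATLAB script below are exactly such a table. Here is a Python sketch that regenerates them (16 entries for 45°, 22.5°, 11.25°, ...); in hardware these values would be ROM constants rather than computed at run time.

```python
import math

N_ENTRIES = 16
angles = [45.0 / 2**i for i in range(N_ENTRIES)]          # 45, 22.5, 11.25, ...
sine = [math.sin(math.radians(a)) for a in angles]
cosine = [math.cos(math.radians(a)) for a in angles]

for a, s, c in zip(angles[:4], sine[:4], cosine[:4]):
    print(f"{a:9.5f} deg  sin = {s:.13f}  cos = {c:.13f}")
```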

The idea above can be implemented in MATLAB as follows:

clc;
clear all;

sine = [0.7071067811865,0.3826834323651,0.1950903220161,0.09801714032956,0.04906767432742,0.02454122852291,0.01227153828572,0.006135884649154,0.003067956762966,0.001533980186285,7.669903187427045e-4,3.834951875713956e-4,1.917475973107033e-4,9.587379909597735e-5,4.793689960306688e-5,2.396844980841822e-5];
cosine = [0.7071067811865,0.9238795325113,0.9807852804032,0.9951847266722, ...
0.9987954562052,0.9996988186962,0.9999247018391,0.9999811752826,0.9999952938096, ...
0.9999988234517,0.9999997058629,0.9999999264657,0.9999999816164,0.9999999954041, ...
0.999999998851,0.9999999997128];
angle = 45;
a = zeros(16,1);
for i = 1:16
    a(i) = angle;
    angle = angle/2;
end

 x = 100;
 y = -300;
 z = 0;
 
for i = 1:16
    if(y > 0)
        x_new = x*cosine(i) + y*sine(i);
        y_new = y*cosine(i) - x*sine(i);
        x = x_new;
        y = y_new;
        z = z + a(i);
    else
        x_new = x*cosine(i) - y*sine(i);
        y_new = y*cosine(i) + x*sine(i);
        x = x_new;
        y = y_new;
        z = z - a(i);
    end
end
z


The result is:
[Figure: MATLAB output of the script above]

CORDIC Vectoring Mode Derivation, Step 2

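CORDIC is normally implemented in an FPGA, where DSP slices are a very precious resource, so we want to use as few multiplications in CORDIC as possible. We therefore rewrite the rotation formula by factoring out cos θ:
$$
\begin{aligned}
x_1 &= \cos\theta\,(x + y\tan\theta) \\
y_1 &= \cos\theta\,(y - x\tan\theta)
\end{aligned}
$$
Because what we are after is the phase $\arctan(y/x)$, we first drop the scaling factor $\cos\theta$:
$$
\begin{aligned}
x_1 &= x + y\tan\theta \\
y_1 &= y - x\tan\theta
\end{aligned}
$$
(For a counterclockwise rotation the signs of the $\tan\theta$ terms are flipped.)
Note, however, that the CORDIC vectoring mode can compute not only $\arctan(y/x)$ but also $\sqrt{x^2+y^2}$, so the dropped factor has to be compensated for at the end; in the FPGA this compensation is also done with a lookup table.
What happens once cos θ is omitted? After each rotation the new point is farther from the origin, scaled by a factor of 1/cos θ. That does not matter here, because we are solving for θ and do not care about the change in ρ. The transformation is very simple, yet it cuts the work per iteration from 4 multiplications to 2.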
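Both claims can be checked numerically. The Python sketch below (not the author's code; `vectoring` and `keep_cos` are hypothetical names) runs the 16-step binary-search vectoring with and without the cos θ factor on the same input as the MATLAB scripts, (100, -300). The accumulated angle should come out identical, and only x should be stretched by the product of 1/cos θ_i.

```python
import math


def vectoring(x, y, n, keep_cos):
    """Binary-search vectoring with angles 45, 22.5, 11.25, ... degrees."""
    z = 0.0
    for i in range(n):
        a = 45.0 / 2**i
        t = math.tan(math.radians(a))
        c = math.cos(math.radians(a)) if keep_cos else 1.0
        d = 1 if y > 0 else -1                  # same decision rule as the MATLAB loop
        x, y = c * (x + d * y * t), c * (y - d * x * t)
        z += d * a
    return x, y, z


x_a, y_a, z_a = vectoring(100.0, -300.0, 16, keep_cos=True)
x_b, y_b, z_b = vectoring(100.0, -300.0, 16, keep_cos=False)
stretch = math.prod(1.0 / math.cos(math.radians(45.0 / 2**i)) for i in range(16))

print(z_a, z_b)                 # same angle from both versions
print(x_b / x_a, stretch)       # x is scaled by prod(1/cos(theta_i))
```

Multiplying by a positive cos θ never changes the sign of y, which is why every rotation decision, and therefore the angle, is unaffected.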

The idea above can be implemented in MATLAB as follows:

clc;
clear all;

tangent = [1.0,0.4142135623731,0.1989123673797,0.09849140335716,0.04912684976947, ...
0.02454862210893,0.01227246237957,0.006136000157623,0.003067971201423, ... 
0.001533981991089,7.669905443430926e-4,3.83495215771441e-4,1.917476008357089e-4, ... 
9.587379953660303e-5,4.79368996581451e-5,2.3968449815303e-5];
angle = 45;
a = zeros(16,1);
for i = 1:16
    a(i) = angle;
    angle = angle/2;
end

 x = 100;
 y = -300;
 z = 0;
 
for i = 1:16
    if(y > 0)
        x_new = x+ y*tangent(i);
        y_new = y - x*tangent(i);
        x = x_new;
        y = y_new;
        z = z + a(i);
    else
        x_new = x- y*tangent(i);
        y_new = y + x*tangent(i);
        x = x_new;
        y = y_new;
        z = z - a(i);
    end
end
z

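[Figure: MATLAB output of the script above]
The result is exactly the same as before the transformation, which confirms that the rewrite is correct.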

CORDIC Vectoring Mode Derivation, Step 3

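An FPGA has plenty of registers and lookup tables but comparatively few DSP slices, so we would like to eliminate the multiplications in CORDIC entirely. The way to do this is to turn the multiplications in the formula below into shift operations:
$$
\begin{aligned}
x_1 &= x + y\tan\theta \\
y_1 &= y - x\tan\theta
\end{aligned}
$$
For that, tan θ must be a power of two of the form $2^{-i}$ ($i = 0, 1, 2, \dots$). Let us analyze the formula above with that in mind.
In the first iteration tan 45° = 1, so the first iteration actually needs no multiplication. What about the second iteration?

tan 22.5° = 0.4142135623731. Unfortunately, the multiplier in the second iteration is an awkward fraction. Can we change it? Yes. We picked 22.5° for the second step because that makes binary search most efficient; choosing some angle between 22.5° and 45° makes the search somewhat less efficient. But if giving up a little search efficiency lets us eliminate multiplications and makes the overall computation faster, the change is worth it.

We find that tan(26.565051177078°) = 0.5. If the second rotation uses 26.565051177078°, the multiplier becomes 0.5. With fixed-point arithmetic (used heavily to speed up computation when no floating-point coprocessor is available), multiplying by 0.5 is just a right shift by one bit. Shifts are very fast, so the multiplication in the second iteration is eliminated too.

Similarly, the third iteration uses 14.0362434679265° instead of 11.25°, because

tan(14.0362434679265°) = 1/4,

so a right shift by two bits is enough. The remaining iterations follow the same pattern.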

$$
\theta_i = \arctan\left(2^{-i}\right), \qquad \tan\theta_i = 2^{-i}, \qquad i = 0, 1, 2, \dots
$$
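A small Python sketch makes both points concrete: it regenerates the angle table θ_i = arctan(2^-i) used in the MATLAB code below, and it shows that for an integer (fixed-point) value, multiplying by 2^-i with rounding toward -∞ is exactly an arithmetic right shift by i bits. The sample value `v` is arbitrary and only for illustration.

```python
import math

N_ITER = 16
theta = [math.degrees(math.atan(2.0 ** -i)) for i in range(N_ITER)]   # 45, 26.565..., 14.036..., ...
print([f"{t:.10f}" for t in theta[:4]])

# Fixed point: scaling by 2**-i (rounded toward -infinity) equals an arithmetic right shift.
v = -1234567                         # an arbitrary negative fixed-point integer
shifted = [v >> i for i in range(N_ITER)]
print(shifted[:4])
```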
The corresponding MATLAB code is:

clc;
clear all;

angle = [45.0, 26.565051177078, 14.0362434679265, 7.1250163489018, 3.57633437499735, ...
                            1.78991060824607, 0.8951737102111, 0.4476141708606, 0.2238105003685, 0.1119056770662, ... 
                            0.0559528918938, 0.027976452617, 0.01398822714227, 0.006994113675353, 0.003497056850704,0.001748528426980];
                     
tangent = [1.0, 1 / 2.0, 1 / 4.0, 1 / 8.0, 1 / 16.0, ...
                              1 / 32.0, 1 / 64.0, 1 / 128.0, 1 / 256.0, 1 / 512.0, ...
                              1 / 1024.0, 1 / 2048.0, 1 / 4096.0, 1 / 8192.0, 1 / 16384.0,1/32768];


 x = 100;
 y = -300;
 z = 0;
 
for i = 1:16
    if(y > 0)
        x_new = x+ y*tangent(i);
        y_new = y - x*tangent(i);
        x = x_new;
        y = y_new;
        z = z + angle(i);
    else
        x_new = x- y*tangent(i);
        y_new = y + x*tangent(i);
        x = x_new;
        y = y_new;
        z = z - angle(i);
    end
end
z

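Because bit shifts are inconvenient in MATLAB, the program above simply multiplies by the corresponding powers of two; in an FPGA, however, the shifts are trivial to implement.
The result is:
[Figure: MATLAB output of the script above]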
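Since the MATLAB version multiplies instead of shifting, here is a Python sketch of the same loop using integer shifts only, which is closer to what the FPGA does. Assumptions: Q.16 integers for x, y and the angle table (kept in degrees to match the scripts above), x > 0 on input, and `>>` as the arithmetic right shift; `FRAC`, `ATAN_Q` and `cordic_vectoring_int` are hypothetical names, and no gain compensation is applied.

```python
import math

FRAC = 16        # hypothetical number of fractional bits
N_ITER = 16
# arctan(2^-i) in degrees, stored as Q.16 integers
ATAN_Q = [round(math.degrees(math.atan(2.0 ** -i)) * 2**FRAC) for i in range(N_ITER)]


def cordic_vectoring_int(x, y):
    """x, y: Q.FRAC integers with x > 0.

    Returns (x_n, z_q): x_n is still scaled by about 1.6468,
    z_q is the angle in Q.FRAC degrees.
    """
    z = 0
    for i in range(N_ITER):
        if y > 0:                               # rotate clockwise
            x, y, z = x + (y >> i), y - (x >> i), z + ATAN_Q[i]
        else:                                   # rotate counterclockwise
            x, y, z = x - (y >> i), y + (x >> i), z - ATAN_Q[i]
    return x, z


x_n, z_q = cordic_vectoring_int(100 << FRAC, -300 << FRAC)
print(z_q / 2**FRAC, math.degrees(math.atan2(-300, 100)))
```

Every operation inside the loop is an add, a subtract or a shift, so each iteration maps onto adders and wiring in hardware.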
This covers the core idea of CORDIC. Of course, what is presented here is only the most basic part; besides atan, CORDIC can compute a whole family of functions such as sin, cos, sinh and cosh.

CORDIC Vectoring Mode Derivation, Step 4

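In the computation above we dropped cos θ, so to obtain $\sqrt{x^2+y^2}$ the compensation factor has to be applied at the end. Because cos θ was dropped in every iteration, the true final values $(x_{n1}, y_{n1})$ are obtained by scaling the results $(x_n, y_n)$ of the n simplified iterations as follows:
$$
x_{n1} = x_n \prod_{i=0}^{n-1}\cos\theta_i, \qquad y_{n1} = y_n \prod_{i=0}^{n-1}\cos\theta_i
$$
From the previous step we know that:
$$
\tan\theta_i = 2^{-i}
$$
Therefore:
$$
\cos\theta_i = \frac{1}{\sqrt{1+\tan^2\theta_i}} = \frac{1}{\sqrt{1+2^{-2i}}}
$$
If the total number of rotations is n, the overall magnitude compensation factor K is:
$$
K = \prod_{i=0}^{n-1}\cos\theta_i = \prod_{i=0}^{n-1}\frac{1}{\sqrt{1+2^{-2i}}}
$$
As n tends to infinity, K approaches 0.607252935.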
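The convergence of K can be checked directly from the product formula with a few lines of Python (the function name `cordic_k` is hypothetical):

```python
import math


def cordic_k(n):
    """K = prod_{i=0}^{n-1} 1 / sqrt(1 + 2^(-2i))."""
    return math.prod(1.0 / math.sqrt(1.0 + 2.0 ** (-2 * i)) for i in range(n))


for n in (1, 2, 4, 8, 12, 16, 24):
    print(f"n = {n:2d}  K = {cordic_k(n):.12f}")
```

K decreases monotonically and has essentially converged after a dozen or so iterations, which is why it can be treated as a constant or kept in a small table indexed by the iteration count.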
The corresponding MATLAB program is:

clc;
clear all;

angle = [45.0, 26.565051177078, 14.0362434679265, 7.1250163489018, 3.57633437499735, ...
                            1.78991060824607, 0.8951737102111, 0.4476141708606, 0.2238105003685, 0.1119056770662, ... 
                            0.0559528918938, 0.027976452617, 0.01398822714227, 0.006994113675353, 0.003497056850704,0.001748528426980];
                     
tangent = [1.0, 1 / 2.0, 1 / 4.0, 1 / 8.0, 1 / 16.0, ...
                              1 / 32.0, 1 / 64.0, 1 / 128.0, 1 / 256.0, 1 / 512.0, ...
                              1 / 1024.0, 1 / 2048.0, 1 / 4096.0, 1 / 8192.0, 1 / 16384.0,1/32768];


 x = 100;
 y = -300;
 z = 0;
 
for i = 1:16
    if(y > 0)
        x_new = x+ y*tangent(i);
        y_new = y - x*tangent(i);
        x = x_new;
        y = y_new;
        z = z + angle(i);
    else
        x_new = x- y*tangent(i);
        y_new = y + x*tangent(i);
        x = x_new;
        y = y_new;
        z = z - angle(i);
    end
end
K = 1;
for i = 1:16
    K = K*1/sqrt(1+2^-(2*(i-1)));
end
x_new = x_new*K

z

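The result is:
[Figure: MATLAB output of the script above]
This again confirms the derivation. Note that in an actual FPGA, K is read from a lookup table rather than computed as in the program above.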
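A Python sketch of what such a table could look like, mirroring the lines of the full MATLAB script below that build `K` and quantize `1./K` with `ampcoe` ([18,16], round to nearest). Whether the FPGA ROM (`K_quantize_dis_rom`) holds exactly these words is an assumption, since its contents are not shown here; `N_ITER` stands in for `Ninter`.

```python
import math

N_ITER = 12          # stands in for Ninter in the MATLAB script
FRAC_BITS = 16       # ampcoe = [18,16]: 18-bit signed word, 16 fractional bits

# Same recursion as the MATLAB loop: K(1) = 1, K(i) = K(i-1) * sqrt(1 + 2^(-2*i+4))
gain = [1.0]
for m in range(1, N_ITER + 1):
    gain.append(gain[-1] * math.sqrt(1.0 + 2.0 ** (-2 * (m - 1))))

# Store 1/gain as rounded integer codes; the values are irrational, so ties never occur.
k_rom = [round(2**FRAC_BITS / g) for g in gain]
print(k_rom)
```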

MATLAB Implementation

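The MATLAB code above was written only to verify the derivation, and it cannot be matched one to one with the logic inside the FPGA, mainly because it performs no quantization of the data and none of the input preprocessing. The complete code is therefore given below. It is based on the 電子發燒友 course: I meant to write my own, but the original code is simply too good to pass up. The course is listed in the References for anyone who wants to study it.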

clc;
clear all;
Ninter = 12; % number of CORDIC iterations
N = 32;      % word length of the x/y inputs and of the amplitude output
%y: y coordinate, Q(N,N-2)
%x: x coordinate, Q(N,N-2)
%angle: Q(18,15), in radians
% These quantization formats are chosen to match the FPGA implementation bit for bit
ang = quantizer('mode','fixed','roundmode','nearest','overflowmode','saturate','format',[18,15]);
input = quantizer('mode','fixed','roundmode','floor','overflowmode','saturate','format',[N,N-2]);
amp = quantizer('mode','fixed','roundmode','floor','overflowmode','saturate','format',[N,N-2]);
ampcoe = quantizer('mode','fixed','roundmode','nearest','overflowmode','saturate','format',[18,16]);
amp2 = quantizer('mode','fixed','roundmode','floor','overflowmode','saturate','format',[48,45]);
amp3 = quantizer('mode','fixed','roundmode','floor','overflowmode','saturate','format',[25,22]);


times = 100;

num=0;

fid1 = fopen('x_random_fpga.txt','r');
x_fix = fscanf(fid1,'%d');
fclose(fid1);
x_fix = x_fix/2^(N-2);   % integer codes -> real values in Q(N,N-2)

fid1 = fopen('y_random_fpga.txt','r');
y_fix = fscanf(fid1,'%d');
fclose(fid1);
y_fix = y_fix/2^(N-2);
  
PreciseAng_data = zeros(1,times);
PreciseAmp_data = zeros(1,times);
Ang_data=zeros(1,times);
Amp_data=zeros(1,times);

for t=1:times
    
num=num+1;
    
x = x_fix(t);
y= y_fix(t);

K = zeros(1,Ninter+1);
K(1) = 1;
for i=2:Ninter + 1
    K(i) = K(i-1)*sqrt(1+2^(-2*i+4));
   
end
K = quantize(ampcoe,1./K);
y1 =y;
z = 0;
x1 = abs(x);
x1 = x1;
y1 = y1;
d = -sign(y1);

atan_z = zeros(1,Ninter);
atan_z_dectobin =zeros(Ninter,15);
for i=0:Ninter-1
    atan_z(i+1) = quantize(ang,atan(2^(-i)));
end
   
for n=0:Ninter-1
    if(y1 == 0)
        break;
    end
  x1_q = quantize(amp2,(2^(-n)*x1));
  y1_q = quantize(amp2,(2^(-n)*y1));%(48,45)

  
  x1 =  quantize(amp2,x1 - d*y1_q);
  y1 = quantize(amp2,y1 + d*x1_q);


  
  atan_z_qu = quantize(ang,atan_z(n+1));
  z = quantize(ang,z - d*atan_z_qu);
  
  atan_zzz = atan_z_qu*2^15;
  
  z_comp =z*2^15;
  d = -sign(y1);
end
% Quadrant correction: the iterations used abs(x), so fix up the angle when x < 0
pi_quan = quantize(ang,pi);
if(x  < 0)
  if(y < 0)
     Ang = -z - pi_quan  ;
  else
     Ang = - z + pi_quan ;
  end
else
    Ang = z;
end

Ang_q =Ang*2^15;

x1 = quantize(amp3,x1);% (25,22): matches the quantization performed in the FPGA

Amp = quantize(amp,x1*K(n+1));%K(18,16)
Amp_q = Amp*2^(N-2);
 
err = Ang - angle(x + j*y);
PreciseAng = log2(abs(err));
err = Amp - abs(x+j*y);
PreciseAmp = log2(abs(err));

Ang_data(t)=Ang_q;
Amp_data(t)=Amp_q;
PreciseAng_data(t)=PreciseAng;
PreciseAmp_data(t)=PreciseAmp;

if(PreciseAng_data(t)==0)
    break;
end 

if(PreciseAmp_data(t)==0)
    break;
end

end
    
PreciseAng_s_max = max(PreciseAng_data)
PreciseAmp_s_max = max(PreciseAmp_data)
fid_ang = fopen('Ang_matlab.txt','w');
fprintf(fid_ang,'%d\n',Ang_data);
fclose(fid_ang);

fid_amp = fopen('Amp_matlab.txt','w');
fprintf(fid_amp,'%d\n',Amp_data);
fclose(fid_amp);
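The quadrant correction at the end of the script is easy to get wrong, so here is a float-only Python sketch of it (no quantization; the function names are hypothetical). It runs the same iteration as the MATLAB loop on (|x|, y) with d = -sign(y), applies the `if (x < 0)` fix-up, and compares the result with `math.atan2` in all four quadrants.

```python
import math


def cordic_atan_abs_x(x, y, n=12):
    """Vectoring iterations on (|x|, y); returns z, roughly atan(y / |x|) in radians."""
    x = abs(x)
    z = 0.0
    for i in range(n):
        if y == 0:
            break                                   # same early exit as the MATLAB loop
        d = -1.0 if y > 0 else 1.0                  # d = -sign(y)
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * math.atan(2.0 ** -i)
    return z


def full_angle(x, y, n=12):
    """Quadrant correction, following the 'if (x < 0)' block of the script."""
    z = cordic_atan_abs_x(x, y, n)
    if x < 0:
        return -z - math.pi if y < 0 else -z + math.pi
    return z


for px, py in [(0.3, 0.4), (-0.3, 0.4), (-0.3, -0.4), (0.3, -0.4)]:
    print(f"({px:+.1f}, {py:+.1f}): cordic {full_angle(px, py):+.5f}  "
          f"atan2 {math.atan2(py, px):+.5f}")
```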

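Next, compare the maximum error of the algorithm over the 100 test samples:
[Figure: maximum angle and amplitude errors over 100 samples]
These error values are log2 of the absolute error (PreciseAng = log2(abs(err))), i.e. they are expressed in bits rather than dB: a value of -k means the error is about 2^-k. The results show that the algorithm above works correctly.
The difficult part above is how the quantization operations are implemented in the FPGA:
[Figure: quantization operations in the FPGA implementation]
This is explained in the FPGA implementation below.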
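One way to think about the quantizers is in terms of integer codes. The Python sketch below is not MATLAB's `quantize` (the name `quantize_code` and its return value are my own): it returns the integer code, i.e. the MATLAB result times 2^frac_len, with 'floor' or 'nearest' rounding and saturation. As far as I know, MATLAB's 'nearest' mode breaks ties toward +∞, which is what the sketch does. In hardware, 'floor' simply means dropping low-order bits of a two's-complement word, so requantizing a [48,45] value to [25,22] is the same as shifting its code right by 23 bits.

```python
import math


def quantize_code(value, word_len, frac_len, round_mode="floor"):
    """Signed fixed-point [word_len, frac_len] with saturation; returns the integer code."""
    scaled = value * 2**frac_len
    if round_mode == "floor":
        code = math.floor(scaled)
    elif round_mode == "nearest":
        code = math.floor(scaled + 0.5)             # round to nearest, ties toward +infinity
    else:
        raise ValueError(f"unsupported round mode: {round_mode}")
    lo, hi = -(2 ** (word_len - 1)), 2 ** (word_len - 1) - 1
    return max(lo, min(hi, code))                   # saturate instead of wrapping


pi_code = quantize_code(math.pi, 18, 15, "nearest")   # like quantize(ang, pi)
print(pi_code, pi_code / 2**15)

# 'floor' requantization [48,45] -> [25,22] equals dropping 23 LSBs of the code.
for v in (1.2345678, -0.1, -3.999):
    print(v, quantize_code(v, 25, 22), quantize_code(v, 48, 45) >> 23)
```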

FPGA Implementation

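The FPGA implementation of the code above is actually quite straightforward. The FPGA code is adapted from the 電子發燒友 course; out of respect for the original author, please consider buying the course, whose code is excellent. I only changed a small part of it. If you want something simpler, the design below can also be written without DSP primitives, using plain logic or IP cores instead. I wrote a Verilog CORDIC back in 2019, which was fairly easy, but I could not cross-check it against MATLAB or control how the DSP resources were reused; this course is what taught me how to verify MATLAB and FPGA results against each other. Because my changes are minor, the original author's header is kept in the code.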
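Before reading the module, it may help to see what its bit-level expressions compute. The Python sketch below (helper names are hypothetical) models two patterns from the code: the load `{{2{y_a[N-1]}},{y_a[N-2:0]},{(47-N){1'b0}}}`, which turns an N = 32-bit Q(32,30) input into a 48-bit word with 45 fractional bits (matching `amp2 = [48,45]` in the MATLAB script), and the shift cases `{p[47],{n{p[47]}},p[46:n]}`, which are arithmetic right shifts by n. The DSP48E1 then only has to form C + (A:B) or C - (A:B), where A:B carries the shifted operand.

```python
MASK48 = (1 << 48) - 1


def to_signed(u, bits=48):
    """Interpret the low `bits` bits of u as a two's-complement integer."""
    u &= (1 << bits) - 1
    return u - (1 << bits) if u >> (bits - 1) else u


def load_input(y, n_bits=32):
    """{{2{y[N-1]}}, y[N-2:0], {(47-N){1'b0}}} for an N-bit input y."""
    u = y & ((1 << n_bits) - 1)
    sign = u >> (n_bits - 1)
    word = (0b11 << 46) if sign else 0                        # two copies of the sign bit
    word |= (u & ((1 << (n_bits - 1)) - 1)) << (47 - n_bits)  # y[N-2:0], then 47-N zeros
    return to_signed(word)


def shift_case(p, n):
    """{p[47], {n{p[47]}}, p[46:n]} for a 48-bit word p, 1 <= n <= 15."""
    u = p & MASK48
    sign = u >> 47
    ext = ((1 << (n + 1)) - 1) if sign else 0                 # p[47] followed by n copies
    low = (u & ((1 << 47) - 1)) >> n                          # p[46:n], 47 - n bits
    return to_signed((ext << (47 - n)) | low)


y = -123456789                    # an arbitrary 32-bit sample
word = load_input(y)
print(word == y << 15, [shift_case(word, n) == word >> n for n in (1, 2, 15)])
```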

`timescale 1ns / 1ps

////////////////////////////////////////////////////////////////////////////////
// Company: MYMINIEYE
// Engineer:Mill
//
// Create Date:   2016/12/29 14:26:00
// Design Name:   CORDICang_stream
// Module Name:   CORDICang_vector_ip
// Project Name:  FS_cofdm_rx_v00
// Target Device:  zc7045
// Tool versions:  vivado 2015.1
// Description: Cordic 
//	
//
//
// Dependencies:
// 
// Revision:v02
// Revision 0.01 - File Created
// Additional Comments: contact us: [email protected]
// 
////////////////////////////////////////////////////////////////////////////////
module CORDICang_vector_ip #
(
	parameter 		Ninter 		= 		13,
	parameter 		N 			=		32
)
(
	input 				sclk			,
	input 				rst_n			,
	input 		[N-1:0] x				,
	input 		[N-1:0] y				,
	input 				valid			,
	
	output	reg	[17:0]	Ang			    ,
	output	reg	[N-1:0] Amp				,
	output	reg 		Ang_en
	
);
 
//========================================================================================\
//************** 	Main  	Code		**********************************
//========================================================================================/
/*===================================================================
====================================================================*/
reg 					valid_d			;
reg 					valid_a			;
reg 			[N-1:0]	x_a				;
reg 			[N-1:0]	y_a				;
reg 			[ 4:0]	cordic_cnt		;

always @(posedge sclk)
	valid_d 		<= 		valid;

always @(posedge sclk)
	if(!rst_n)
		valid_a 	<=   	1'b0;
	else if(valid && valid_d==1'b0)
		valid_a 	<=   	1'b1;
	else if(cordic_cnt == 5'd29)
		valid_a 	<=   	1'b0;

always @(posedge sclk)
	if(!valid_a)
		cordic_cnt 	<=   	5'd0;
	else 	
		cordic_cnt 	<=   	cordic_cnt + 1'b1;		

always @(posedge sclk)
	if(valid&&valid_d==1'b0)begin
		x_a 		<=   	x;
		y_a 		<=   	y;
	end 
/*===================================================================
====================================================================*/
reg 			[ 4:0]	Ninter_cnt		;
reg 			[ 4:0]	Ninter_cnt_copy1;
wire 			[17:0] 	K_quantize		;
wire 			[17:0]	atanz			;
reg 					valid_reg		;
reg  			[47:0]  x1				;
wire 			[47:0]  x_reg_dsp_x1	;
reg  			[47:0]  y1				;
wire 			[47:0]  y_reg_dsp_y1	;
reg 					x1_add_en		;
reg 					y1_add_en		;
reg  			[17:0] 	z				;
wire 			[17:0] 	z_dsp_lut		;
reg  			[17:0] 	z_dsp_lut_delay1;
wire 			[29:0] 	A_IN_x1			;
reg 			[17:0]	B_IN_x1			;
reg 			[47:0] 	C_x1			;
wire 			[24:0] 	D				;
wire 			[47:0] 	P_x1			;
reg 			[24:0] 	x1_mux			;
reg 			[ 6:0]	OPMODE_x1		;
reg 			[ 3:0]	ALUMODE_x1		;
reg 			[ 4:0]	INMODE_x1		;
reg 			[ 6:0]	OPMODE_y1		;
reg 			[ 3:0]	ALUMODE_y1		;
wire 			[ 4:0]	INMODE_y1		;
reg 			[29:0]	A_IN_y1			;
reg 			[17:0]	B_IN_y1			;
reg 			[47:0] 	C_y1			;
wire 			[47:0] 	P_y1			;
reg 			[47:0]	shift_y_reg		;
reg 			[47:0]	shift_x_reg		;
reg 					break_out 		;
reg 					break_happen 	;
wire 					break_cal		;
reg 					cal_control		;
reg 					first_break_happen	;
always @(posedge sclk)
	if(!rst_n)
		cal_control 		<=   	1'b0;
	else if(valid_a)
		cal_control 		<=   	~cal_control;

always @(*)// this signal is not really needed
	if(Ninter_cnt_copy1 != 0 && break_cal && cal_control == 1'b0)
		break_happen 		= 		1'b1;
	else 
		break_happen 		= 		1'b0;


always @(posedge sclk)
	if(!rst_n)
		first_break_happen 	<=   	1'b0;
	else if(break_happen)
		first_break_happen 	<=   	1'b1;
	else if(Ninter_cnt_copy1 == 4'd14)
		first_break_happen 	<=   	1'b0;
/*===================================================================
====================================================================*/  
reg 					x_sign,y_sign	;
reg 					Pos_valid		;
reg 			[3:0]	k_addr			;

always @(posedge sclk)
	valid_reg 				<=   	valid_a;


always @(posedge sclk)
	if(!rst_n)
		x_sign 				<= 		1'b0;
	else if(x_a[N-1]&&valid_a&&(~valid_reg))
		x_sign 				<= 		1'b1;
	else if(x_a[N-1]==1'b0&&valid_a&&(~valid_reg))
		x_sign 				<= 		1'b0;

always @(posedge sclk)
	if(!rst_n)
		y_sign 				<= 		1'b0;
	else if(y_a[N-1]&&valid_a&&(~valid_reg))
		y_sign 				<= 		1'b1;
	else if(y_a[N-1]==1'b0&&valid_a&&(~valid_reg))
		y_sign 				<= 		1'b0;


always @(posedge sclk)
	if(!rst_n)
		Pos_valid 			<=   	1'b0;	
	else if(valid_a && (~valid_reg))
		Pos_valid 			<=   	1'b1;
	else if(Ninter_cnt_copy1==5'h1f)
		Pos_valid 			<=   	1'b0;		


always @(posedge sclk)
	if(!rst_n||(!valid_a))
		Ninter_cnt 			<=   	4'd0;
	else if(valid_a&&cal_control)		
		Ninter_cnt 			<=   	Ninter_cnt + 1'b1;


always @(posedge sclk)
	if(!rst_n||(!valid_a))
		Ninter_cnt_copy1 	<=   	4'd0;
	else if(valid_a&&cal_control)		
		Ninter_cnt_copy1 	<=   	Ninter_cnt_copy1 + 1'b1;

always @(posedge sclk)
	if(!rst_n)
		break_out 			<=   	1'b0;
	else if(break_happen==1'b1&&first_break_happen==1'b0)
		break_out 			<=   	1'b1;
	else if(Ninter_cnt_copy1==4'd14)
		break_out 			<=   	1'b0;		


always @(posedge sclk)
	if(!rst_n)
		k_addr 				<=   	4'd0;
	else if(y1[47:48-N]==32'd0)
		k_addr 				<=   	4'd0;
	else if(break_happen&&first_break_happen==1'b0)
		k_addr	 			<=   	Ninter_cnt_copy1;
	else if(!break_out)
		k_addr 				<=   	Ninter;		

K_quantize_dis_rom #(
   .ROM_WIDTH     		(18						),
   .ROM_ADDR_BITS 		(4 						),
   .ROM_DEPTH     		(16						)
) u_K_quantize_dis_rom(
    .clock      		(sclk					),
	.enable     		(1'b1					),
	.address    		(k_addr					),
	.output_data		(K_quantize				)
  );   
atan_z_dis_rom #(
   .ROM_WIDTH     		(18						),
   .ROM_ADDR_BITS 		(4 						),
   .ROM_DEPTH     		(16						)
)u_atan_z_dis_rom(
    .clock      		(sclk					),
	.enable     		(1'b1					),
	.address    		(Ninter_cnt_copy1[3:0] 	),
	.output_data		(atanz 					)
  );  
/*===================================================================
====================================================================*/  
reg 				[N-1:0]	x_abs			;
always @(posedge sclk)
	if(x_a[N-1]==1)
		x_abs 		<=  		(~x_a)+1'b1;
	else 				
		x_abs 		<=  		x_a;

/*===================================================================
====================================================================*/ 
always @(posedge sclk)
	case(Ninter_cnt[3:0])
		4'd0:shift_y_reg <=   {{2{y_a[N-1]}},{y_a[N-2:0]},{(47-N){1'b0}}};//floor
		4'd1: if(cal_control==1'b0)shift_y_reg <=   {y_reg_dsp_y1[47],{1{y_reg_dsp_y1[47]}},y_reg_dsp_y1[46:1]};//floor
		4'd2: if(cal_control==1'b0)shift_y_reg <=   {y_reg_dsp_y1[47],{2{y_reg_dsp_y1[47]}},y_reg_dsp_y1[46:2]};//floor
		4'd3: if(cal_control==1'b0)shift_y_reg <=   {y_reg_dsp_y1[47],{3{y_reg_dsp_y1[47]}},y_reg_dsp_y1[46:3]};//floor
		4'd4: if(cal_control==1'b0)shift_y_reg <=   {y_reg_dsp_y1[47],{4{y_reg_dsp_y1[47]}},y_reg_dsp_y1[46:4]};//floor
		4'd5: if(cal_control==1'b0)shift_y_reg <=   {y_reg_dsp_y1[47],{5{y_reg_dsp_y1[47]}},y_reg_dsp_y1[46:5]};//floor
		4'd6: if(cal_control==1'b0)shift_y_reg <=   {y_reg_dsp_y1[47],{6{y_reg_dsp_y1[47]}},y_reg_dsp_y1[46:6]};//floor
		4'd7: if(cal_control==1'b0)shift_y_reg <=   {y_reg_dsp_y1[47],{7{y_reg_dsp_y1[47]}},y_reg_dsp_y1[46:7]};//floor	
		4'd8: if(cal_control==1'b0)shift_y_reg <=   {y_reg_dsp_y1[47],{8{y_reg_dsp_y1[47]}},y_reg_dsp_y1[46:8]};//floor
		4'd9: if(cal_control==1'b0)shift_y_reg <=   {y_reg_dsp_y1[47],{9{y_reg_dsp_y1[47]}},y_reg_dsp_y1[46:9]};//floor
		4'd10:if(cal_control==1'b0)shift_y_reg <=   {y_reg_dsp_y1[47],{10{y_reg_dsp_y1[47]}},y_reg_dsp_y1[46:10]};//floor
		4'd11:if(cal_control==1'b0)shift_y_reg <=   {y_reg_dsp_y1[47],{11{y_reg_dsp_y1[47]}},y_reg_dsp_y1[46:11]};//floor
		4'd12:if(cal_control==1'b0)shift_y_reg <=   {y_reg_dsp_y1[47],{12{y_reg_dsp_y1[47]}},y_reg_dsp_y1[46:12]};//floor
		4'd13:if(cal_control==1'b0)shift_y_reg <=   {y_reg_dsp_y1[47],{13{y_reg_dsp_y1[47]}},y_reg_dsp_y1[46:13]};//floor
		4'd14:if(cal_control==1'b0)shift_y_reg <=   {y_reg_dsp_y1[47],{14{y_reg_dsp_y1[47]}},y_reg_dsp_y1[46:14]};//floor
		4'd15:if(cal_control==1'b0)shift_y_reg <=   {y_reg_dsp_y1[47],{15{y_reg_dsp_y1[47]}},y_reg_dsp_y1[46:15]};//floor	
		default:shift_y_reg <=   shift_y_reg;
	endcase 


always @(posedge sclk)
	case(Ninter_cnt[3:0])
		4'd0:shift_x_reg  <=   {{2{x_abs[N-1]}},{x_abs[N-2:0]},{(47-N){1'b0}}};
		4'd1:if(cal_control==1'b0)shift_x_reg  <=   {x_reg_dsp_x1[47],{1{x_reg_dsp_x1[47]}},x_reg_dsp_x1[46:1]};//floor
		4'd2:if(cal_control==1'b0)shift_x_reg  <=   {x_reg_dsp_x1[47],{2{x_reg_dsp_x1[47]}},x_reg_dsp_x1[46:2]};//floor
		4'd3:if(cal_control==1'b0)shift_x_reg  <=   {x_reg_dsp_x1[47],{3{x_reg_dsp_x1[47]}},x_reg_dsp_x1[46:3]};//floor
		4'd4:if(cal_control==1'b0)shift_x_reg  <=   {x_reg_dsp_x1[47],{4{x_reg_dsp_x1[47]}},x_reg_dsp_x1[46:4]};//floor
		4'd5:if(cal_control==1'b0)shift_x_reg  <=   {x_reg_dsp_x1[47],{5{x_reg_dsp_x1[47]}},x_reg_dsp_x1[46:5]};//floor
		4'd6:if(cal_control==1'b0)shift_x_reg  <=   {x_reg_dsp_x1[47],{6{x_reg_dsp_x1[47]}},x_reg_dsp_x1[46:6]};//floor
		4'd7:if(cal_control==1'b0)shift_x_reg  <=   {x_reg_dsp_x1[47],{7{x_reg_dsp_x1[47]}},x_reg_dsp_x1[46:7]};//floor	
		4'd8:if(cal_control==1'b0)shift_x_reg  <=   {x_reg_dsp_x1[47],{8{x_reg_dsp_x1[47]}},x_reg_dsp_x1[46:8]};//floor
		4'd9:if(cal_control==1'b0)shift_x_reg  <=   {x_reg_dsp_x1[47],{9{x_reg_dsp_x1[47]}},x_reg_dsp_x1[46:9]};//floor
		4'd10:if(cal_control==1'b0)shift_x_reg <=   {x_reg_dsp_x1[47],{10{x_reg_dsp_x1[47]}},x_reg_dsp_x1[46:10]};//floor
		4'd11:if(cal_control==1'b0)shift_x_reg <=   {x_reg_dsp_x1[47],{11{x_reg_dsp_x1[47]}},x_reg_dsp_x1[46:11]};//floor
		4'd12:if(cal_control==1'b0)shift_x_reg <=   {x_reg_dsp_x1[47],{12{x_reg_dsp_x1[47]}},x_reg_dsp_x1[46:12]};//floor
		4'd13:if(cal_control==1'b0)shift_x_reg <=   {x_reg_dsp_x1[47],{13{x_reg_dsp_x1[47]}},x_reg_dsp_x1[46:13]};//floor
		4'd14:if(cal_control==1'b0)shift_x_reg <=   {x_reg_dsp_x1[47],{14{x_reg_dsp_x1[47]}},x_reg_dsp_x1[46:14]};//floor
		4'd15:if(cal_control==1'b0)shift_x_reg <=   {x_reg_dsp_x1[47],{15{x_reg_dsp_x1[47]}},x_reg_dsp_x1[46:15]};//floor	
		default:shift_x_reg <=    shift_x_reg;
	endcase
/*=============================================================================
	x1 = x_reg - d*(shift(n+1)*y_reg); and Amp = quantize(q_amp,x1*K(n+1));
	DSP X1: add/sub and mult; MUX:P=(A:B)+/-C; P=B*D
==============================================================================*/
reg 				[47:0] x1_temp			;
reg 				[47:0]	x_reg_dsp_x1_d	;

assign 		A_IN_x1 	= 	shift_y_reg[47:18];
assign 		D 	   		= 	x1_temp[47:23];
assign 		x_reg_dsp_x1= 	P_x1;

always @(*)
	if(Ninter_cnt==4'd0&&Pos_valid)
		x1 				=  		{{2{x_abs[N-1]}},{x_abs[N-2:0]},{(47-N){1'b0}}};
	else 	
		x1 				= 		x_reg_dsp_x1;

always @(posedge sclk)
	if(y1[47:48-N]==32'd0)//y1[47:48-N]==32'd0||break_out
		x1_temp 		<=   	{{2{x_abs[N-1]}},{x_abs[N-2:0]},{(47-N){1'b0}}};
	else if(break_happen == 1'b1)
		x1_temp 		<=   	x_reg_dsp_x1;
	else if(cal_control==1'b0 && break_out==1'b0 && Ninter_cnt==Ninter)//if(break_happen&&first_break_happen==1'b0)
		x1_temp 		<=   	x_reg_dsp_x1;

always @(posedge sclk)
	y1 					<=   {{2{y_a[N-1]}},{y_a[N-2:0]},{(47-N){1'b0}}};//x1 <=   {4'b0000,{x[N-2:0]},{(45-N){1'b0}}};//	 

always @(posedge sclk)
	if(y_a[N-1]==1'b0 && Ninter_cnt==0)
		x1_add_en 		<=    	1'b1;
	else if(y_a[N-1]==1'b1 && Ninter_cnt==0)
		x1_add_en 		<=    	1'b0;
	else if(y_reg_dsp_y1[47]==1'b1&&cal_control==1'b0)
		x1_add_en 		<=   	1'b0;
	else if(y_reg_dsp_y1[47]==1'b0&&cal_control==1'b0)
		x1_add_en 		<=   	1'b1;		
	else 
		x1_add_en 		<=    	x1_add_en;

always @(posedge sclk)
	if(Ninter_cnt==Ninter||break_out)
		x1_mux 			<= 		x1[47:23];
	else
		x1_mux 			<= 		25'd0;


always @(posedge sclk)
	x_reg_dsp_x1_d 		<=   	x_reg_dsp_x1;


always @(*)
	if(Ninter_cnt>=Ninter||break_out)
		B_IN_x1 		= 		K_quantize;
	else if(Ninter_cnt<Ninter&&cal_control)
		B_IN_x1			= 		shift_y_reg[17:0];
	else 
		B_IN_x1 		= 		18'd0;

always @(*)
	if(Ninter_cnt==0 && cal_control)
		C_x1 			= 		{{2{x_abs[N-1]}},{x_abs[N-2:0]},{(47-N){1'b0}}};
	else if(cal_control == 1'b1)
		C_x1 			= 		x_reg_dsp_x1_d;
	else 
		C_x1 			= 		48'd0;

always @(*)
	if(Ninter_cnt>=Ninter)
		OPMODE_x1 		= 		7'b000_01_01;//B*D
	else if(x1_add_en == 1'b1)//C+(A:B)
		OPMODE_x1 		= 		7'b000_11_11;
	else 
		OPMODE_x1 		= 		7'b011_00_11;//C-(A:B)			

always @(posedge sclk)
	if(Ninter_cnt>=Ninter-2)
		INMODE_x1 		<=   	5'b00110;
	else
		INMODE_x1 		<=   	5'b00000;	

always @(*)
	if(Ninter_cnt>=Ninter)
		ALUMODE_x1 		= 		4'b0000;
	else if(x1_add_en == 1'b1)
		ALUMODE_x1 		= 		4'b0000;				
	else
		ALUMODE_x1  	= 		4'b0011;			 

always @(posedge sclk)
	if(!rst_n)
		Amp 			<=   	0;	
	else if(x_reg_dsp_x1[42]==1'b0&& |x_reg_dsp_x1[41:39] && Ninter_cnt==(Ninter+1)&&cal_control==1'b0) 
		Amp 			<=   	32'b0111_1111_1111_1111_1111_1111_1111_1111;
	else if(x_reg_dsp_x1[42]&& &x_reg_dsp_x1[41:39]==1'b0 && Ninter_cnt==(Ninter+1)&&cal_control==1'b0) 
		Amp 			<=   	32'b1000_0000_0000_0000_0000_0000_0000_0000;
	else if(Ninter_cnt==(Ninter+1)&&cal_control==1'b0) 
		Amp 			<=   	{x_reg_dsp_x1[42],x_reg_dsp_x1[38],x_reg_dsp_x1[37:40-N]};			
	else
		Amp 			<=   	Amp;

wire 		rst 	= 	~rst_n;

wire 				rst_x		;
assign 		rst_x	=	~rst_n;
	
DSP48E1 #(
// Feature Control Attributes: Data Path Selection
.A_INPUT("DIRECT"), // Selects A input source, "DIRECT" (A port) or "CASCADE" (ACIN port)
.B_INPUT("DIRECT"), // Selects B input source, "DIRECT" (B port) or "CASCADE" (BCIN port)
.USE_DPORT("TRUE"), // Select D port usage (TRUE or FALSE)
.USE_MULT("DYNAMIC"), // Select multiplier usage ("MULTIPLY", "DYNAMIC", or "NONE")
.USE_SIMD("ONE48"), // SIMD selection ("ONE48", "TWO24", "FOUR12")
// Pattern Detector Attributes: Pattern Detection Configuration
.AUTORESET_PATDET("NO_RESET"), // "NO_RESET", "RESET_MATCH", "RESET_NOT_MATCH"
.MASK(48'h3fffffffffff), // 48-bit mask value for pattern detect (1=ignore)
.PATTERN(48'h000000000000), // 48-bit pattern match for pattern detect
.SEL_MASK("MASK"), // "C", "MASK", "ROUNDING_MODE1", "ROUNDING_MODE2"
.SEL_PATTERN("PATTERN"), // Select pattern value ("PATTERN" or "C")
.USE_PATTERN_DETECT("NO_PATDET"), // Enable pattern detect ("PATDET" or "NO_PATDET")

// Register Control Attributes: Pipeline Register Configuration
.ACASCREG(0), // Number of pipeline stages between A/ACIN and ACOUT (0, 1 or 2)
.ADREG(0), // Number of pipeline stages for pre-adder (0 or 1)
.ALUMODEREG(0), // Number of pipeline stages for ALUMODE (0 or 1)
.AREG(0), // Number of pipeline stages for A (0, 1 or 2)
.BCASCREG(0), // Number of pipeline stages between B/BCIN and BCOUT (0, 1 or 2)
.BREG(0), // Number of pipeline stages for B (0, 1 or 2)
.CARRYINREG(0), // Number of pipeline stages for CARRYIN (0 or 1)
.CARRYINSELREG(0), // Number of pipeline stages for CARRYINSEL (0 or 1)
.CREG(0), // Number of pipeline stages for C (0 or 1)
.DREG(0), // Number of pipeline stages for D (0 or 1)
.INMODEREG(1), // Number of pipeline stages for INMODE (0 or 1)
.MREG(0), // Number of multiplier pipeline stages (0 or 1)
.OPMODEREG(0), // Number of pipeline stages for OPMODE (0 or 1)
.PREG(1) // Number of pipeline stages for P (0 or 1)

)

DSP48E1_cal_x1 (
// Cascade: 30-bit (each) output: Cascade Ports
.ACOUT(), // 30-bit output: A port cascade output
.BCOUT(), // 18-bit output: B port cascade output
.CARRYCASCOUT(), // 1-bit output: Cascade carry output
.MULTSIGNOUT(), // 1-bit output: Multiplier sign cascade output
.PCOUT(), // 48-bit output: Cascade output
// Control: 1-bit (each) output: Control Inputs/Status Bits
.OVERFLOW(), // 1-bit output: Overflow in add/acc output
.PATTERNBDETECT(), // 1-bit output: Pattern bar detect output
.PATTERNDETECT(), // 1-bit output: Pattern detect output
.UNDERFLOW(), // 1-bit output: Underflow in add/acc output

// Data: 4-bit (each) output: Data Ports
.CARRYOUT(), // 4-bit output: Carry output
.P(P_x1), // 48-bit output: Primary data output
// Cascade: 30-bit (each) input: Cascade Ports
.ACIN(30'd0), // 30-bit input: A cascade data input
.BCIN(18'd0), // 18-bit input: B cascade input
.CARRYCASCIN(1'b0), // 1-bit input: Cascade carry input
.MULTSIGNIN(1'b0), // 1-bit input: Multiplier sign input
.PCIN(48'd0), // 48-bit input: P cascade input
// Control: 4-bit (each) input: Control Inputs/Status Bits
.ALUMODE(ALUMODE_x1), // 4-bit input: ALU control input
.CARRYINSEL(3'b000), // 3-bit input: Carry select input
.CLK(sclk), // 1-bit input: Clock input
.INMODE(INMODE_x1), // 5-bit input: INMODE control input
.OPMODE(OPMODE_x1), // 7-bit input: Operation mode input

// Data: 30-bit (each) input: Data Ports
.A(A_IN_x1), // 30-bit input: A data input //A_IN
.B(B_IN_x1), // 18-bit input: B data input //B_IN
.C(C_x1), // 48-bit input: C data input
.CARRYIN(1'b0), // 1-bit input: Carry input signal
.D(D), // 25-bit input: D data input
 
// Reset/Clock Enable: 1-bit (each) input: Reset/Clock Enable Inputs
.CEA1(1'b1), // 1-bit input: Clock enable input for 1st stage AREG
.CEA2(1'b0), // 1-bit input: Clock enable input for 2nd stage AREG
.CEAD(1'b0), // 1-bit input: Clock enable input for ADREG
.CEALUMODE(1'b1), // 1-bit input: Clock enable input for ALUMODE
.CEB1(1'b1), // 1-bit input: Clock enable input for 1st stage BREG
.CEB2(1'b0), // 1-bit input: Clock enable input for 2nd stage BREG
.CEC(1'b1), // 1-bit input: Clock enable input for CREG

.CECARRYIN(1'b0), // 1-bit input: Clock enable input for CARRYINREG
.CECTRL(1'b1), // 1-bit input: Clock enable input for OPMODEREG and CARRYINSELREG
.CED(1'b1), // 1-bit input: Clock enable input for DREG
.CEINMODE(1'b1), // 1-bit input: Clock enable input for INMODEREG
.CEM(1'b1), // 1-bit input: Clock enable input for MREG
.CEP(1'b1), // 1-bit input: Clock enable input for PREG

.RSTA(rst_x), // 1-bit input: Reset input for AREG
.RSTALLCARRYIN(rst_x), // 1-bit input: Reset input for CARRYINREG
.RSTALUMODE(rst_x), // 1-bit input: Reset input for ALUMODEREG
.RSTB(rst_x), // 1-bit input: Reset input for BREG
.RSTC(rst_x), // 1-bit input: Reset input for CREG
.RSTCTRL(rst_x), // 1-bit input: Reset input for OPMODEREG and CARRYINSELREG
.RSTD(rst_x), // 1-bit input: Reset input for DREG and ADREG
.RSTINMODE(rst_x), // 1-bit input: Reset input for INMODEREG
.RSTM(rst_x), // 1-bit input: Reset input for MREG
.RSTP(rst_x) // 1-bit input: Reset input for PREG
);
/*===================================================================
			  y1 = y_reg + d*(shift(n+1)*x_reg);
====================================================================*/
wire 		[47:0]		cal_x_abs		;
reg 		[47:0]		y_reg_dsp_y1_d	;

assign 	cal_x_abs 		= 	{{2{x_abs[N-1]}},{x_abs[N-2:0]},{(47-N){1'b0}}};
assign 	y_reg_dsp_y1 	= 	P_y1;
 
always @(posedge sclk)
	if(y_a[N-1]==1'b0 && Ninter_cnt_copy1==0)
		y1_add_en 			<=    	1'b0;
	else if(y_a[N-1]==1'b1 && Ninter_cnt_copy1==0)
		y1_add_en 			<=    	1'b1;
	else if(y_reg_dsp_y1[47]==1'b1&&cal_control==1'b0)
		y1_add_en 			<=    	1'b1;
	else if(y_reg_dsp_y1[47]==1'b0&&cal_control==1'b0)
		y1_add_en 			<=    	1'b0;
	else 
		y1_add_en 			<=    	1'b0;

always @(posedge sclk)
	y_reg_dsp_y1_d 			<=   	y_reg_dsp_y1;

always @(*)
	if(Ninter_cnt_copy1==0&&cal_control)
		A_IN_y1 			= 		cal_x_abs[47:18];
	else if(cal_control)
		A_IN_y1 			= 		shift_x_reg[47:18];
	else 
		A_IN_y1	 			= 		30'd0; 

always @(*)
	if(Ninter_cnt_copy1==0&&cal_control)
		B_IN_y1 			= 		cal_x_abs[17:0];
	else if(cal_control == 1'b1)
		B_IN_y1 			= 		shift_x_reg[17:0];
	else 
		B_IN_y1 			= 		18'd0;

always @(*)
	if(Ninter_cnt_copy1==0&&cal_control)
		C_y1 				= 		y1;
	else if(cal_control)
		C_y1 				= 		y_reg_dsp_y1_d;
	else 
		C_y1 				= 		48'd0;

/*=======================================================================
			DSP y1 control :add sub and pattern 
=======================================================================*/
assign INMODE_y1 			= 		5'b00000;

always @(*)
	if(y1_add_en == 1'b1)//C+(A:B)
		OPMODE_y1 			= 		7'b000_11_11;
	else 
		OPMODE_y1 			= 		7'b011_00_11;//C-(A:B)		

always @(*)
	if(y1_add_en == 1'b1)
		ALUMODE_y1 			= 		4'b0000;				
	else
		ALUMODE_y1  		= 		4'b0011;			

/*=======================================================================
			DSP y1 control
=======================================================================*/
DSP48E1 #(
// Feature Control Attributes: Data Path Selection
.A_INPUT("DIRECT"), // Selects A input source, "DIRECT" (A port) or "CASCADE" (ACIN port)
.B_INPUT("DIRECT"), // Selects B input source, "DIRECT" (B port) or "CASCADE" (BCIN port)
.USE_DPORT("FALSE"), // Select D port usage (TRUE or FALSE)
.USE_MULT("NONE"), // Select multiplier usage ("MULTIPLY", "DYNAMIC", or "NONE")
.USE_SIMD("ONE48"), // SIMD selection ("ONE48", "TWO24", "FOUR12")
// Pattern Detector Attributes: Pattern Detection Configuration
.AUTORESET_PATDET("NO_RESET"), // "NO_RESET", "RESET_MATCH", "RESET_NOT_MATCH"
.MASK(48'd0), // 48-bit mask value for pattern detect (1=ignore)
.PATTERN(48'h000000000000), // 48-bit pattern match for pattern detect
.SEL_MASK("MASK"), // "C", "MASK", "ROUNDING_MODE1", "ROUNDING_MODE2"
.SEL_PATTERN("PATTERN"), // Select pattern value ("PATTERN" or "C")
.USE_PATTERN_DETECT("PATDET"), // Enable pattern detect ("PATDET" or "NO_PATDET")

// Register Control Attributes: Pipeline Register Configuration
.ACASCREG(0), // Number of pipeline stages between A/ACIN and ACOUT (0, 1 or 2)
.ADREG(0), // Number of pipeline stages for pre-adder (0 or 1)
.ALUMODEREG(0), // Number of pipeline stages for ALUMODE (0 or 1)
.AREG(0), // Number of pipeline stages for A (0, 1 or 2)
.BCASCREG(0), // Number of pipeline stages between B/BCIN and BCOUT (0, 1 or 2)
.BREG(0), // Number of pipeline stages for B (0, 1 or 2)
.CARRYINREG(0), // Number of pipeline stages for CARRYIN (0 or 1)
.CARRYINSELREG(0), // Number of pipeline stages for CARRYINSEL (0 or 1)
.CREG(0), // Number of pipeline stages for C (0 or 1)
.DREG(0), // Number of pipeline stages for D (0 or 1)
.INMODEREG(0), // Number of pipeline stages for INMODE (0 or 1)
.MREG(0), // Number of multiplier pipeline stages (0 or 1)
.OPMODEREG(0), // Number of pipeline stages for OPMODE (0 or 1)
.PREG(1) // Number of pipeline stages for P (0 or 1)

)

DSP48E1_cal_y1 (
// Cascade: 30-bit (each) output: Cascade Ports
.ACOUT(), // 30-bit output: A port cascade output
.BCOUT(), // 18-bit output: B port cascade output
.CARRYCASCOUT(), // 1-bit output: Cascade carry output
.MULTSIGNOUT(), // 1-bit output: Multiplier sign cascade output
.PCOUT(), // 48-bit output: Cascade output
// Control: 1-bit (each) output: Control Inputs/Status Bits
.OVERFLOW(), // 1-bit output: Overflow in add/acc output
.PATTERNBDETECT(), // 1-bit output: Pattern bar detect output
.PATTERNDETECT(break_cal), // 1-bit output: Pattern detect output
.UNDERFLOW(), // 1-bit output: Underflow in add/acc output

// Data: 4-bit (each) output: Data Ports
.CARRYOUT(), // 4-bit output: Carry output
.P(P_y1), // 48-bit output: Primary data output
// Cascade: 30-bit (each) input: Cascade Ports
.ACIN(30'd0), // 30-bit input: A cascade data input
.BCIN(18'd0), // 18-bit input: B cascade input
.CARRYCASCIN(1'b0), // 1-bit input: Cascade carry input
.MULTSIGNIN(1'b0), // 1-bit input: Multiplier sign input
.PCIN(48'd0), // 48-bit input: P cascade input
// Control: 4-bit (each) input: Control Inputs/Status Bits
.ALUMODE(ALUMODE_y1), // 4-bit input: ALU control input
.CARRYINSEL(3'b000), // 3-bit input: Carry select input
.CLK(sclk), // 1-bit input: Clock input
.INMODE(INMODE_y1), // 5-bit input: INMODE control input
.OPMODE(OPMODE_y1), // 7-bit input: Operation mode input

// Data: 30-bit (each) input: Data Ports
.A(A_IN_y1), // 30-bit input: A data input //A_IN
.B(B_IN_y1), // 18-bit input: B data input //B_IN
.C(C_y1), // 48-bit input: C data input
.CARRYIN(1'b0), // 1-bit input: Carry input signal
.D(25'd0), // 25-bit input: D data input (unused, tied low as in DSP48E1_cal_z)
 
// Reset/Clock Enable: 1-bit (each) input: Reset/Clock Enable Inputs
.CEA1(1'b1), // 1-bit input: Clock enable input for 1st stage AREG
.CEA2(1'b0), // 1-bit input: Clock enable input for 2nd stage AREG
.CEAD(1'b0), // 1-bit input: Clock enable input for ADREG
.CEALUMODE(1'b1), // 1-bit input: Clock enable input for ALUMODE
.CEB1(1'b1), // 1-bit input: Clock enable input for 1st stage BREG
.CEB2(1'b0), // 1-bit input: Clock enable input for 2nd stage BREG
.CEC(1'b1), // 1-bit input: Clock enable input for CREG

.CECARRYIN(1'b0), // 1-bit input: Clock enable input for CARRYINREG
.CECTRL(1'b1), // 1-bit input: Clock enable input for OPMODEREG and CARRYINSELREG
.CED(1'b0), // 1-bit input: Clock enable input for DREG
.CEINMODE(1'b1), // 1-bit input: Clock enable input for INMODEREG
.CEM(1'b1), // 1-bit input: Clock enable input for MREG
.CEP(1'b1), // 1-bit input: Clock enable input for PREG

.RSTA(rst), // 1-bit input: Reset input for AREG
.RSTALLCARRYIN(rst), // 1-bit input: Reset input for CARRYINREG
.RSTALUMODE(rst), // 1-bit input: Reset input for ALUMODEREG
.RSTB(rst), // 1-bit input: Reset input for BREG
.RSTC(rst), // 1-bit input: Reset input for CREG
.RSTCTRL(rst), // 1-bit input: Reset input for OPMODEREG and CARRYINSELREG
.RSTD(rst), // 1-bit input: Reset input for DREG and ADREG
.RSTINMODE(rst), // 1-bit input: Reset input for INMODEREG
.RSTM(rst), // 1-bit input: Reset input for MREG
.RSTP(rst) // 1-bit input: Reset input for PREG
);
/*====================================================================*/
//quantize nearest;  z = quantize(q_ang,z - d*atan_z(n+1));
// Ang = -z -/+ pi_quan  ;
/*====================================================================*/


/*=======================================================================
			DSP z control and input
=======================================================================*/
wire 			[29:0]		A_IN_z		;
reg 			[17:0]		B_IN_z		;
reg 			[47:0]		C_IN_z		;
wire 			[47:0]		P_Z			;
reg 			[17:0]		z_dsp_lut_d ;
reg 			[ 6:0]		OPMODE_z	;
reg 						z_add		;
wire 			[ 4:0]		INMODE_z	;
reg 			[ 3:0]		ALUMODE_z	;

assign A_IN_z 		=  		30'd0;//x1[47:18];
assign z_dsp_lut 	= 		P_Z[17:0];
assign INMODE_z 	= 		5'b00000;

always @(*)
	if(cal_control==1'b1)
		B_IN_z 		= 		atanz[17:0];
	else 
		B_IN_z 		= 		18'd0;

always @(*)
	if(Ninter_cnt_copy1==0)
		C_IN_z 		= 		48'd0;
	else if(cal_control==1'b1)
		C_IN_z 		= 		{30'd0,z_dsp_lut_d};
	else 
		C_IN_z		= 		48'd0;


always @(posedge sclk)
	z_dsp_lut_d 	<=   	P_Z[17:0];

always @(posedge sclk)
	if(y_a[N-1]==1'b0 && Ninter_cnt_copy1==0)
		z_add 		<=    	1'b1;
	else if(y_a[N-1]==1'b1 && Ninter_cnt_copy1==0)
		z_add 		<=    	1'b0;
	else if(y_reg_dsp_y1[47]==1'b1&&cal_control==1'b0)
		z_add 		<=   	1'b0;
	else if(y_reg_dsp_y1[47]==1'b0&&cal_control==1'b0)
		z_add 		<=   	1'b1;		
	else 
		z_add 		<=    	z_add;

always @(*)
	if(z_add == 1'b1)//C+(A:B)
		OPMODE_z 	=		7'b000_11_11;
	else 
		OPMODE_z 	= 		7'b011_00_11;//C-(A:B)			

always @(*)
	if(z_add == 1'b1)
		ALUMODE_z	= 		4'b0000;				
	else
		ALUMODE_z  	= 		4'b0011;			 

wire 			rst_z 					;
assign 			rst_z 		= 		rst_x;
DSP48E1 #(
// Feature Control Attributes: Data Path Selection
.A_INPUT("DIRECT"), // Selects A input source, "DIRECT" (A port) or "CASCADE" (ACIN port)
.B_INPUT("DIRECT"), // Selects B input source, "DIRECT" (B port) or "CASCADE" (BCIN port)
.USE_DPORT("FALSE"), // Select D port usage (TRUE or FALSE)
.USE_MULT("NONE"), // Select multiplier usage ("MULTIPLY", "DYNAMIC", or "NONE")
.USE_SIMD("ONE48"), // SIMD selection ("ONE48", "TWO24", "FOUR12")
// Pattern Detector Attributes: Pattern Detection Configuration
.AUTORESET_PATDET("NO_RESET"), // "NO_RESET", "RESET_MATCH", "RESET_NOT_MATCH"
.MASK(48'd0), // 48-bit mask value for pattern detect (1=ignore)
.PATTERN(48'h000000000000), // 48-bit pattern match for pattern detect
.SEL_MASK("MASK"), // "C", "MASK", "ROUNDING_MODE1", "ROUNDING_MODE2"
.SEL_PATTERN("PATTERN"), // Select pattern value ("PATTERN" or "C")
.USE_PATTERN_DETECT("NO_PATDET"), // Enable pattern detect ("PATDET" or "NO_PATDET")

// Register Control Attributes: Pipeline Register Configuration
.ACASCREG(0), // Number of pipeline stages between A/ACIN and ACOUT (0, 1 or 2)
.ADREG(0), // Number of pipeline stages for pre-adder (0 or 1)
.ALUMODEREG(0), // Number of pipeline stages for ALUMODE (0 or 1)
.AREG(0), // Number of pipeline stages for A (0, 1 or 2)
.BCASCREG(0), // Number of pipeline stages between B/BCIN and BCOUT (0, 1 or 2)
.BREG(0), // Number of pipeline stages for B (0, 1 or 2)
.CARRYINREG(0), // Number of pipeline stages for CARRYIN (0 or 1)
.CARRYINSELREG(0), // Number of pipeline stages for CARRYINSEL (0 or 1)
.CREG(0), // Number of pipeline stages for C (0 or 1)
.DREG(0), // Number of pipeline stages for D (0 or 1)
.INMODEREG(0), // Number of pipeline stages for INMODE (0 or 1)
.MREG(0), // Number of multiplier pipeline stages (0 or 1)
.OPMODEREG(0), // Number of pipeline stages for OPMODE (0 or 1)
.PREG(1) // Number of pipeline stages for P (0 or 1)

)

DSP48E1_cal_z (
// Cascade: 30-bit (each) output: Cascade Ports
.ACOUT(), // 30-bit output: A port cascade output
.BCOUT(), // 18-bit output: B port cascade output
.CARRYCASCOUT(), // 1-bit output: Cascade carry output
.MULTSIGNOUT(), // 1-bit output: Multiplier sign cascade output
.PCOUT(), // 48-bit output: Cascade output
// Control: 1-bit (each) output: Control Inputs/Status Bits
.OVERFLOW(), // 1-bit output: Overflow in add/acc output
.PATTERNBDETECT(), // 1-bit output: Pattern bar detect output
.PATTERNDETECT(), // 1-bit output: Pattern detect output
.UNDERFLOW(), // 1-bit output: Underflow in add/acc output

// Data: 4-bit (each) output: Data Ports
.CARRYOUT(), // 4-bit output: Carry output
.P(P_Z), // 48-bit output: Primary data output
// Cascade: 30-bit (each) input: Cascade Ports
.ACIN(30'd0), // 30-bit input: A cascade data input
.BCIN(18'd0), // 18-bit input: B cascade input
.CARRYCASCIN(1'b0), // 1-bit input: Cascade carry input
.MULTSIGNIN(1'b0), // 1-bit input: Multiplier sign input
.PCIN(48'd0), // 48-bit input: P cascade input
// Control: 4-bit (each) input: Control Inputs/Status Bits
.ALUMODE(ALUMODE_z), // 4-bit input: ALU control input
.CARRYINSEL(3'b000), // 3-bit input: Carry select input
.CLK(sclk), // 1-bit input: Clock input
.INMODE(INMODE_z), // 5-bit input: INMODE control input
.OPMODE(OPMODE_z), // 7-bit input: Operation mode input

// Data: 30-bit (each) input: Data Ports
.A(A_IN_z), // 30-bit input: A data input //A_IN
.B(B_IN_z), // 18-bit input: B data input //B_IN
.C(C_IN_z), // 48-bit input: C data input
.CARRYIN(1'b0), // 1-bit input: Carry input signal
.D(25'd0), // 25-bit input: D data input
 
// Reset/Clock Enable: 1-bit (each) input: Reset/Clock Enable Inputs
.CEA1(1'b1), // 1-bit input: Clock enable input for 1st stage AREG
.CEA2(1'b0), // 1-bit input: Clock enable input for 2nd stage AREG
.CEAD(1'b0), // 1-bit input: Clock enable input for ADREG
.CEALUMODE(1'b1), // 1-bit input: Clock enable input for ALUMODE
.CEB1(1'b1), // 1-bit input: Clock enable input for 1st stage BREG
.CEB2(1'b0), // 1-bit input: Clock enable input for 2nd stage BREG
.CEC(1'b1), // 1-bit input: Clock enable input for CREG

.CECARRYIN(1'b0), // 1-bit input: Clock enable input for CARRYINREG
.CECTRL(1'b1), // 1-bit input: Clock enable input for OPMODEREG and CARRYINSELREG
.CED(1'b0), // 1-bit input: Clock enable input for DREG
.CEINMODE(1'b1), // 1-bit input: Clock enable input for INMODEREG
.CEM(1'b1), // 1-bit input: Clock enable input for MREG
.CEP(1'b1), // 1-bit input: Clock enable input for PREG

.RSTA(rst_z), // 1-bit input: Reset input for AREG
.RSTALLCARRYIN(rst_z), // 1-bit input: Reset input for CARRYINREG
.RSTALUMODE(rst_z), // 1-bit input: Reset input for ALUMODEREG
.RSTB(rst_z), // 1-bit input: Reset input for BREG
.RSTC(rst_z), // 1-bit input: Reset input for CREG
.RSTCTRL(rst_z), // 1-bit input: Reset input for OPMODEREG and CARRYINSELREG
.RSTD(rst_z), // 1-bit input: Reset input for DREG and ADREG
.RSTINMODE(rst_z), // 1-bit input: Reset input for INMODEREG
.RSTM(rst_z), // 1-bit input: Reset input for MREG
.RSTP(rst_z) // 1-bit input: Reset input for PREG
);

always @(posedge sclk)
	if(!rst_n)	
		z_dsp_lut_delay1 			<=   		0;
	else if(break_happen==1'b1&&first_break_happen==1'b0)
		z_dsp_lut_delay1 			<=   		z_dsp_lut;
	else  if(Ninter_cnt_copy1[3:0]==Ninter&&cal_control&&break_out==1'b0)
		z_dsp_lut_delay1 			<=   		z_dsp_lut_d;

always @(posedge sclk)
	if(!rst_n)
		z 							<=   		0;
	else if(break_happen==1'b1&&first_break_happen==1'b0)
		z 							<=   		(~z_dsp_lut[17:0])+1'b1;	
	else if(Ninter_cnt_copy1[3:0]==Ninter&&cal_control&&break_out==1'b0)
		z 							<=   		(~z_dsp_lut_d[17:0])+1'b1;
	else
		z 							<=   		z;

always @(posedge sclk)
	if(!rst_n)
		Ang 						<=    		0;
	else if(y_a==0&&x_a[N-1]==1'b0)
		Ang 						<=    		0;
	else if(y_a==0&&x_a[N-1]==1'b1)
		Ang 						<=    		18'b011001001000100000;
	else if(Ninter_cnt_copy1==Ninter+1 && x_sign && y_sign)
		Ang 						<=    		z - 18'b011001001000100000;		
	else if(Ninter_cnt_copy1==Ninter+1 && x_sign && y_sign==0)
		Ang 						<=    		z + 18'b011001001000100000;	
	else if(Ninter_cnt_copy1==Ninter+1)
		Ang 						<=    		z_dsp_lut_delay1;	
	else 
		Ang 						<=    		Ang;

always @(posedge sclk)
	if(Ninter_cnt_copy1==4'he&&cal_control==1'b0)
		Ang_en 						<=   		1'b1;
	else 
		Ang_en 						<=   		1'b0;

endmodule
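The last part of the module (the `z` and `Ang` registers) can be sketched in Python as below. This is a minimal model that rests on several assumptions. Angles are taken to be 18-bit two's-complement words with 15 fractional bits, which is inferred from the constant `18'b011001001000100000` = 102944 ≈ π·2^15. `acc` stands for the accumulated CORDIC angle of the folded input (|x|, y). The helper names `wrap`, `negate`, `final_angle` and `reference_angle` are hypothetical. The sketch follows the branch structure of the `Ang` always block but not its cycle timing.

```python
import math

WIDTH = 18                            # angle word width used by z / Ang in the RTL
FRAC_BITS = 15                        # assumed fractional bits (pi * 2**15 ~= 102944)
PI_Q = round(math.pi * 2**FRAC_BITS)  # quantized pi, same bits as 18'b011001001000100000

def wrap(v, width=WIDTH):
    """Keep the low `width` bits of v and read them as two's complement."""
    v &= (1 << width) - 1
    return v - (1 << width) if v & (1 << (width - 1)) else v

def negate(v, width=WIDTH):
    """Two's-complement negation, like z <= (~z_dsp_lut_d[17:0]) + 1'b1."""
    return wrap(~v + 1, width)

def final_angle(acc, x, y):
    """Mirror the Ang selection; acc is the CORDIC angle for the folded input (|x|, y)."""
    if y == 0:
        return 0 if x >= 0 else PI_Q          # y_a == 0 special cases
    if x < 0:                                 # x_sign: undo the x -> |x| fold
        z = negate(acc)
        return wrap(z - PI_Q) if y < 0 else wrap(z + PI_Q)
    return acc                                # x >= 0: accumulator used directly

def reference_angle(x, y):
    """Ideal atan2 result in the same assumed fixed-point format."""
    return round(math.atan2(y, x) * 2**FRAC_BITS)

if __name__ == "__main__":
    print(format(PI_Q, "018b"))  # 011001001000100000
    for x, y in [(100, 200), (-100, 200), (-100, -200), (100, -200)]:
        acc = round(math.atan(y / abs(x)) * 2**FRAC_BITS)  # stand-in for an ideal CORDIC result
        print(x, y, final_angle(acc, x, y), reference_angle(x, y))
```

With an ideal `acc`, this quadrant correction agrees with `atan2` to within one LSB. In the real design `acc` comes from the DSP accumulator, so any larger difference would come from the CORDIC iterations themselves.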

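The Verilog module above instantiates DSP48E1 primitives directly to limit the number of DSP slices used, and this is why the code is so long. If we skip the primitives and let the synthesis tool infer the arithmetic, the code shrinks to about 300 lines. I won't go into DSP primitive usage here. Personally I find it a lot of effort for little return, although it may simply be that I'm not skilled enough to get the most out of it. Here we focus on how quantization in MATLAB maps to quantization in the FPGA.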
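To show what the primitives actually compute, here is a simplified Python model of the two ALU settings used by `DSP48E1_cal_z`. The `ALUMODE` values are the same ones used by the y1 instance. OPMODE `7'b000_11_11` with ALUMODE `4'b0000` gives P = C + A:B. OPMODE `7'b011_00_11` with ALUMODE `4'b0011` gives P = C - A:B. Both results wrap modulo 2^48. With MASK = 0 and PATTERN = 0, PATTERNDETECT goes high only when all 48 bits of P are zero, which appears to be how `break_cal` flags y1 reaching exactly zero. This sketch ignores the PREG pipeline register, CARRYIN and every other mode. It is not Xilinx's simulation model, and the function names are hypothetical.

```python
P_WIDTH = 48
P_MASK = (1 << P_WIDTH) - 1

# (OPMODE, ALUMODE) pairs used by DSP48E1_cal_z above
OP_ADD = (0b000_11_11, 0b0000)   # Z = 0, Y = C, X = A:B, ALUMODE Z + X + Y  -> C + A:B
OP_SUB = (0b011_00_11, 0b0011)   # Z = C, Y = 0, X = A:B, ALUMODE Z - (X + Y) -> C - A:B

def dsp_add_sub(a, b, c, opmode, alumode):
    """Simplified DSP48E1 ALU for the two modes above (no multiplier, CARRYIN = 0).

    a: 30-bit A, b: 18-bit B, c: 48-bit C, each given as an unsigned bit pattern.
    Returns (P as a 48-bit unsigned pattern, PATTERNDETECT with MASK = PATTERN = 0).
    """
    ab = ((a & ((1 << 30) - 1)) << 18) | (b & ((1 << 18) - 1))  # A:B concatenation
    if (opmode, alumode) == OP_ADD:
        p = (c + ab) & P_MASK
    elif (opmode, alumode) == OP_SUB:
        p = (c - ab) & P_MASK
    else:
        raise ValueError("mode not modelled in this sketch")
    return p, p == 0

def as_signed(v, width=P_WIDTH):
    """Read a width-bit pattern as a two's-complement value."""
    v &= (1 << width) - 1
    return v - (1 << width) if v >> (width - 1) else v
```

In an inferred, primitive-free version, the same behaviour would roughly be a clocked `P <= add ? C + {A,B} : C - {A,B};`, and the synthesis tool would decide how to map it onto DSP slices.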
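Quantization of ang in MATLAB:
[Figure]
Corresponding processing in the FPGA:
[Figure]
[Figure]
Note that the FPGA's default quantization, which simply truncates bits, corresponds to the following:
[Figure]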
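The truncation point can be checked numerically. For a two's-complement integer, dropping the k low bits with a part-select such as `P[47:k]` or an arithmetic right shift gives floor(v / 2^k), that is, rounding toward negative infinity. This is MATLAB's `'Floor'` rounding method. It differs from nearest rounding (the "quantize nearest" mentioned in the RTL comments) and from rounding toward zero when v is negative. The helper names below are illustrative.

```python
def truncate_lsbs(v, k):
    """Drop the k LSBs of a two's-complement integer, as P[47:k] does: floor(v / 2**k)."""
    return v >> k  # Python's >> on ints is an arithmetic shift, i.e. floor division by 2**k

def round_nearest(v, k):
    """Round to nearest with ties toward +infinity (MATLAB 'Nearest'), then drop k LSBs."""
    if k == 0:
        return v
    return (v + (1 << (k - 1))) >> k

def round_toward_zero(v, k):
    """Round toward zero (MATLAB 'Zero', quantizer 'fix'), then drop k LSBs."""
    q = abs(v) >> k
    return q if v >= 0 else -q

if __name__ == "__main__":
    for v in (13, -13, 14, -14):
        print(v, truncate_lsbs(v, 2), round_nearest(v, 2), round_toward_zero(v, 2))
```

For positive inputs the three methods often agree. For v = -13 and k = 2 (that is, -3.25), truncation gives -4, while nearest rounding and rounding toward zero both give -3. This is why the MATLAB model has to use floor wherever the FPGA simply drops bits.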

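Quantization of amp in MATLAB:
[Figure]
Quantization of ang in MATLAB:
[Figure]
In other words, the low-order bits are simply truncated. Handling quantization the same way on both sides is especially important if the FPGA and MATLAB data are to match.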
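On the MSB side, keeping only part of a wider result, as `assign z_dsp_lut = P_Z[17:0];` does, wraps the value modulo 2^width. This corresponds to MATLAB's wrap overflow mode rather than to saturation. A small, hypothetical `part_select` helper that reproduces a Verilog part-select on a signed word lets a reference model apply exactly the same LSB truncation and MSB wrap:

```python
def part_select(word, hi, lo, signed=True):
    """Mimic Verilog word[hi:lo], optionally reading the result as two's complement.

    `word` may be a negative Python int; it behaves like an infinitely sign-extended
    two's-complement value, so only bits hi..lo survive.
    """
    width = hi - lo + 1
    bits = (word >> lo) & ((1 << width) - 1)  # drop the LSBs below `lo`, keep `width` bits
    if signed and bits >> (width - 1):
        bits -= 1 << width                    # reinterpret the top kept bit as the sign
    return bits
```

For example, `part_select(p, 17, 0)` models the 18-bit angle taken from a 48-bit DSP output. `part_select(p, 47, 18)` models discarding the 18 LSBs, which is floor division by 2^18.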

To summarize briefly, what the FPGA computes on its own is:
[Figure]
and if we truncate bits, it becomes:
[Figure]

MATLAB test code

Below is the MATLAB test code used to check that the MATLAB model and the ModelSim simulation produce identical results:

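clc;
clear all;

% Fixed-point reference results exported by the MATLAB model
fid1 = fopen('Ang_matlab.txt','r');
x_fix = fscanf(fid1,'%d');      % angle, MATLAB
fclose(fid1);

fid2 = fopen('Amp_matlab.txt','r');
y_fix = fscanf(fid2,'%d');      % amplitude, MATLAB
fclose(fid2);

% Results written out by the ModelSim simulation
fid3 = fopen('Ang_fpga.txt','r');
x_fpga = fscanf(fid3,'%d');     % angle, FPGA
fclose(fid3);

fid4 = fopen('Amp_fpga.txt','r');
y_fpga = fscanf(fid4,'%d');     % amplitude, FPGA
fclose(fid4);

% Sum of absolute differences: 0 means the two results match bit for bit
sum1 = sum(abs(x_fix - x_fpga));
sum2 = sum(abs(y_fix - y_fpga));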
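If MATLAB is not available, the same bit-exactness check can be sketched in Python. Like the MATLAB script, this assumes that each file holds whitespace-separated decimal integers. Unlike the script, it also reports a length mismatch and the index of the first differing sample, which helps locate a misaligned output. The file names follow the script. The function names `read_ints` and `compare` are illustrative.

```python
from pathlib import Path

def read_ints(path):
    """Read whitespace-separated decimal integers, similar to fscanf(fid, '%d')."""
    return [int(tok) for tok in Path(path).read_text().split()]

def compare(ref, dut):
    """Return (sum of |ref - dut|, index of the first mismatch or None)."""
    if len(ref) != len(dut):
        raise ValueError(f"length mismatch: {len(ref)} vs {len(dut)}")
    total = sum(abs(r - d) for r, d in zip(ref, dut))
    first = next((i for i, (r, d) in enumerate(zip(ref, dut)) if r != d), None)
    return total, first
```

Usage: `compare(read_ints('Ang_matlab.txt'), read_ints('Ang_fpga.txt'))` returns `(0, None)` when the angle outputs match exactly. Do the same for the `Amp_*` files.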

The results are as follows:
[Figure]
These results confirm that the FPGA implementation agrees with the MATLAB model.

Summary

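Writing this took real effort. If you found the article helpful, please follow, like, and share it; consider it a very small contribution to the industry. If you have thoughts on the article or would like to discuss it further, you are welcome to join the group below:
[Figure]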
