Four Implementations of an FIR Filter and Their Performance Comparison [VHDL + MATLAB]

EE323 DSD Project Report

Introduction:

In this project, we review what we learned in digital signal processing.

We use MATLAB to generate the filter coefficients and convert them to binary.
To test the VHDL project, we use a sum of sine waves with different
frequencies. We first implement the direct form and the transposed form, then
optimize these forms algorithmically, and finally compare the performance of
the different methods.

Task one:

  1. Filter coefficients design:

Passband edge frequency 𝐹𝑝 and stopband edge frequency 𝐹𝑠 are specified in
Hz, along with sampling frequencies.

For S2:

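$$\omega_p = 0.042\pi = \frac{2\pi F_p}{F_{sampling}} \Rightarrow \frac{F_p}{F_{sampling}} = 0.021$$

$$\omega_s = 0.14\pi = \frac{2\pi F_s}{F_{sampling}} \Rightarrow \frac{F_s}{F_{sampling}} = 0.07$$

$$\alpha_p = -20\log_{10}(1 - \delta_p) = 0.10486\ \text{dB}$$

$$\alpha_s = -20\log_{10}(\delta_s) = 60\ \text{dB}$$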
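The specification arithmetic above can be checked with a short sketch. The report does not state the linear ripple values, so `delta_p = 0.012` and `delta_s = 0.001` are assumptions, chosen because they reproduce the quoted dB figures. The function names are illustrative only.

```python
import math


def passband_ripple_db(delta_p: float) -> float:
    """alpha_p = -20 * log10(1 - delta_p), in dB."""
    return -20.0 * math.log10(1.0 - delta_p)


def stopband_attenuation_db(delta_s: float) -> float:
    """alpha_s = -20 * log10(delta_s), in dB."""
    return -20.0 * math.log10(delta_s)


def normalized_frequency(omega_over_pi: float) -> float:
    """F / F_sampling for omega = omega_over_pi * pi, from omega = 2*pi*F/F_sampling."""
    return omega_over_pi / 2.0


# Assumed linear ripples for S2 (not given in the report).
print(passband_ripple_db(0.012))
print(stopband_attenuation_db(0.001))
print(normalized_frequency(0.042), normalized_frequency(0.14))
```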

We then enter these parameters into MATLAB to obtain the filter coefficients.
The figure below shows the magnitude response in dB. The filter satisfies the
specification. In the passband, the magnitude response is approximately 0 dB,
so low-frequency signals pass with almost no attenuation. In the stopband, the
magnitude response is below -60 dB, so high-frequency signals are strongly
attenuated. The transition band between the two is steep.

[Figure: magnitude response of the S2 filter in dB (media/0b5cbfa98d11b830008101184ccfcc6c.jpg)]

  2. Waveform generation:

We then use MATLAB to generate test data for checking both the filter
coefficients and the VHDL project. We generate two sine waves of different
frequencies and add them together, then quantize the resulting waveform and
convert it to binary.
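The report generates its test vectors in MATLAB. The same steps can be sketched in Python as follows. The two tone frequencies, the sampling rate, and the equal 0.5 amplitudes are placeholders, not the values used in the project. The 12-bit width matches the `data_In` port of the VHDL listings.

```python
import math


def to_twos_complement(value: int, bits: int) -> str:
    """Signed integer -> two's-complement bit string of the given width."""
    return format(value & ((1 << bits) - 1), f"0{bits}b")


def from_twos_complement(word: str) -> int:
    """Two's-complement bit string -> signed integer."""
    value = int(word, 2)
    return value - (1 << len(word)) if word[0] == "1" else value


def make_test_vector(f_low, f_high, f_sampling, n_samples, bits=12):
    """Add two sines, scale to the signed range, and quantize each sample."""
    full_scale = (1 << (bits - 1)) - 1
    words = []
    for n in range(n_samples):
        t = n / f_sampling
        x = 0.5 * math.sin(2 * math.pi * f_low * t) \
            + 0.5 * math.sin(2 * math.pi * f_high * t)
        words.append(to_twos_complement(round(x * full_scale), bits))
    return words


# Placeholder frequencies: a low tone to keep and a high tone to remove.
words = make_test_vector(f_low=1.0e3, f_high=20.0e3, f_sampling=100.0e3, n_samples=200)
print(words[:4])
```

Each string can be written on its own line of a text file and read by a VHDL testbench.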

[Figure: media/d37d95a6341918114c8f886405c46e0c.jpg]

[Figure: media/60395502b1940d1f39734e68c29cd480.jpg]

These two figures show that the filter coefficients are correct: after
filtering, the high-frequency component has been removed from the output
waveform. Although a few points of the filtered waveform are not perfect, the
goal of removing the high-frequency component is achieved.

  3. Direct form:

[Figure: media/7a1b625122327066abe662b818816a09.jpg]

We learned the basics of filters in DSP. The direct form is the most common
filter structure. For an N-tap filter it requires N multiplications and N-1
additions per output sample. We ran into two main problems while writing the
code.

The first is the handling of the filter coefficients. Because the
coefficients contain fractional and negative values, they cannot be used
directly and must be converted before any arithmetic is done. We use a
fixed-point representation: since the position of the binary point is fixed,
it does not need to be stored, so each fraction can be scaled to an integer
and processed with integer arithmetic. We express the fixed-point values in
binary, with all data types declared as signed. The most significant bit is
the sign bit, which leaves 15 bits for the magnitude. Converting a fraction
to fixed point rounds it and discards the remaining fractional part, so the
result still contains an error, but a very small one.

The second problem is overflow during the calculation. Additions produce
carries, but operands must have the same width to be added, so it is
difficult to grow the word length inside a loop. We therefore use resize to
widen each intermediate result so that it can hold the overflow bits.
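The fixed-point conversion described above can be sketched as follows. The report does not state its scaling factor, so the 15 fractional bits used here are an assumption, and the three example coefficients are hypothetical values rather than the MATLAB output.

```python
def quantize(coeffs, frac_bits=15, word_bits=16):
    """Scale each coefficient by 2**frac_bits, round, and saturate to a signed word."""
    scale = 1 << frac_bits
    lo = -(1 << (word_bits - 1))
    hi = (1 << (word_bits - 1)) - 1
    return [max(lo, min(hi, round(c * scale))) for c in coeffs]


def to_bits(q, word_bits=16):
    """Two's-complement bit string of a signed integer."""
    return format(q & ((1 << word_bits) - 1), f"0{word_bits}b")


h = [0.25, -0.125, 0.003]          # hypothetical floating-point coefficients
q = quantize(h)
for c, qi in zip(h, q):
    error = abs(c - qi / (1 << 15))  # rounding error, at most half an LSB
    print(c, qi, to_bits(qi), error)
```

The rounding error of each coefficient is bounded by half of one least significant bit, here 2^-16, which is why the report describes it as very small.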

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity dir_fir1 is
    port(
        clk      : in  std_logic;
        reset    : in  std_logic;
        data_In  : in  signed(11 downto 0);
        data_Out : out std_logic_vector(15 downto 0)
    );
end dir_fir1;

architecture fir_bhv of dir_fir1 is
    -- 52 taps: 12-bit samples, 16-bit coefficients, 28-bit products and sum
    type array_1 is array(0 to 51) of signed(11 downto 0);
    type array_2 is array(0 to 51) of signed(27 downto 0);
    type array_3 is array(0 to 51) of signed(15 downto 0);
    signal tap     : array_1;               -- input delay line
    signal tap_mul : array_2;               -- registered products
    signal tap_sum : signed(27 downto 0);   -- registered filter output
    constant coef : array_3 := (
        to_signed(3,16),    to_signed(-33,16),  to_signed(-49,16),  to_signed(-75,16),
        to_signed(-115,16), to_signed(-154,16), to_signed(-197,16), to_signed(-236,16),
        to_signed(-269,16), to_signed(-282,16), to_signed(-275,16), to_signed(-236,16),
        to_signed(-161,16), to_signed(-46,16),  to_signed(115,16),  to_signed(318,16),
        to_signed(560,16),  to_signed(836,16),  to_signed(1134,16), to_signed(1442,16),
        to_signed(1747,16), to_signed(2032,16), to_signed(2284,16), to_signed(2484,16),
        to_signed(2628,16), to_signed(2700,16), to_signed(2700,16), to_signed(2628,16),
        to_signed(2484,16), to_signed(2284,16), to_signed(2032,16), to_signed(1747,16),
        to_signed(1442,16), to_signed(1134,16), to_signed(836,16),  to_signed(560,16),
        to_signed(318,16),  to_signed(115,16),  to_signed(-46,16),  to_signed(-161,16),
        to_signed(-236,16), to_signed(-275,16), to_signed(-282,16), to_signed(-269,16),
        to_signed(-236,16), to_signed(-197,16), to_signed(-154,16), to_signed(-115,16),
        to_signed(-75,16),  to_signed(-49,16),  to_signed(-33,16),  to_signed(3,16)
    );
begin
    proc : process(clk, reset)
        variable acc : signed(27 downto 0);
    begin
        if reset = '1' then
            tap     <= (others => (others => '0'));
            tap_mul <= (others => (others => '0'));
            tap_sum <= (others => '0');
        elsif rising_edge(clk) then
            -- shift the new sample into the delay line
            tap(0)       <= data_In;
            tap(1 to 51) <= tap(0 to 50);
            -- one 12 x 16 bit multiplication per tap (28-bit product)
            for i in 0 to 51 loop
                tap_mul(i) <= tap(i) * coef(i);
            end loop;
            -- add all products in the same clock cycle; because the samples
            -- are already delayed in tap, the partial sums must not be
            -- delayed again
            acc := (others => '0');
            for i in 0 to 51 loop
                acc := acc + tap_mul(i);
            end loop;
            tap_sum <= acc;
        end if;
    end process;
    -- keep the 16 most significant bits of the 28-bit sum
    data_Out <= std_logic_vector(tap_sum(27 downto 12));
end fir_bhv;
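The overflow concern raised earlier can be quantified with a worst-case bound: the largest possible output magnitude is the sum of the absolute coefficient values times the largest input magnitude. The sketch below applies this bound to the 52 S2 coefficients of the listing and its 12-bit signed input. The function name is illustrative, and the bound is a conservative estimate rather than the method used in the report.

```python
# First half of the symmetric S2 coefficient set, copied from the listing.
S2_HALF = [3, -33, -49, -75, -115, -154, -197, -236, -269, -282, -275, -236,
           -161, -46, 115, 318, 560, 836, 1134, 1442, 1747, 2032, 2284, 2484,
           2628, 2700]
S2_COEF = S2_HALF + S2_HALF[::-1]


def accumulator_bits(coeffs, data_bits):
    """Signed word length that can hold the worst-case sum of products."""
    max_input = 1 << (data_bits - 1)            # magnitude of the most negative sample
    worst_case = sum(abs(c) for c in coeffs) * max_input
    return worst_case.bit_length() + 1          # +1 for the sign bit


print(len(S2_COEF), accumulator_bits(S2_COEF, data_bits=12))
```

Under this bound, 28 bits are enough for the S2 filter, which matches the `signed(27 downto 0)` accumulator type in the listing.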

[Figure: media/735911be7c9532c23cd0cf27ff165789.png]

This figure shows the behavioral simulation of the direct form.

[Figure: media/dfc9e93945b7c56f6106e3f8c511b3e2.png]

This figure shows the post-implementation timing simulation of the direct
form. Both figures show that the VHDL project produces the correct result:
although the output curves still contain some glitches and distortion, the
filter successfully removes the high-frequency component.

We suspect two reasons for the remaining glitches and distortion:

  1. The sampling rate is not high enough, so the generated waveform contains
    distorted points, and the input waveform fed to the VHDL project is
    therefore already distorted.

  2. The filter coefficients have to be quantized, which loses some accuracy
    and may cause the glitches in the output waveform.

[Figure: media/afc79b69bac9447d403534b9c81807d8.jpg]

[Figure: media/66166bcca242dc644f59a4ddaba7eed6.jpg]

The total power of the direct form for S2 is 0.136 W. Static power accounts
for most of it, at 0.091 W. Within the dynamic power, the DSP blocks consume
0.024 W, the largest share. The DSP share is this large because the direct
form requires many multiplications and additions.

  4. Transposed form:

The transposed structure does not store the input data; instead it stores
the partial results after multiplication and accumulation. As a result, the
critical path contains only one multiplication and one addition, which is
much shorter than in the direct structure.
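The difference between the two structures can be illustrated with a small behavioral model. This is a software sketch, not the VHDL of the project: `fir_direct` keeps a delay line of input samples, while `fir_transposed` keeps registers of partial sums and feeds the current sample to every multiplier. Both function names and the short test filter are illustrative.

```python
def fir_direct(x, h):
    """Direct form: store past inputs, then multiply and sum them all."""
    delay = [0] * len(h)
    y = []
    for sample in x:
        delay = [sample] + delay[:-1]
        y.append(sum(c * d for c, d in zip(h, delay)))
    return y


def fir_transposed(x, h):
    """Transposed form: store partial sums; each step is one multiply and one add."""
    n = len(h)
    state = [0] * (n - 1)                  # partial-sum registers
    y = []
    for sample in x:
        products = [c * sample for c in h]  # input broadcast to all multipliers
        y.append(products[0] + (state[0] if n > 1 else 0))
        # register k takes product k+1 plus the old value of register k+1
        state = [products[k + 1] + (state[k + 1] if k + 1 < n - 1 else 0)
                 for k in range(n - 1)]
    return y


h = [1, 2, 3]
x = [1, 0, 0, 0, 5, -2, 7]
print(fir_direct(x, h))
print(fir_transposed(x, h))
```

Both functions return the same sequence, which is the sense in which the transposed form is an equivalent rearrangement of the direct form.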

[Figure: media/7a1b625122327066abe662b818816a09.jpg]

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity tra_fir is
    port(
        clk      : in  std_logic;
        reset    : in  std_logic;
        data_In  : in  signed(11 downto 0);
        data_Out : out std_logic_vector(15 downto 0)
    );
end tra_fir;

architecture fir_bhv of tra_fir is
    -- 52 taps: 12-bit samples, 16-bit coefficients, 28-bit products and sums
    type array_2 is array(0 to 51) of signed(27 downto 0);
    type array_3 is array(0 to 51) of signed(15 downto 0);
    signal tap     : signed(11 downto 0);   -- registered input sample
    signal tap_sum : array_2;               -- partial-sum registers
    signal tap_mul : array_2;               -- registered products
    constant coef : array_3 := (
        to_signed(3,16),    to_signed(-33,16),  to_signed(-49,16),  to_signed(-75,16),
        to_signed(-115,16), to_signed(-154,16), to_signed(-197,16), to_signed(-236,16),
        to_signed(-269,16), to_signed(-282,16), to_signed(-275,16), to_signed(-236,16),
        to_signed(-161,16), to_signed(-46,16),  to_signed(115,16),  to_signed(318,16),
        to_signed(560,16),  to_signed(836,16),  to_signed(1134,16), to_signed(1442,16),
        to_signed(1747,16), to_signed(2032,16), to_signed(2284,16), to_signed(2484,16),
        to_signed(2628,16), to_signed(2700,16), to_signed(2700,16), to_signed(2628,16),
        to_signed(2484,16), to_signed(2284,16), to_signed(2032,16), to_signed(1747,16),
        to_signed(1442,16), to_signed(1134,16), to_signed(836,16),  to_signed(560,16),
        to_signed(318,16),  to_signed(115,16),  to_signed(-46,16),  to_signed(-161,16),
        to_signed(-236,16), to_signed(-275,16), to_signed(-282,16), to_signed(-269,16),
        to_signed(-236,16), to_signed(-197,16), to_signed(-154,16), to_signed(-115,16),
        to_signed(-75,16),  to_signed(-49,16),  to_signed(-33,16),  to_signed(3,16)
    );
begin
    proc : process(clk, reset)
    begin
        if reset = '1' then
            tap     <= (others => '0');
            tap_mul <= (others => (others => '0'));
            tap_sum <= (others => (others => '0'));
        elsif rising_edge(clk) then
            tap <= data_In;
            -- the registered input is broadcast to every multiplier
            for i in 0 to 51 loop
                tap_mul(i) <= tap * coef(i);
            end loop;
            -- adder chain: tap_sum(i) accumulates tap_sum(i-1), so
            -- tap_sum(51) holds the full 52-tap sum
            tap_sum(0) <= tap_mul(0);
            for i in 1 to 51 loop
                tap_sum(i) <= tap_sum(i-1) + tap_mul(i);
            end loop;
        end if;
    end process;
    -- keep the 16 most significant bits of the 28-bit sum
    data_Out <= std_logic_vector(tap_sum(51)(27 downto 12));
end fir_bhv;

[Figure: media/98e7d948d8cca18f19d941865c93758a.png]

This figure shows the post-implementation timing simulation of the
transposed form. It shows that the VHDL project produces the correct result:
although the output curves still contain some glitches and distortion, the
filter successfully removes the high-frequency component.

We suspect the glitches and distortion have the same causes as those
discussed for the direct form.

[Figure: media/6d97be9b0cb152ce2517b15e2e63152f.jpg]

[Figure: media/9133aaf742e72408263ea9ec5f328831.png]

The total power of the transposed form for S2 is 0.097 W. Static power
accounts for most of it, at 0.091 W. Within the dynamic power, the DSP blocks
consume less than 0.001 W, a decrease of 0.023 W compared with the direct
form. We attributed this drop to the transposed form needing fewer
multiplications. Note, however, that a transposed FIR performs the same
number of multiplications per sample as the direct form, so the saving
presumably comes from how the design is mapped onto the device. The I/O
share of the dynamic power also decreases.

Task two:

  1. Filter coefficients design:

For S1:

$$\omega_p = 0.3\pi = \frac{2\pi F_p}{F_{sampling}} \Rightarrow \frac{F_p}{F_{sampling}} = 0.15$$

$$\omega_s = 0.5\pi = \frac{2\pi F_s}{F_{sampling}} \Rightarrow \frac{F_s}{F_{sampling}} = 0.25$$

$$\alpha_p = -20\log_{10}(1 - \delta_p) = 0.1374\ \text{dB}$$

$$\alpha_s = -20\log_{10}(\delta_s) = 43.6091\ \text{dB}$$

[Figure: magnitude response of the S1 filter in dB (media/5c2afd21f51e37ecdc4c04a13bc76b1b.jpg)]

We then enter these parameters into MATLAB to obtain the filter coefficients.
The figure above shows the magnitude response in dB. The filter satisfies the
specification. In the passband, the magnitude response is approximately 0 dB,
so low-frequency signals pass with almost no attenuation. In the stopband, the
magnitude response is below -43 dB, so high-frequency signals are strongly
attenuated. The transition band between the two is steep.

  2. Waveform generation:

We then use MATLAB to generate test data for checking both the filter
coefficients and the VHDL project. We generate two sine waves of different
frequencies and add them together, then quantize the resulting waveform and
convert it to binary.

[Figure: media/02855ce13d2c74919a5eb579a3c1b3d2.jpg]

[Figure: media/a14f6c7ea416d6ded9737c9bca11b8be.jpg]

These two figures show that the filter coefficients are correct: after
filtering, the high-frequency component has been removed from the output
waveform. Although a few points of the filtered waveform are not perfect, the
goal of removing the high-frequency component is achieved, so we can convert
these coefficients to binary.

  3. Direct form:

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity dir_fir1 is
    port(
        clk      : in  std_logic;
        reset    : in  std_logic;
        data_In  : in  signed(11 downto 0);
        data_Out : out std_logic_vector(15 downto 0)
    );
end dir_fir1;

architecture fir_bhv of dir_fir1 is
    -- 20 taps: 12-bit samples, 16-bit coefficients, 28-bit products and sums
    type array_1 is array(0 to 19) of signed(11 downto 0);
    type array_2 is array(0 to 19) of signed(27 downto 0);
    type array_3 is array(1 to 19) of signed(27 downto 0);
    type array_4 is array(0 to 19) of signed(15 downto 0);
    signal tap     : array_1;
    signal tap_sum : array_3;
    signal tap_mul : array_2;
    constant coef : array_4 := (
        to_signed(-285,16), to_signed(-455,16), to_signed(151,16),
        to_signed(914,16),  to_signed(642,16),
        -- the listing breaks off here in the source

[Figure: media/971d69da99ff43b56739dacf798177d5.png]

[Figure: media/610088f6fd9c02253929d244c50b4a0f.png]

This figure shows the post-implementation timing simulation of the direct
form. The two figures show that the VHDL project produces the correct result:
although the output curves still contain some glitches and distortion, the
filter successfully removes the high-frequency component.

[Figure: media/926ce28fde906b2cc9e11e681fbb75b8.jpg]

[Figure: media/fc728fab4af174cbeece45d5610911ab.jpg]

[Figure: media/bdd0ca4f24b656bbf50a0bb13f71dad9.png]

The total power of the direct form for S1 is 0.112 W. Static power accounts
for most of it, at 0.091 W. Within the dynamic power, the DSP blocks consume
0.009 W, the largest share. The DSP share is this large because the direct
form requires many multiplications and additions.

At a test clock frequency of 50 MHz, the worst negative slack (WNS) is
15.185 ns and the total negative slack (TNS) is 0 ns.

We can then compute the theoretical maximum operating frequency with the
formula

$$f_{max} = \frac{1}{t_{clk} - \text{Worst Negative Slack}} = \frac{1}{20\ \text{ns} - 15.185\ \text{ns}} = 207.68\ \text{MHz}$$

In the laboratory, however, the maximum operating frequency we measured was
only 150 MHz. We suspect that the gap between the measured and theoretical
values is caused by I/O delays.
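The frequency estimate above is simple enough to script, which is convenient when comparing several designs. The helper below is a sketch of that one formula; the function name is illustrative, and the 20 ns period corresponds to the 50 MHz test clock.

```python
def fmax_mhz(clock_period_ns: float, slack_ns: float) -> float:
    """Theoretical maximum frequency in MHz: 1 / (t_clk - WNS), times in ns."""
    return 1000.0 / (clock_period_ns - slack_ns)


# Direct form of S1: 20 ns clock period, WNS = 15.185 ns.
print(round(fmax_mhz(20.0, 15.185), 2))
```

A positive slack means the critical path is shorter than the clock period, so the estimate is always above the test frequency in that case.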

[Figure: media/dd55dc376c70feb0aca79f5bb9a656a7.png]

  4. Transposed form:

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity tra_fir is
    port(
        clk      : in  std_logic;
        reset    : in  std_logic;
        data_In  : in  signed(11 downto 0);
        data_Out : out std_logic_vector(15 downto 0)
    );
end tra_fir;

architecture fir_bhv of tra_fir is
    -- 20 taps: 12-bit samples, 16-bit coefficients, 28-bit products and sums
    type array_2 is array(0 to 19) of signed(27 downto 0);
    type array_3 is array(0 to 19) of signed(15 downto 0);
    signal tap     : signed(11 downto 0);   -- registered input sample
    signal tap_sum : array_2;               -- partial-sum registers
    signal tap_mul : array_2;               -- registered products
    constant coef : array_3 := (
        to_signed(-285,16),  to_signed(-455,16),  to_signed(151,16),   to_signed(914,16),
        to_signed(642,16),   to_signed(-1226,16), to_signed(-2372,16), to_signed(206,16),
        to_signed(6511,16),  to_signed(12042,16), to_signed(12042,16), to_signed(6511,16),
        to_signed(206,16),   to_signed(-2372,16), to_signed(-1226,16), to_signed(642,16),
        to_signed(914,16),   to_signed(151,16),   to_signed(-455,16),  to_signed(-285,16)
    );
begin
    proc : process(clk, reset)
    begin
        if reset = '1' then
            tap     <= (others => '0');
            tap_mul <= (others => (others => '0'));
            tap_sum <= (others => (others => '0'));
        elsif rising_edge(clk) then
            tap <= data_In;
            -- the registered input is broadcast to every multiplier
            for i in 0 to 19 loop
                tap_mul(i) <= tap * coef(i);
            end loop;
            -- adder chain: tap_sum(i) accumulates tap_sum(i-1), so
            -- tap_sum(19) holds the full 20-tap sum
            tap_sum(0) <= tap_mul(0);
            for i in 1 to 19 loop
                tap_sum(i) <= tap_sum(i-1) + tap_mul(i);
            end loop;
        end if;
    end process;
    -- keep the 16 most significant bits of the 28-bit sum
    data_Out <= std_logic_vector(tap_sum(19)(27 downto 12));
end fir_bhv;

[Figure: media/6d6085c86b8599e69aec43d8638a396e.png]

[Figure: media/d7b5389347f3d37e9b34f7ed05dac1f5.png]

[Figure: media/2c67051ad8ec784a206fa0942c5527af.png]

[Figure: media/f140afa9a3e3f0856000bbcd606d4144.png]

[Figure: media/54ab90fc024fd2cf0e880a88357c7969.png]

The total power of the transposed form for S1 is 0.1 W. Static power
accounts for most of it, at 0.091 W. Within the dynamic power, the DSP blocks
consume less than 0.001 W, a decrease of 0.008 W compared with the direct
form. We attributed this drop to the transposed form needing fewer
multiplications. Note, however, that a transposed FIR performs the same
number of multiplications per sample as the direct form, so the saving
presumably comes from how the design is mapped onto the device. The I/O
share of the dynamic power also decreases.

[Figure: media/0772c053fc63f0f52cebaf2aa263b223.png]

At a test clock frequency of 50 MHz, the worst negative slack (WNS) is
16.219 ns and the total negative slack (TNS) is 0 ns. The worst hold slack
(WHS) is 0.364 ns and the total hold slack (THS) is 0 ns. The worst pulse
width slack (WPWS) is 9.500 ns and the total pulse width negative slack
(TPWS) is 0 ns.

We can then compute the theoretical maximum operating frequency with the
formula

$$f_{max} = \frac{1}{t_{clk} - \text{Worst Negative Slack}} = \frac{1}{20\ \text{ns} - 16.219\ \text{ns}} = 264.48\ \text{MHz}$$

In the laboratory, however, the maximum operating frequency we measured was
only 160 MHz. We suspect that the gap between the measured and theoretical
values is caused by I/O delays.

  5. Exploiting Coefficients:

[Figure: media/7563e5e70498d7c87067d61339e49efc.jpg]

The unit impulse response of many filters is symmetric. This symmetry can
usually be exploited to minimize the arithmetic requirements and produce an
area-efficient filter implementation. With this type of implementation, the
number of multiplications is reduced to

$$\frac{N+1}{2}$$

and the number of adders increases by one.
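The idea of exploiting coefficient symmetry can be sketched as a behavioral model: samples that share the same coefficient are added first, and the sum is multiplied once. The model below uses the 20 S1 coefficients from the listings. Reading N as the filter order (19 for 20 taps) is an assumption, under which (N+1)/2 gives 10 multiplications per output. The function names are illustrative.

```python
S1_HALF = [-285, -455, 151, 914, 642, -1226, -2372, 206, 6511, 12042]
S1_COEF = S1_HALF + S1_HALF[::-1]          # symmetric 20-tap impulse response


def fir_reference(x, h):
    """Plain direct form: one multiplication per tap."""
    delay = [0] * len(h)
    y = []
    for sample in x:
        delay = [sample] + delay[:-1]
        y.append(sum(c * d for c, d in zip(h, delay)))
    return y


def fir_symmetric(x, h):
    """Folded form: pre-add mirrored samples, then multiply once per pair."""
    n = len(h)
    if h != h[::-1]:
        raise ValueError("coefficients must be symmetric")
    half = (n + 1) // 2                    # multiplications per output sample
    delay = [0] * n
    y = []
    for sample in x:
        delay = [sample] + delay[:-1]
        acc = 0
        for k in range(half):
            mirror = n - 1 - k
            pre_add = delay[k] if k == mirror else delay[k] + delay[mirror]
            acc += h[k] * pre_add
        y.append(acc)
    return y


x = [100, -50, 2047, -2048, 0, 7] + [0] * 20
print(fir_symmetric(x, S1_COEF) == fir_reference(x, S1_COEF))
```

The folded version produces the same output as the reference while using half as many multiplications, at the cost of one pre-adder per coefficient pair.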

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity co_tra_fir is
    port(
        clk      : in  std_logic;
        reset    : in  std_logic;
        data_In  : in  signed(11 downto 0);
        data_Out : out std_logic_vector(15 downto 0)
    );
end co_tra_fir;

architecture co_fir_bhv of co_tra_fir is
    -- 20 taps: 12-bit samples, 16-bit coefficients, 28-bit products and sums
    type array_2 is array(0 to 19) of signed(27 downto 0);
    type array_3 is array(0 to 19) of signed(15 downto 0);
    signal tap     : signed(11 downto 0);   -- registered input sample
    signal tap_sum : array_2;               -- partial-sum registers
    signal tap_mul : array_2;               -- registered products
    constant coef : array_3 := (
        to_signed(-285,16),  to_signed(-455,16),  to_signed(151,16),   to_signed(914,16),
        to_signed(642,16),   to_signed(-1226,16), to_signed(-2372,16), to_signed(206,16),
        to_signed(6511,16),  to_signed(12042,16), to_signed(12042,16), to_signed(6511,16),
        to_signed(206,16),   to_signed(-2372,16), to_signed(-1226,16), to_signed(642,16),
        to_signed(914,16),   to_signed(151,16),   to_signed(-455,16),  to_signed(-285,16)
    );
begin
    proc : process(clk, reset)
    begin
        if reset = '1' then
            tap     <= (others => '0');
            tap_mul <= (others => (others => '0'));
            tap_sum <= (others => (others => '0'));
        elsif rising_edge(clk) then
            tap <= data_In;
            -- coef(i) = coef(19-i), so products i and 19-i are identical
            for i in 0 to 19 loop
                tap_mul(i) <= tap * coef(i);
            end loop;
            -- adder chain: tap_sum(i) accumulates tap_sum(i-1), so
            -- tap_sum(19) holds the full 20-tap sum
            tap_sum(0) <= tap_mul(0);
            for i in 1 to 19 loop
                tap_sum(i) <= tap_sum(i-1) + tap_mul(i);
            end loop;
        end if;
    end process;
    -- keep the 16 most significant bits of the 28-bit sum
    data_Out <= std_logic_vector(tap_sum(19)(27 downto 12));
end co_fir_bhv;

[Figure: media/19b666f5716380be1c9cd5e069717a12.png]

[Figure: media/b206340018a8ab914ef91a18e83886f1.jpg]

The total power of the exploiting-coefficients form for S1 is 0.106 W. Static
power accounts for most of it, at 0.091 W. Within the dynamic power, the DSP
blocks consume less than 0.001 W.

[Figure: media/fb8026aadfdc85e4b38aaee53b318182.png]

At a test clock frequency of 50 MHz, the worst negative slack (WNS) is
16.813 ns and the total negative slack (TNS) is 0 ns. The worst hold slack
(WHS) is 0.452 ns and the total hold slack (THS) is 0 ns. The worst pulse
width slack (WPWS) is 11.35 ns and the total pulse width negative slack
(TPWS) is 0 ns.

We can then compute the theoretical maximum operating frequency with the
formula

$$f_{max} = \frac{1}{t_{clk} - \text{Worst Negative Slack}} = \frac{1}{20\ \text{ns} - 16.813\ \text{ns}} = 313.75\ \text{MHz}$$

In the laboratory, however, the maximum operating frequency we measured was
only 140 MHz. We suspect that the gap between the measured and theoretical
values is caused by I/O delays.

  6. Shifter and Adder:

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity shifer_fir is
    port(
        clk      : in  std_logic;
        reset    : in  std_logic;
        data_In  : in  signed(11 downto 0);
        data_Out : out std_logic_vector(15 downto 0)
    );
end shifer_fir;

architecture shifer_bhv of shifer_fir is
    -- 20 taps: 12-bit samples, 16-bit coefficients, 28-bit products and sums
    type array_2 is array(0 to 19) of signed(27 downto 0);
    type array_3 is array(0 to 19) of signed(15 downto 0);
    signal tap     : signed(11 downto 0);   -- registered input sample
    signal tap_sum : array_2;               -- partial-sum registers
    signal tap_mul : array_2;               -- registered products
    constant coef : array_3 := (
        to_signed(-285,16),  to_signed(-455,16),  to_signed(151,16),   to_signed(914,16),
        to_signed(642,16),   to_signed(-1226,16), to_signed(-2372,16), to_signed(206,16),
        to_signed(6511,16),  to_signed(12042,16), to_signed(12042,16), to_signed(6511,16),
        to_signed(206,16),   to_signed(-2372,16), to_signed(-1226,16), to_signed(642,16),
        to_signed(914,16),   to_signed(151,16),   to_signed(-455,16),  to_signed(-285,16)
    );
begin
    proc : process(clk, reset)
    begin
        if reset = '1' then
            tap     <= (others => '0');
            tap_mul <= (others => (others => '0'));
            tap_sum <= (others => (others => '0'));
        elsif rising_edge(clk) then
            tap <= data_In;
            -- multiplication by a constant coefficient
            for i in 0 to 19 loop
                tap_mul(i) <= tap * coef(i);
            end loop;
            -- adder chain: tap_sum(i) accumulates tap_sum(i-1), so
            -- tap_sum(19) holds the full 20-tap sum
            tap_sum(0) <= tap_mul(0);
            for i in 1 to 19 loop
                tap_sum(i) <= tap_sum(i-1) + tap_mul(i);
            end loop;
        end if;
    end process;
    -- keep the 16 most significant bits of the 28-bit sum
    data_Out <= std_logic_vector(tap_sum(19)(27 downto 12));
end shifer_bhv;

[Figure: media/16702d756eb9e8847e00c8721f14c676.jpg]

The total power of the shifter-and-adder form for S1 is 0.103 W. Static
power accounts for most of it, at 0.091 W. Within the dynamic power, the DSP
blocks consume 0.001 W. The DSP share is this small because shifters and
adders replace the multipliers.

[Figure: media/297658156c2594b095614cb31f5e41db.jpg]

[Figure: media/74db523aa5c0e0d4f9bcdab967910c07.jpg]

| Design Method | Total On-Chip Power (W) | Device Static Power (W) | WNS (ns) |
| --- | --- | --- | --- |
| Direct Form | 0.112 | 0.091 | 15.185 |
| Transposed Form | 0.109 | 0.091 | 16.219 |
| Exploiting Coefficients | 0.107 | 0.091 | 16.813 |
| Shifter & Reduced Adder | 0.103 | 0.091 | 15.26 |

In summary, we can compare the performance of the different filter design
methods. The total on-chip power of the direct form is the highest, which is
caused by its high complexity. The device static power is 0.091 W for every
design. Two of the methods are optimizations of the basic forms: exploiting
coefficients, and shifter and adder. The shifter-and-adder design has the
lowest power, 0.103 W.

After analyzing the power of the different methods, we look at the maximum
operating frequency. The values for each method are listed in the table
below. Of the two basic structures, the transposed form has the higher
theoretical maximum frequency and the direct form the lowest of all four
designs; the exploiting-coefficients form reaches the highest value in the
table. Although we could not reach the theoretical maximum frequency in the
lab, the values in this table are still a useful reference for the relative
maximum operating frequency of each method.

$$f_{max} = \frac{1}{t_{clk} - \text{Worst Negative Slack}}$$

| Design Method | WNS (ns) | Maximum Frequency (MHz) |
| --- | --- | --- |
| Direct Form | 15.185 | 207.68 |
| Transposed Form | 16.219 | 264.48 |
| Exploiting Coefficients | 16.813 | 313.75 |
| Shifter & Reduced Adder | 15.26 | 210.97 |
