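Nearly all of these functions are reentrant.
Nearly every function comes in four data-type variants: F32, Q31, Q15 and Q7.
The functions mostly process samples in groups of four. This lets the F32, Q31, Q15 and Q7 variants share one program structure, which is easier to maintain. More importantly, it lets the Q15 and Q7 variants make good use of SIMD instructions: one 32-bit SIMD instruction handles two Q15 values or four Q7 values, so a group of four takes two SIMD operations for Q15 and one for Q7.
Some functions allow the destination pointer and the source pointer to refer to the same buffer.
Why fixed-point DSP operations often produce 0 as their output: http://www.armbbs.cn/forum.php?mod=viewthread&tid=95194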

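12.2 Basic DSP arithmetic instructions

This chapter uses the following basic arithmetic instructions:

The negate functions use QSUB, QSUB16 and QSUB8.
The offset functions use QADD, QADD16 and QADD8.
The shift functions use PKHBT and SSAT.
The subtraction functions use QSUB, QSUB16 and QSUB8.
The scale functions use PKHBT and SSAT.

Pay particular attention to saturating arithmetic, which is described in detail in Section 2 of Chapter 11.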

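12.3 Negation (Vector Negate)

These functions negate each element of a vector:

pDst[n] = -pSrc[n], 0 <= n < blockSize.

Note that these functions allow the destination pointer and the source pointer to refer to the same buffer.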

12.3.1 Function arm_negate_f32

Function prototype:

1.    void arm_negate_f32(
2.      const float32_t * pSrc,
3.            float32_t * pDst,
4.            uint32_t blockSize)
5.    {
6.            uint32_t blkCnt;                               /* Loop counter */
7.    
8.    #if defined(ARM_MATH_NEON_EXPERIMENTAL)
9.        float32x4_t vec1;
10.        float32x4_t res;
11.    
12.        /* Compute 4 outputs at a time */
13.        blkCnt = blockSize >> 2U;
14.    
15.        while (blkCnt > 0U)
16.        {
17.            /* C = -A */
18.    
19.            /* Negate and then store the results in the destination buffer. */
20.            vec1 = vld1q_f32(pSrc);
21.            res = vnegq_f32(vec1);
22.            vst1q_f32(pDst, res);
23.    
24.            /* Increment pointers */
25.            pSrc += 4;
26.            pDst += 4;
27.            
28.            /* Decrement the loop counter */
29.            blkCnt--;
30.        }
31.    
32.        /* Tail */
33.        blkCnt = blockSize & 0x3;
34.    
35.    #else
36.    #if defined (ARM_MATH_LOOPUNROLL)
37.    
38.      /* Loop unrolling: Compute 4 outputs at a time */
39.      blkCnt = blockSize >> 2U;
40.    
41.      while (blkCnt > 0U)
42.      {
43.        /* C = -A */
44.    
45.        /* Negate and store result in destination buffer. */
46.        *pDst++ = -*pSrc++;
47.    
48.        *pDst++ = -*pSrc++;
49.    
50.        *pDst++ = -*pSrc++;
51.    
52.        *pDst++ = -*pSrc++;
53.    
54.        /* Decrement loop counter */
55.        blkCnt--;
56.      }
57.    
58.      /* Loop unrolling: Compute remaining outputs */
59.      blkCnt = blockSize % 0x4U;
60.    
61.    #else
62.    
63.      /* Initialize blkCnt with number of samples */
64.      blkCnt = blockSize;
65.    
66.    #endif /* #if defined (ARM_MATH_LOOPUNROLL) */
67.    #endif /* #if defined(ARM_MATH_NEON_EXPERIMENTAL) */
68.    
69.      while (blkCnt > 0U)
70.      {
71.        /* C = -A */
72.    
73.        /* Negate and store result in destination buffer. */
74.        *pDst++ = -*pSrc++;
75.    
76.        /* Decrement loop counter */
77.        blkCnt--;
78.      }
79.    
80.    }

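Function description:

This function negates 32-bit floating-point values.

Function analysis:

Lines 8 to 35 are for the NEON instruction set, which the current Cortex-M cores do not support.
Lines 36 to 61 process the data in groups of four (loop unrolling). This speeds up execution by reducing the overhead of the while loop.
Negating a floating-point value is simple: put a minus sign in front of the variable.
Lines 69 to 78 process the samples left over after the groups of four, or all samples when loop unrolling is disabled.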
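The "groups of four plus tail" structure described above can be sketched in portable C. This is only an illustration under stated assumptions: the name negate_f32_unrolled is hypothetical and not part of CMSIS-DSP, and the NEON path is left out.

```c
#include <stdint.h>

/* Hypothetical helper (not a CMSIS-DSP function): negate n floats,
 * four per loop iteration, then handle the 0..3 leftover samples. */
static void negate_f32_unrolled(const float *src, float *dst, uint32_t n)
{
    uint32_t blk = n >> 2U;            /* number of complete groups of four */

    while (blk > 0U) {
        dst[0] = -src[0];
        dst[1] = -src[1];
        dst[2] = -src[2];
        dst[3] = -src[3];
        src += 4;
        dst += 4;
        blk--;
    }

    blk = n & 3U;                      /* samples left after the groups of four */
    while (blk > 0U) {
        *dst++ = -*src++;
        blk--;
    }
}
```

Each element is read before it is written, so the sketch also works when src and dst point to the same buffer.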

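Function parameters:

The 1st parameter is the source data address.
The 2nd parameter is the destination address for the negated data.
The 3rd parameter is the number of values to process, here the number of floating-point values.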

12.3.2 Function arm_negate_q31

Function prototype:

1.    void arm_negate_q31(
2.      const q31_t * pSrc,
3.            q31_t * pDst,
4.            uint32_t blockSize)
5.    {
6.            uint32_t blkCnt;                               /* Loop counter */
7.            q31_t in;                                      /* Temporary input variable */
8.    
9.    #if defined (ARM_MATH_LOOPUNROLL)
10.    
11.      /* Loop unrolling: Compute 4 outputs at a time */
12.      blkCnt = blockSize >> 2U;
13.    
14.      while (blkCnt > 0U)
15.      {
16.        /* C = -A */
17.    
18.        /* Negate and store result in destination buffer. */
19.        in = *pSrc++;
20.    #if defined (ARM_MATH_DSP)
21.        *pDst++ = __QSUB(0, in);
22.    #else
23.        *pDst++ = (in == INT32_MIN) ? INT32_MAX : -in;
24.    #endif
25.    
26.        in = *pSrc++;
27.    #if defined (ARM_MATH_DSP)
28.        *pDst++ = __QSUB(0, in);
29.    #else
30.        *pDst++ = (in == INT32_MIN) ? INT32_MAX : -in;
31.    #endif
32.    
33.        in = *pSrc++;
34.    #if defined (ARM_MATH_DSP)
35.        *pDst++ = __QSUB(0, in);
36.    #else
37.        *pDst++ = (in == INT32_MIN) ? INT32_MAX : -in;
38.    #endif
39.    
40.        in = *pSrc++;
41.    #if defined (ARM_MATH_DSP)
42.        *pDst++ = __QSUB(0, in);
43.    #else
44.        *pDst++ = (in == INT32_MIN) ? INT32_MAX : -in;
45.    #endif
46.    
47.        /* Decrement loop counter */
48.        blkCnt--;
49.      }
50.    
51.      /* Loop unrolling: Compute remaining outputs */
52.      blkCnt = blockSize % 0x4U;
53.    
54.    #else
55.    
56.      /* Initialize blkCnt with number of samples */
57.      blkCnt = blockSize;
58.    
59.    #endif /* #if defined (ARM_MATH_LOOPUNROLL) */
60.    
61.      while (blkCnt > 0U)
62.      {
63.        /* C = -A */
64.    
65.        /* Negate and store result in destination buffer. */
66.        in = *pSrc++;
67.    #if defined (ARM_MATH_DSP)
68.        *pDst++ = __QSUB(0, in);
69.    #else
70.        *pDst++ = (in == INT32_MIN) ? INT32_MAX : -in;
71.    #endif
72.    
73.        /* Decrement loop counter */
74.        blkCnt--;
75.      }
76.    
77.    }

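Function description:

This function negates 32-bit fixed-point values.

Function analysis:

Lines 9 to 54 process the data in groups of four (loop unrolling). This speeds up execution by reducing the overhead of the while loop.
Lines 61 to 75 process the samples left over after the groups of four, or all samples when loop unrolling is disabled.
For Q31 data, saturation turns 0x80000000 into 0x7fffffff. The most negative value 0x80000000 corresponds to -1 in floating point. Its negation would be positive 0x80000000 (+1 in floating point), which exceeds the largest Q31 value 0x7fffffff, so the result saturates to the largest positive value 0x7fffffff.
A note on __QSUB: this intrinsic maps to a single saturating-subtract instruction on Cortex-M cores that have the DSP extension, such as Cortex-M4 and Cortex-M7. For example, __QSUB(0, in) computes 0 - in and returns the result saturated to the 32-bit range. __QSUB16 and __QSUB8 are the corresponding saturating subtractions for 16-bit and 8-bit data.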
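To make the saturation behaviour concrete, here is a minimal portable model of a 32-bit saturating subtraction. It is a sketch of what the text above describes, not the CMSIS implementation; the name qsub32_model is hypothetical.

```c
#include <stdint.h>

/* Hypothetical portable model of a 32-bit saturating subtract:
 * compute the exact difference in 64 bits, then clamp it to the
 * int32_t range. */
static int32_t qsub32_model(int32_t a, int32_t b)
{
    int64_t r = (int64_t)a - (int64_t)b;

    if (r > INT32_MAX) {
        return INT32_MAX;              /* positive overflow saturates to 0x7FFFFFFF */
    }
    if (r < INT32_MIN) {
        return INT32_MIN;              /* negative overflow saturates to 0x80000000 */
    }
    return (int32_t)r;
}
```

With this model, qsub32_model(0, INT32_MIN) yields INT32_MAX, which is the Q31 case 0x80000000 to 0x7fffffff discussed above.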

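Function parameters:

The 1st parameter is the source data address.
The 2nd parameter is the destination address for the negated data.
The 3rd parameter is the number of values to process, here the number of fixed-point values.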

12.3.3 Function arm_negate_q15

Function prototype:

1.    void arm_negate_q15(
2.      const q15_t * pSrc,
3.            q15_t * pDst,
4.            uint32_t blockSize)
5.    {
6.            uint32_t blkCnt;                               /* Loop counter */
7.            q15_t in;                                      /* Temporary input variable */
8.    
9.    #if defined (ARM_MATH_LOOPUNROLL)
10.    
11.    #if defined (ARM_MATH_DSP)
12.      q31_t in1;                                    /* Temporary input variables */
13.    #endif
14.    
15.      /* Loop unrolling: Compute 4 outputs at a time */
16.      blkCnt = blockSize >> 2U;
17.    
18.      while (blkCnt > 0U)
19.      {
20.        /* C = -A */
21.    
22.    #if defined (ARM_MATH_DSP)
23.        /* Negate and store result in destination buffer (2 samples at a time). */
24.        in1 = read_q15x2_ia ((q15_t **) &pSrc);
25.        write_q15x2_ia (&pDst, __QSUB16(0, in1));
26.    
27.        in1 = read_q15x2_ia ((q15_t **) &pSrc);
28.        write_q15x2_ia (&pDst, __QSUB16(0, in1));
29.    #else
30.        in = *pSrc++;
31.        *pDst++ = (in == (q15_t) 0x8000) ? (q15_t) 0x7fff : -in;
32.    
33.        in = *pSrc++;
34.        *pDst++ = (in == (q15_t) 0x8000) ? (q15_t) 0x7fff : -in;
35.    
36.        in = *pSrc++;
37.        *pDst++ = (in == (q15_t) 0x8000) ? (q15_t) 0x7fff : -in;
38.    
39.        in = *pSrc++;
40.        *pDst++ = (in == (q15_t) 0x8000) ? (q15_t) 0x7fff : -in;
41.    #endif
42.    
43.        /* Decrement loop counter */
44.        blkCnt--;
45.      }
46.    
47.      /* Loop unrolling: Compute remaining outputs */
48.      blkCnt = blockSize % 0x4U;
49.    
50.    #else
51.    
52.      /* Initialize blkCnt with number of samples */
53.      blkCnt = blockSize;
54.    
55.    #endif /* #if defined (ARM_MATH_LOOPUNROLL) */
56.    
57.      while (blkCnt > 0U)
58.      {
59.        /* C = -A */
60.    
61.        /* Negate and store result in destination buffer. */
62.        in = *pSrc++;
63.        *pDst++ = (in == (q15_t) 0x8000) ? (q15_t) 0x7fff : -in;
64.    
65.        /* Decrement loop counter */
66.        blkCnt--;
67.      }
68.    
69.    }

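Function description:

This function negates 16-bit fixed-point values.

Function analysis:

Lines 9 to 50 process the data in groups of four (loop unrolling). This speeds up execution by reducing the overhead of the while loop.
Lines 57 to 67 process the samples left over after the groups of four, or all samples when loop unrolling is disabled.
For Q15 data, saturation turns the negation of 0x8000 into 0x7fff. The most negative value 0x8000 corresponds to -1 in floating point. Its negation would be positive 0x8000 (+1 in floating point), which exceeds the largest Q15 value 0x7fff, so the result saturates to the largest positive value 0x7fff.
__QSUB16 performs saturating subtraction on 16-bit data, two values at a time.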
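The two-lane behaviour of __QSUB16 can be modelled in portable C as follows. This is a hedged sketch: sat16 and qsub16_model are hypothetical names, and the conversion of the 16-bit lanes back to int16_t assumes the usual two's-complement behaviour of common compilers.

```c
#include <stdint.h>

/* Clamp a 32-bit value to the int16_t range. */
static int16_t sat16(int32_t x)
{
    if (x > INT16_MAX) {
        return INT16_MAX;
    }
    if (x < INT16_MIN) {
        return INT16_MIN;
    }
    return (int16_t)x;
}

/* Hypothetical model of a dual 16-bit saturating subtract: the low and
 * high halfwords of a and b are subtracted and saturated independently. */
static uint32_t qsub16_model(uint32_t a, uint32_t b)
{
    int16_t a_lo = (int16_t)(a & 0xFFFFU);
    int16_t a_hi = (int16_t)(a >> 16);
    int16_t b_lo = (int16_t)(b & 0xFFFFU);
    int16_t b_hi = (int16_t)(b >> 16);

    int16_t lo = sat16((int32_t)a_lo - (int32_t)b_lo);
    int16_t hi = sat16((int32_t)a_hi - (int32_t)b_hi);

    return ((uint32_t)(uint16_t)hi << 16) | (uint32_t)(uint16_t)lo;
}
```

Negating the packed pair (0x8000, 0x0001) with qsub16_model(0, 0x80000001) gives 0x7FFFFFFF: the high lane saturates to 0x7FFF while the low lane becomes 0xFFFF (-1).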

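Function parameters:

The 1st parameter is the source data address.
The 2nd parameter is the destination address for the negated data.
The 3rd parameter is the number of values to process, here the number of fixed-point values.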

12.3.4 Function arm_negate_q7

Function prototype:

1.    void arm_negate_q7(
2.      const q7_t * pSrc,
3.            q7_t * pDst,
4.            uint32_t blockSize)
5.    {
6.            uint32_t blkCnt;                               /* Loop counter */
7.            q7_t in;                                       /* Temporary input variable */
8.    
9.    #if defined (ARM_MATH_LOOPUNROLL)
10.    
11.    #if defined (ARM_MATH_DSP)
12.      q31_t in1;                                    /* Temporary input variable */
13.    #endif
14.    
15.      /* Loop unrolling: Compute 4 outputs at a time */
16.      blkCnt = blockSize >> 2U;
17.    
18.      while (blkCnt > 0U)
19.      {
20.        /* C = -A */
21.    
22.    #if defined (ARM_MATH_DSP)
23.        /* Negate and store result in destination buffer (4 samples at a time). */
24.        in1 = read_q7x4_ia ((q7_t **) &pSrc);
25.        write_q7x4_ia (&pDst, __QSUB8(0, in1));
26.    #else
27.        in = *pSrc++;
28.        *pDst++ = (in == (q7_t) 0x80) ? (q7_t) 0x7f : -in;
29.    
30.        in = *pSrc++;
31.        *pDst++ = (in == (q7_t) 0x80) ? (q7_t) 0x7f : -in;
32.    
33.        in = *pSrc++;
34.        *pDst++ = (in == (q7_t) 0x80) ? (q7_t) 0x7f : -in;
35.    
36.        in = *pSrc++;
37.        *pDst++ = (in == (q7_t) 0x80) ? (q7_t) 0x7f : -in;
38.    #endif
39.    
40.        /* Decrement loop counter */
41.        blkCnt--;
42.      }
43.    
44.      /* Loop unrolling: Compute remaining outputs */
45.      blkCnt = blockSize % 0x4U;
46.    
47.    #else
48.    
49.      /* Initialize blkCnt with number of samples */
50.      blkCnt = blockSize;
51.    
52.    #endif /* #if defined (ARM_MATH_LOOPUNROLL) */
53.    
54.      while (blkCnt > 0U)
55.      {
56.        /* C = -A */
57.    
58.        /* Negate and store result in destination buffer. */
59.        in = *pSrc++;
60.    
61.    #if defined (ARM_MATH_DSP)
62.        *pDst++ = (q7_t) __QSUB(0, in);
63.    #else
64.        *pDst++ = (in == (q7_t) 0x80) ? (q7_t) 0x7f : -in;
65.    #endif
66.    
67.        /* Decrement loop counter */
68.        blkCnt--;
69.      }
70.    
71.    }

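Function description:

This function negates 8-bit fixed-point values.

Function analysis:

Lines 9 to 47 process the data in groups of four (loop unrolling). This speeds up execution by reducing the overhead of the while loop.
Lines 54 to 69 process the samples left over after the groups of four, or all samples when loop unrolling is disabled.
For Q7 data, saturation turns 0x80 into 0x7f. The most negative value 0x80 corresponds to -1 in floating point. Its negation would be positive 0x80 (+1 in floating point), which exceeds the largest Q7 value 0x7f, so the result saturates to the largest positive value 0x7f.
__QSUB8 performs saturating subtraction on 8-bit data, four values at a time.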

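Function parameters:

The 1st parameter is the source data address.
The 2nd parameter is the destination address for the negated data.
The 3rd parameter is the number of values to process, here the number of fixed-point values.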

12.3.5 Usage example

Program design:

/*
*********************************************************************************************************
*    Function name: DSP_Negate
*    Description:   Compute negation
*    Parameters:    None
*    Return value:  None
*********************************************************************************************************
*/
static void DSP_Negate(void)
{
     float32_t pSrc = 0.0f;
     float32_t pDst;
    
    q31_t pSrc1 = 0;
    q31_t pDst1;
    
    q15_t pSrc2 = 0;
    q15_t pDst2;
    
    q7_t pSrc3 = 0; 
    q7_t pDst3;
    
    /* Negate *********************************/
    pSrc -= 1.23f;
    arm_negate_f32(&pSrc, &pDst, 1);
    printf("arm_negate_f32 = %f\r\n", pDst);

    pSrc1 -= 1;
    arm_negate_q31(&pSrc1, &pDst1, 1);
    printf("arm_negate_q31 = %d\r\n", pDst1);

    pSrc2 -= 1;
    arm_negate_q15(&pSrc2, &pDst2, 1);
    printf("arm_negate_q15 = %d\r\n", pDst2);

    pSrc3 += 1; 
    arm_negate_q7(&pSrc3, &pDst3, 1);
    printf("arm_negate_q7 = %d\r\n", pDst3);
    printf("***********************************\r\n");
}

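Experimental results:

12.4 Offset (Vector Offset)

These functions add a constant offset to each element of a vector:

pDst[n] = pSrc[n] + offset, 0 <= n < blockSize.

Note that these functions allow the destination pointer and the source pointer to refer to the same buffer.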

12.4.1 Function arm_offset_f32

Function prototype:

1.    void arm_offset_f32(
2.      const float32_t * pSrc,
3.            float32_t offset,
4.            float32_t * pDst,
5.            uint32_t blockSize)
6.    {
7.            uint32_t blkCnt;                               /* Loop counter */
8.    
9.    #if defined(ARM_MATH_NEON_EXPERIMENTAL)
10.        float32x4_t vec1;
11.        float32x4_t res;
12.    
13.        /* Compute 4 outputs at a time */
14.        blkCnt = blockSize >> 2U;
15.    
16.        while (blkCnt > 0U)
17.        {
18.            /* C = A + offset */
19.     
20.            /* Add offset and then store the results in the destination buffer. */
21.            vec1 = vld1q_f32(pSrc);
22.            res = vaddq_f32(vec1,vdupq_n_f32(offset));
23.            vst1q_f32(pDst, res);
24.    
25.            /* Increment pointers */
26.            pSrc += 4;
27.            pDst += 4;
28.            
29.            /* Decrement the loop counter */
30.            blkCnt--;
31.        }
32.    
33.        /* Tail */
34.        blkCnt = blockSize & 0x3;
35.    
36.    #else
37.    #if defined (ARM_MATH_LOOPUNROLL)
38.    
39.      /* Loop unrolling: Compute 4 outputs at a time */
40.      blkCnt = blockSize >> 2U;
41.    
42.      while (blkCnt > 0U)
43.      {
44.        /* C = A + offset */
45.    
46.        /* Add offset and store result in destination buffer. */
47.        *pDst++ = (*pSrc++) + offset;
48.    
49.        *pDst++ = (*pSrc++) + offset;
50.    
51.        *pDst++ = (*pSrc++) + offset;
52.    
53.        *pDst++ = (*pSrc++) + offset;
54.    
55.        /* Decrement loop counter */
56.        blkCnt--;
57.      }
58.    
59.      /* Loop unrolling: Compute remaining outputs */
60.      blkCnt = blockSize % 0x4U;
61.    
62.    #else
63.    
64.      /* Initialize blkCnt with number of samples */
65.      blkCnt = blockSize;
66.    
67.    #endif /* #if defined (ARM_MATH_LOOPUNROLL) */
68.    #endif /* #if defined(ARM_MATH_NEON_EXPERIMENTAL) */
69.    
70.      while (blkCnt > 0U)
71.      {
72.        /* C = A + offset */
73.    
74.        /* Add offset and store result in destination buffer. */
75.        *pDst++ = (*pSrc++) + offset;
76.    
77.        /* Decrement loop counter */
78.        blkCnt--;
79.      }
80.    
81.    }

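Function description:

This function adds an offset to 32-bit floating-point values.

Function analysis:

Lines 9 to 36 are for the NEON instruction set, which the current Cortex-M cores do not support.
Lines 37 to 62 process the data in groups of four (loop unrolling). This speeds up execution by reducing the overhead of the while loop.
Lines 70 to 79 process the samples left over after the groups of four, or all samples when loop unrolling is disabled.

Function parameters:

The 1st parameter is the source data address.
The 2nd parameter is the offset.
The 3rd parameter is the destination address for the result.
The 4th parameter is the number of floating-point values, that is, the number of offset operations to perform.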

12.4.2 Function arm_offset_q31

Function prototype:

1.    void arm_offset_q31(
2.      const q31_t * pSrc,
3.            q31_t offset,
4.            q31_t * pDst,
5.            uint32_t blockSize)
6.    {
7.            uint32_t blkCnt;                               /* Loop counter */
8.    
9.    #if defined (ARM_MATH_LOOPUNROLL)
10.    
11.      /* Loop unrolling: Compute 4 outputs at a time */
12.      blkCnt = blockSize >> 2U;
13.    
14.      while (blkCnt > 0U)
15.      {
16.        /* C = A + offset */
17.    
18.        /* Add offset and store result in destination buffer. */
19.    #if defined (ARM_MATH_DSP)
20.        *pDst++ = __QADD(*pSrc++, offset);
21.    #else
22.        *pDst++ = (q31_t) clip_q63_to_q31((q63_t) * pSrc++ + offset);
23.    #endif
24.    
25.    #if defined (ARM_MATH_DSP)
26.        *pDst++ = __QADD(*pSrc++, offset);
27.    #else
28.        *pDst++ = (q31_t) clip_q63_to_q31((q63_t) * pSrc++ + offset);
29.    #endif
30.    
31.    #if defined (ARM_MATH_DSP)
32.        *pDst++ = __QADD(*pSrc++, offset);
33.    #else
34.        *pDst++ = (q31_t) clip_q63_to_q31((q63_t) * pSrc++ + offset);
35.    #endif
36.    
37.    #if defined (ARM_MATH_DSP)
38.        *pDst++ = __QADD(*pSrc++, offset);
39.    #else
40.        *pDst++ = (q31_t) clip_q63_to_q31((q63_t) * pSrc++ + offset);
41.    #endif
42.    
43.        /* Decrement loop counter */
44.        blkCnt--;
45.      }
46.    
47.      /* Loop unrolling: Compute remaining outputs */
48.      blkCnt = blockSize % 0x4U;
49.    
50.    #else
51.    
52.      /* Initialize blkCnt with number of samples */
53.      blkCnt = blockSize;
54.    
55.    #endif /* #if defined (ARM_MATH_LOOPUNROLL) */
56.    
57.      while (blkCnt > 0U)
58.      {
59.        /* C = A + offset */
60.    
61.        /* Add offset and store result in destination buffer. */
62.    #if defined (ARM_MATH_DSP)
63.        *pDst++ = __QADD(*pSrc++, offset);
64.    #else
65.        *pDst++ = (q31_t) clip_q63_to_q31((q63_t) * pSrc++ + offset);
66.    #endif
67.    
68.        /* Decrement loop counter */
69.        blkCnt--;
70.      }
71.    
72.    }

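Function description:

This function adds an offset to 32-bit fixed-point values.

Function analysis:

Lines 9 to 50 process the data in groups of four (loop unrolling). This speeds up execution by reducing the overhead of the while loop.
Lines 57 to 70 process the samples left over after the groups of four, or all samples when loop unrolling is disabled.
__QADD performs a saturating 32-bit addition. The result lies in the range [0x80000000, 0x7FFFFFFF]; anything outside this range saturates, negative overflow to 0x80000000 and positive overflow to 0x7FFFFFFF.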
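The non-DSP branch of the listing widens the sum to 64 bits and clips it back to Q31. A minimal portable sketch of that idea for a single sample follows; offset_q31_one is a hypothetical name, not a CMSIS-DSP function.

```c
#include <stdint.h>

/* Hypothetical single-sample model of a saturating Q31 offset:
 * add in 64 bits so the sum cannot wrap, then clamp to int32_t. */
static int32_t offset_q31_one(int32_t sample, int32_t offset)
{
    int64_t sum = (int64_t)sample + (int64_t)offset;

    if (sum > INT32_MAX) {
        return INT32_MAX;              /* saturate to 0x7FFFFFFF */
    }
    if (sum < INT32_MIN) {
        return INT32_MIN;              /* saturate to 0x80000000 */
    }
    return (int32_t)sum;
}
```

For example, offset_q31_one(0x7FFFFFF0, 0x100) returns 0x7FFFFFFF instead of wrapping around to a negative value.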

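Function parameters:

The 1st parameter is the source data address.
The 2nd parameter is the offset.
The 3rd parameter is the destination address for the result.
The 4th parameter is the number of fixed-point values, that is, the number of offset operations to perform.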

12.4.3 Function arm_offset_q15

Function prototype:

1.    void arm_offset_q15(
2.      const q15_t * pSrc,
3.            q15_t offset,
4.            q15_t * pDst,
5.            uint32_t blockSize)
6.    {
7.            uint32_t blkCnt;                               /* Loop counter */
8.    
9.    #if defined (ARM_MATH_LOOPUNROLL)
10.    
11.    #if defined (ARM_MATH_DSP)
12.      q31_t offset_packed;                           /* Offset packed to 32 bit */
13.    
14.      /* Offset is packed to 32 bit in order to use SIMD32 for addition */
15.      offset_packed = __PKHBT(offset, offset, 16);
16.    #endif
17.    
18.      /* Loop unrolling: Compute 4 outputs at a time */
19.      blkCnt = blockSize >> 2U;
20.    
21.      while (blkCnt > 0U)
22.      {
23.        /* C = A + offset */
24.    
25.    #if defined (ARM_MATH_DSP)
26.        /* Add offset and store result in destination buffer (2 samples at a time). */
27.        write_q15x2_ia (&pDst, __QADD16(read_q15x2_ia ((q15_t **) &pSrc), offset_packed));
28.        write_q15x2_ia (&pDst, __QADD16(read_q15x2_ia ((q15_t **) &pSrc), offset_packed));
29.    #else
30.        *pDst++ = (q15_t) __SSAT(((q31_t) *pSrc++ + offset), 16);
31.        *pDst++ = (q15_t) __SSAT(((q31_t) *pSrc++ + offset), 16);
32.        *pDst++ = (q15_t) __SSAT(((q31_t) *pSrc++ + offset), 16);
33.        *pDst++ = (q15_t) __SSAT(((q31_t) *pSrc++ + offset), 16);
34.    #endif
35.    
36.        /* Decrement loop counter */
37.        blkCnt--;
38.      }
39.    
40.      /* Loop unrolling: Compute remaining outputs */
41.      blkCnt = blockSize % 0x4U;
42.    
43.    #else
44.    
45.      /* Initialize blkCnt with number of samples */
46.      blkCnt = blockSize;
47.    
48.    #endif /* #if defined (ARM_MATH_LOOPUNROLL) */
49.    
50.      while (blkCnt > 0U)
51.      {
52.        /* C = A + offset */
53.    
54.        /* Add offset and store result in destination buffer. */
55.    #if defined (ARM_MATH_DSP)
56.        *pDst++ = (q15_t) __QADD16(*pSrc++, offset);
57.    #else
58.        *pDst++ = (q15_t) __SSAT(((q31_t) *pSrc++ + offset), 16);
59.    #endif
60.    
61.        /* Decrement loop counter */
62.        blkCnt--;
63.      }
64.    
65.    }

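Function description:

This function adds an offset to 16-bit fixed-point values.

Function analysis:

Lines 9 to 43 process the data in groups of four (loop unrolling). This speeds up execution by reducing the overhead of the while loop.
Lines 50 to 63 process the samples left over after the groups of four, or all samples when loop unrolling is disabled.
__PKHBT is also a SIMD instruction; it packs two 16-bit values into one 32-bit word. Implemented in C, it looks like this: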

  #define __PKHBT(ARG1, ARG2, ARG3) ( (((int32_t)(ARG1) <<    0) & (int32_t)0x0000FFFF) | \
                                      (((int32_t)(ARG2) << ARG3) & (int32_t)0xFFFF0000)  )

The function read_q15x2_ia is defined as follows:

__STATIC_FORCEINLINE q31_t read_q15x2_ia (
  q15_t ** pQ15)
{
  q31_t val;

  memcpy (&val, *pQ15, 4);
  *pQ15 += 2;

  return (val);
}

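It reads two 16-bit values at once, returns them as one 32-bit word, and advances the data pointer so that the next read continues from there.

__QADD16 performs two saturating 16-bit additions at once. Each result lies in the range [0x8000, 0x7FFF]; anything outside this range saturates, negative overflow to 0x8000 and positive overflow to 0x7FFF.
__SSAT is a signed saturation instruction (a general core instruction rather than a SIMD one); here it saturates the result to 16 bits.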
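What "saturate to N bits" means can be shown with a small portable model. This is a sketch under stated assumptions: ssat_model is a hypothetical name, and bits is assumed to be in the range 1 to 32.

```c
#include <stdint.h>

/* Hypothetical model of signed saturation to a given bit width
 * (1 <= bits <= 32): clamp val to [-2^(bits-1), 2^(bits-1) - 1]. */
static int32_t ssat_model(int32_t val, uint32_t bits)
{
    int32_t max = (int32_t)((1U << (bits - 1U)) - 1U);
    int32_t min = -1 - max;

    if (val > max) {
        return max;
    }
    if (val < min) {
        return min;
    }
    return val;
}
```

With bits set to 16 the limits are 32767 and -32768, so a Q15 sum such as 30000 + 10000 is clamped to 32767 (0x7FFF).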

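Function parameters:

The 1st parameter is the source data address.
The 2nd parameter is the offset.
The 3rd parameter is the destination address for the result.
The 4th parameter is the number of fixed-point values, that is, the number of offset operations to perform.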

12.4.4 Function arm_offset_q7

Function prototype:

1.    void arm_offset_q7(
2.      const q7_t * pSrc,
3.            q7_t offset,
4.            q7_t * pDst,
5.            uint32_t blockSize)
6.    {
7.            uint32_t blkCnt;                               /* Loop counter */
8.    
9.    #if defined (ARM_MATH_LOOPUNROLL)
10.    
11.    #if defined (ARM_MATH_DSP)
12.      q31_t offset_packed;                           /* Offset packed to 32 bit */
13.    
14.      /* Offset is packed to 32 bit in order to use SIMD32 for addition */
15.      offset_packed = __PACKq7(offset, offset, offset, offset);
16.    #endif
17.    
18.      /* Loop unrolling: Compute 4 outputs at a time */
19.      blkCnt = blockSize >> 2U;
20.    
21.      while (blkCnt > 0U)
22.      {
23.        /* C = A + offset */
24.    
25.    #if defined (ARM_MATH_DSP)
26.        /* Add offset and store result in destination buffer (4 samples at a time). */
27.        write_q7x4_ia (&pDst, __QADD8(read_q7x4_ia ((q7_t **) &pSrc), offset_packed));
28.    #else
29.        *pDst++ = (q7_t) __SSAT(*pSrc++ + offset, 8);
30.        *pDst++ = (q7_t) __SSAT(*pSrc++ + offset, 8);
31.        *pDst++ = (q7_t) __SSAT(*pSrc++ + offset, 8);
32.        *pDst++ = (q7_t) __SSAT(*pSrc++ + offset, 8);
33.    #endif
34.    
35.        /* Decrement loop counter */
36.        blkCnt--;
37.      }
38.    
39.      /* Loop unrolling: Compute remaining outputs */
40.      blkCnt = blockSize % 0x4U;
41.    
42.    #else
43.    
44.      /* Initialize blkCnt with number of samples */
45.      blkCnt = blockSize;
46.    
47.    #endif /* #if defined (ARM_MATH_LOOPUNROLL) */
48.    
49.      while (blkCnt > 0U)
50.      {
51.        /* C = A + offset */
52.    
53.        /* Add offset and store result in destination buffer. */
54.        *pDst++ = (q7_t) __SSAT((q15_t) *pSrc++ + offset, 8);
55.    
56.        /* Decrement loop counter */
57.        blkCnt--;
58.      }
59.    
60.    }

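Function description:

This function adds an offset to 8-bit fixed-point values.

Function analysis:

Lines 9 to 42 process the data in groups of four (loop unrolling). This speeds up execution by reducing the overhead of the while loop.
Lines 49 to 58 process the samples left over after the groups of four, or all samples when loop unrolling is disabled.
The function write_q7x4_ia is defined as follows: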

__STATIC_FORCEINLINE void write_q7x4_ia (
  q7_t ** pQ7,
  q31_t   value)
{
  q31_t val = value;

  memcpy (*pQ7, &val, 4);
  *pQ7 += 4;
}

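It writes four 8-bit values at once and advances the data pointer so that the next write continues from there.

__QADD8 performs four saturating 8-bit additions at once. Each result lies in the range [0x80, 0x7F]; anything outside this range saturates, negative overflow to 0x80 and positive overflow to 0x7F.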
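The four-lane addition, together with replicating the offset into every lane, can be modelled in portable C. The names sat8, splat_q7 and qadd8_model are hypothetical; splat_q7 plays the role that __PACKq7(offset, offset, offset, offset) has in the listing. The lane conversions assume two's-complement behaviour, as on common compilers.

```c
#include <stdint.h>

/* Clamp a 32-bit value to the int8_t range. */
static int8_t sat8(int32_t x)
{
    if (x > INT8_MAX) {
        return INT8_MAX;
    }
    if (x < INT8_MIN) {
        return INT8_MIN;
    }
    return (int8_t)x;
}

/* Copy one 8-bit value into all four byte lanes of a 32-bit word. */
static uint32_t splat_q7(int8_t v)
{
    return (uint32_t)(uint8_t)v * 0x01010101U;
}

/* Hypothetical model of a quad 8-bit saturating add: each byte lane
 * is added and saturated independently of the others. */
static uint32_t qadd8_model(uint32_t a, uint32_t b)
{
    uint32_t r = 0U;

    for (uint32_t lane = 0U; lane < 4U; lane++) {
        int8_t x = (int8_t)(uint8_t)(a >> (8U * lane));
        int8_t y = (int8_t)(uint8_t)(b >> (8U * lane));
        int8_t s = sat8((int32_t)x + (int32_t)y);

        r |= (uint32_t)(uint8_t)s << (8U * lane);
    }
    return r;
}
```

Adding an offset of 1 to the packed samples 0x7F80FF01 gives 0x7F810002: the top lane stays at 0x7F because 127 + 1 saturates, while the other three lanes are incremented normally.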

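Function parameters:

The 1st parameter is the source data address.
The 2nd parameter is the offset.
The 3rd parameter is the destination address for the result.
The 4th parameter is the number of fixed-point values, that is, the number of offset operations to perform.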

12.4.5 Usage example

Program design:

/*
*********************************************************************************************************
*    Function name: DSP_Offset
*    Description:   Add an offset
*    Parameters:    None
*    Return value:  None
*********************************************************************************************************
*/
static void DSP_Offset(void)
{
    float32_t   pSrcA = 0.0f;
    float32_t   Offset = 0.0f;  
    float32_t   pDst;  
    
    q31_t  pSrcA1 = 0;  
    q31_t  Offset1 = 0;  
    q31_t  pDst1;  

    q15_t  pSrcA2 = 0;  
    q15_t  Offset2 = 0;  
    q15_t  pDst2; 

    q7_t  pSrcA3 = 0; 
    q7_t  Offset3 = 0;  
    q7_t  pDst3;  

    /* Offset *********************************/
    Offset--;
    arm_offset_f32(&pSrcA, Offset, &pDst, 1);
    printf("arm_offset_f32 = %f\r\n", pDst);

    Offset1--;
    arm_offset_q31(&pSrcA1, Offset1, &pDst1, 1);
    printf("arm_offset_q31 = %d\r\n", pDst1);

    Offset2--;
    arm_offset_q15(&pSrcA2, Offset2, &pDst2, 1);
    printf("arm_offset_q15 = %d\r\n", pDst2);

    Offset3--;
    arm_offset_q7(&pSrcA3, Offset3, &pDst3, 1);
    printf("arm_offset_q7 = %d\r\n", pDst3);
    printf("***********************************\r\n");
}

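Experimental results:

12.5 Shift (Vector Shift)

These functions shift each element of a vector:

pDst[n] = pSrc[n] << shift, 0 <= n < blockSize.

Note that these functions allow the destination pointer and the source pointer to refer to the same buffer.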

12.5.1 Function arm_shift_q31

Function prototype:

1.    void arm_shift_q31(
2.      const q31_t * pSrc,
3.            int8_t shiftBits,
4.            q31_t * pDst,
5.            uint32_t blockSize)
6.    {
7.            uint32_t blkCnt;                               /* Loop counter */
8.            uint8_t sign = (shiftBits & 0x80);             /* Sign of shiftBits */
9.    
10.    #if defined (ARM_MATH_LOOPUNROLL)
11.    
12.      q31_t in, out;                                 /* Temporary variables */
13.    
14.      /* Loop unrolling: Compute 4 outputs at a time */
15.      blkCnt = blockSize >> 2U;
16.    
17.      /* If the shift value is positive then do right shift else left shift */
18.      if (sign == 0U)
19.      {
20.        while (blkCnt > 0U)
21.        {
22.          /* C = A << shiftBits */
23.    
24.          /* Shift input and store result in destination buffer. */
25.          in = *pSrc++;
26.          out = in << shiftBits;
27.          if (in != (out >> shiftBits))
28.            out = 0x7FFFFFFF ^ (in >> 31);
29.          *pDst++ = out;
30.    
31.          in = *pSrc++;
32.          out = in << shiftBits;
33.          if (in != (out >> shiftBits))
34.            out = 0x7FFFFFFF ^ (in >> 31);
35.          *pDst++ = out;
36.    
37.          in = *pSrc++;
38.          out = in << shiftBits;
39.          if (in != (out >> shiftBits))
40.            out = 0x7FFFFFFF ^ (in >> 31);
41.          *pDst++ = out;
42.    
43.          in = *pSrc++;
44.          out = in << shiftBits;
45.          if (in != (out >> shiftBits))
46.            out = 0x7FFFFFFF ^ (in >> 31);
47.          *pDst++ = out;
48.    
49.          /* Decrement loop counter */
50.          blkCnt--;
51.        }
52.      }
53.      else
54.      {
55.        while (blkCnt > 0U)
56.        {
57.          /* C = A >> shiftBits */
58.    
59.          /* Shift input and store results in destination buffer. */
60.          *pDst++ = (*pSrc++ >> -shiftBits);
61.          *pDst++ = (*pSrc++ >> -shiftBits);
62.          *pDst++ = (*pSrc++ >> -shiftBits);
63.          *pDst++ = (*pSrc++ >> -shiftBits);
64.    
65.          /* Decrement loop counter */
66.          blkCnt--;
67.        }
68.      }
69.    
70.      /* Loop unrolling: Compute remaining outputs */
71.      blkCnt = blockSize % 0x4U;
72.    
73.    #else
74.    
75.      /* Initialize blkCnt with number of samples */
76.      blkCnt = blockSize;
77.    
78.    #endif /* #if defined (ARM_MATH_LOOPUNROLL) */
79.    
80.      /* If the shift value is positive then do right shift else left shift */
81.      if (sign == 0U)
82.      {
83.        while (blkCnt > 0U)
84.        {
85.          /* C = A << shiftBits */
86.    
87.          /* Shift input and store result in destination buffer. */
88.          *pDst++ = clip_q63_to_q31((q63_t) *pSrc++ << shiftBits);
89.    
90.          /* Decrement loop counter */
91.          blkCnt--;
92.        }
93.      }
94.      else
95.      {
96.        while (blkCnt > 0U)
97.        {
98.          /* C = A >> shiftBits */
99.    
100.          /* Shift input and store result in destination buffer. */
101.          *pDst++ = (*pSrc++ >> -shiftBits);
102.    
103.          /* Decrement loop counter */
104.          blkCnt--;
105.        }
106.      }
107.    
108.    }

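Function description:

This function shifts 32-bit fixed-point values left or right.

Function analysis:

Lines 10 to 73 process the data in groups of four (loop unrolling). This speeds up execution by reducing the overhead of the while loop.
- Lines 18 to 52: if the parameter shiftBits is non-negative, shift left.
- Lines 53 to 68: if the parameter shiftBits is negative, shift right.
- Line 28: a left-shifted value is kept only if shifting it back right by the same number of bits recovers the original value. Otherwise the output is saturated, and there are two cases:

out = 0x7FFFFFFF ^ (in >> 31)    (in is positive)
    = 0x7FFFFFFF ^ 0x00000000
    = 0x7FFFFFFF

out = 0x7FFFFFFF ^ (in >> 31)    (in is negative)
    = 0x7FFFFFFF ^ 0xFFFFFFFF
    = 0x80000000

Lines 81 to 106 process the samples left over after the groups of four, or all samples when loop unrolling is disabled.
- Line 88: the function clip_q63_to_q31 is defined as follows: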

 __STATIC_FORCEINLINE q31_t clip_q63_to_q31(
  q63_t x)
  {
    return ((q31_t) (x >> 32) != ((q31_t) x >> 31)) ?
      ((0x7FFFFFFF ^ ((q31_t) (x >> 63)))) : (q31_t) x;
  }
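The shift behaviour for one Q31 sample (saturating left shift, plain right shift) can be sketched in portable C. This is an illustration under stated assumptions, not the library code: clip64_to_32 and shift_q31_one are hypothetical names, shiftBits is assumed to lie between -31 and 31, the left shift is written as a 64-bit multiplication to avoid shifting a negative value, and the right shift of a negative value is assumed to be arithmetic, as on common compilers.

```c
#include <stdint.h>

/* Clamp a 64-bit value to the int32_t range. */
static int32_t clip64_to_32(int64_t x)
{
    if (x > INT32_MAX) {
        return INT32_MAX;
    }
    if (x < INT32_MIN) {
        return INT32_MIN;
    }
    return (int32_t)x;
}

/* Hypothetical single-sample model of a Q31 shift:
 * shiftBits >= 0 shifts left with saturation,
 * shiftBits <  0 shifts right by -shiftBits. */
static int32_t shift_q31_one(int32_t in, int8_t shiftBits)
{
    if (shiftBits >= 0) {
        /* in * 2^shiftBits computed exactly in 64 bits, then clipped */
        return clip64_to_32((int64_t)in * ((int64_t)1 << shiftBits));
    }
    return in >> -shiftBits;
}
```

For example, shifting 0x40000000 left by one bit would give 0x80000000, which does not fit in Q31, so the sketch returns 0x7FFFFFFF; this matches the positive case worked through above.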

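Function parameters:

The 1st parameter is the source data address.
The 2nd parameter is the number of bits to shift: a positive value shifts left, a negative value shifts right.
The 3rd parameter is the destination address for the shifted data.
The 4th parameter is the number of fixed-point values, that is, the number of shift operations to perform.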

12.5.2 Function arm_shift_q15

Function prototype:

1.    void arm_shift_q15(
2.      const q15_t * pSrc,
3.            int8_t shiftBits,
4.            q15_t * pDst,
5.            uint32_t blockSize)
6.    {
7.            uint32_t blkCnt;                               /* Loop counter */
8.            uint8_t sign = (shiftBits & 0x80);             /* Sign of shiftBits */
9.    
10.    #if defined (ARM_MATH_LOOPUNROLL)
11.    
12.    #if defined (ARM_MATH_DSP)
13.      q15_t in1, in2;                                /* Temporary input variables */
14.    #endif
15.    
16.      /* Loop unrolling: Compute 4 outputs at a time */
17.      blkCnt = blockSize >> 2U;
18.    
19.      /* If the shift value is positive then do right shift else left shift */
20.      if (sign == 0U)
21.      {
22.        while (blkCnt > 0U)
23.        {
24.          /* C = A << shiftBits */
25.    
26.    #if defined (ARM_MATH_DSP)
27.          /* read 2 samples from source */
28.          in1 = *pSrc++;
29.          in2 = *pSrc++;
30.    
31.          /* Shift the inputs and then store the results in the destination buffer. */
32.    #ifndef ARM_MATH_BIG_ENDIAN
33.          write_q15x2_ia (&pDst, __PKHBT(__SSAT((in1 << shiftBits), 16),
34.                                         __SSAT((in2 << shiftBits), 16), 16));
35.    #else
36.          write_q15x2_ia (&pDst, __PKHBT(__SSAT((in2 << shiftBits), 16),
37.                                          __SSAT((in1 << shiftBits), 16), 16));
38.    #endif /* #ifndef ARM_MATH_BIG_ENDIAN */
39.    
40.          /* read 2 samples from source */
41.          in1 = *pSrc++;
42.          in2 = *pSrc++;
43.    
44.    #ifndef ARM_MATH_BIG_ENDIAN
45.          write_q15x2_ia (&pDst, __PKHBT(__SSAT((in1 << shiftBits), 16),
46.                                         __SSAT((in2 << shiftBits), 16), 16));
47.    #else
48.          write_q15x2_ia (&pDst, __PKHBT(__SSAT((in2 << shiftBits), 16),
49.                                         __SSAT((in1 << shiftBits), 16), 16));
50.    #endif /* #ifndef ARM_MATH_BIG_ENDIAN */
51.    
52.    #else
53.          *pDst++ = __SSAT(((q31_t) *pSrc++ << shiftBits), 16);
54.          *pDst++ = __SSAT(((q31_t) *pSrc++ << shiftBits), 16);
55.          *pDst++ = __SSAT(((q31_t) *pSrc++ << shiftBits), 16);
56.          *pDst++ = __SSAT(((q31_t) *pSrc++ << shiftBits), 16);
57.    #endif
58.    
59.          /* Decrement loop counter */
60.          blkCnt--;
61.        }
62.      }
63.      else
64.      {
65.        while (blkCnt > 0U)
66.        {
67.          /* C = A >> shiftBits */
68.    
69.    #if defined (ARM_MATH_DSP)
70.          /* read 2 samples from source */
71.          in1 = *pSrc++;
72.          in2 = *pSrc++;
73.    
74.          /* Shift the inputs and then store the results in the destination buffer. */
75.    #ifndef ARM_MATH_BIG_ENDIAN
76.          write_q15x2_ia (&pDst, __PKHBT((in1 >> -shiftBits),
77.                                         (in2 >> -shiftBits), 16));
78.    #else
79.          write_q15x2_ia (&pDst, __PKHBT((in2 >> -shiftBits),
80.                                         (in1 >> -shiftBits), 16));
81.    #endif /* #ifndef ARM_MATH_BIG_ENDIAN */
82.    
83.          /* read 2 samples from source */
84.          in1 = *pSrc++;
85.          in2 = *pSrc++;
86.    
87.    #ifndef ARM_MATH_BIG_ENDIAN
88.          write_q15x2_ia (&pDst, __PKHBT((in1 >> -shiftBits),
89.                                         (in2 >> -shiftBits), 16));
90.    #else
91.          write_q15x2_ia (&pDst, __PKHBT((in2 >> -shiftBits),
92.                                         (in1 >> -shiftBits), 16));
93.    #endif /* #ifndef ARM_MATH_BIG_ENDIAN */
94.    
95.    #else
96.          *pDst++ = (*pSrc++ >> -shiftBits);
97.          *pDst++ = (*pSrc++ >> -shiftBits);
98.          *pDst++ = (*pSrc++ >> -shiftBits);
99.          *pDst++ = (*pSrc++ >> -shiftBits);
100.    #endif
101.    
102.          /* Decrement loop counter */
103.          blkCnt--;
104.        }
105.      }
106.    
107.      /* Loop unrolling: Compute remaining outputs */
108.      blkCnt = blockSize % 0x4U;
109.    
110.    #else
111.    
112.      /* Initialize blkCnt with number of samples */
113.      blkCnt = blockSize;
114.    
115.    #endif /* #if defined (ARM_MATH_LOOPUNROLL) */
116.    
117.      /* If the shift value is positive then do right shift else left shift */
118.      if (sign == 0U)
119.      {
120.        while (blkCnt > 0U)
121.        {
122.          /* C = A << shiftBits */
123.    
124.          /* Shift input and store result in destination buffer. */
125.          *pDst++ = __SSAT(((q31_t) *pSrc++ << shiftBits), 16);
126.    
127.          /* Decrement loop counter */
128.          blkCnt--;
129.        }
130.      }
131.      else
132.      {
133.        while (blkCnt > 0U)
134.        {
135.          /* C = A >> shiftBits */
136.    
137.          /* Shift input and store result in destination buffer. */
138.          *pDst++ = (*pSrc++ >> -shiftBits);
139.    
140.          /* Decrement loop counter */
141.          blkCnt--;
142.        }
143.      }
144.    
145.    }

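Function description:

This function shifts 16-bit fixed-point (Q15) data left or right.

Function analysis:

Lines 10 to 115 process the data in groups of four (loop unrolling), which speeds up execution by reducing the time spent on while-loop overhead.
- Lines 20 to 62: if the parameter shiftBits is positive, a saturating left shift is performed.
- Lines 63 to 105: if the parameter shiftBits is negative, a right shift is performed.
- Line 79: the function write_q15x2_ia, defined as shown below, writes two Q15 values packed into one 32-bit (q31_t) word and advances the destination pointer by two elements.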

__STATIC_FORCEINLINE void write_q15x2_ia (
  q15_t ** pQ15,
  q31_t    value)
{
  q31_t val = value;

  memcpy (*pQ15, &val, 4);
  *pQ15 += 2;
}

__PKHBT is also a SIMD instruction; it packs two 16-bit values into one 32-bit word. An equivalent C implementation is:

#define __PKHBT(ARG1, ARG2, ARG3) ( (((int32_t)(ARG1) <<    0) & (int32_t)0x0000FFFF) | \
                                      (((int32_t)(ARG2) << ARG3) & (int32_t)0xFFFF0000)  )
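To make the packing concrete, here is a minimal portable sketch of what __PKHBT(lo, hi, 16) produces. The helper names pack_q15x2, unpack_lo and unpack_hi are made up for this illustration and are not part of CMSIS-DSP; the casts from uint16_t back to int16_t assume the usual two's-complement wrap-around.

```c
#include <stdint.h>

/* Portable stand-in for __PKHBT(lo, hi, 16): keep the low halfword of `lo`
   and place the low halfword of `hi` in the upper 16 bits.
   pack_q15x2 is a hypothetical helper name, not a CMSIS-DSP function. */
static uint32_t pack_q15x2(int16_t lo, int16_t hi)
{
    return (uint32_t)(uint16_t)lo | ((uint32_t)(uint16_t)hi << 16);
}

/* Recover the two Q15 lanes from a packed word. */
static int16_t unpack_lo(uint32_t w)
{
    return (int16_t)(uint16_t)(w & 0xFFFFu);
}

static int16_t unpack_hi(uint32_t w)
{
    return (int16_t)(uint16_t)(w >> 16);
}
```

On a little-endian target the low halfword is the sample stored at the lower address, which is why the listing swaps in1 and in2 when ARM_MATH_BIG_ENDIAN is defined.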

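Lines 118 to 143 handle the samples left over after the groups of four, or all of the samples when loop unrolling is not used.

Function parameters:

The 1st parameter is the source data address.
The 2nd parameter is the number of bits to shift; a positive value shifts left and a negative value shifts right.
The 3rd parameter is the address where the shifted data is stored.
The 4th parameter is the number of fixed-point samples, i.e. the number of shift operations performed.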
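The "groups of four plus leftovers" structure can be sketched on its own, without the SIMD path. The following is an illustrative sketch only: shift_left_q15_block and clamp_q15 are hypothetical names, only the left-shift branch is shown, and shiftBits is assumed to lie in 0..15. Multiplication by 2^shiftBits is used in place of << so that negative samples stay well defined in portable C.

```c
#include <stdint.h>

/* Clamp a 32-bit value to the signed 16-bit range, like __SSAT(x, 16). */
static int16_t clamp_q15(int32_t x)
{
    if (x > 32767)  return 32767;
    if (x < -32768) return -32768;
    return (int16_t)x;
}

/* Left-shift a block of Q15 samples with saturation: four samples per loop
   iteration, then the 0 to 3 leftover samples one at a time.
   Hypothetical helper; shiftBits is assumed to be in 0..15. */
static void shift_left_q15_block(const int16_t *pSrc, uint8_t shiftBits,
                                 int16_t *pDst, uint32_t blockSize)
{
    int32_t gain = (int32_t)1 << shiftBits;   /* x << n equals x * 2^n */
    uint32_t blkCnt = blockSize >> 2U;        /* number of groups of four */

    while (blkCnt > 0U) {
        *pDst++ = clamp_q15((int32_t)*pSrc++ * gain);
        *pDst++ = clamp_q15((int32_t)*pSrc++ * gain);
        *pDst++ = clamp_q15((int32_t)*pSrc++ * gain);
        *pDst++ = clamp_q15((int32_t)*pSrc++ * gain);
        blkCnt--;
    }

    blkCnt = blockSize % 4U;                  /* leftover samples */
    while (blkCnt > 0U) {
        *pDst++ = clamp_q15((int32_t)*pSrc++ * gain);
        blkCnt--;
    }
}
```

With blockSize = 6, the first loop runs once (four samples) and the second loop handles the remaining two.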

12.5.3 Function arm_shift_q7

Function prototype:

1.    void arm_shift_q7(
2.      const q7_t * pSrc,
3.            int8_t shiftBits,
4.            q7_t * pDst,
5.            uint32_t blockSize)
6.    {
7.            uint32_t blkCnt;                               /* Loop counter */
8.            uint8_t sign = (shiftBits & 0x80);             /* Sign of shiftBits */
9.    
10.    #if defined (ARM_MATH_LOOPUNROLL)
11.    
12.    #if defined (ARM_MATH_DSP)
13.      q7_t in1,  in2,  in3,  in4;                    /* Temporary input variables */
14.    #endif
15.    
16.      /* Loop unrolling: Compute 4 outputs at a time */
17.      blkCnt = blockSize >> 2U;
18.    
19.      /* If the shift value is positive then do left shift else right shift */
20.      if (sign == 0U)
21.      {
22.        while (blkCnt > 0U)
23.        {
24.          /* C = A << shiftBits */
25.    
26.    #if defined (ARM_MATH_DSP)
27.          /* Read 4 inputs */
28.          in1 = *pSrc++;
29.          in2 = *pSrc++;
30.          in3 = *pSrc++;
31.          in4 = *pSrc++;
32.    
33.        /* Pack and store result in destination buffer (in single write) */
34.          write_q7x4_ia (&pDst, __PACKq7(__SSAT((in1 << shiftBits), 8),
35.                                         __SSAT((in2 << shiftBits), 8),
36.                                         __SSAT((in3 << shiftBits), 8),
37.                                         __SSAT((in4 << shiftBits), 8) ));
38.    #else
39.          *pDst++ = (q7_t) __SSAT(((q15_t) *pSrc++ << shiftBits), 8);
40.          *pDst++ = (q7_t) __SSAT(((q15_t) *pSrc++ << shiftBits), 8);
41.          *pDst++ = (q7_t) __SSAT(((q15_t) *pSrc++ << shiftBits), 8);
42.          *pDst++ = (q7_t) __SSAT(((q15_t) *pSrc++ << shiftBits), 8);
43.    #endif
44.    
45.          /* Decrement loop counter */
46.          blkCnt--;
47.        }
48.      }
49.      else
50.      {
51.        while (blkCnt > 0U)
52.        {
53.          /* C = A >> shiftBits */
54.    
55.    #if defined (ARM_MATH_DSP)
56.          /* Read 4 inputs */
57.          in1 = *pSrc++;
58.          in2 = *pSrc++;
59.          in3 = *pSrc++;
60.          in4 = *pSrc++;
61.    
62.        /* Pack and store result in destination buffer (in single write) */
63.          write_q7x4_ia (&pDst, __PACKq7((in1 >> -shiftBits),
64.                                         (in2 >> -shiftBits),
65.                                         (in3 >> -shiftBits),
66.                                         (in4 >> -shiftBits) ));
67.    #else
68.          *pDst++ = (*pSrc++ >> -shiftBits);
69.          *pDst++ = (*pSrc++ >> -shiftBits);
70.          *pDst++ = (*pSrc++ >> -shiftBits);
71.          *pDst++ = (*pSrc++ >> -shiftBits);
72.    #endif
73.    
74.          /* Decrement loop counter */
75.          blkCnt--;
76.        }
77.      }
78.    
79.      /* Loop unrolling: Compute remaining outputs */
80.      blkCnt = blockSize % 0x4U;
81.    
82.    #else
83.    
84.      /* Initialize blkCnt with number of samples */
85.      blkCnt = blockSize;
86.    
87.    #endif /* #if defined (ARM_MATH_LOOPUNROLL) */
88.    
89.      /* If the shift value is positive then do left shift else right shift */
90.      if (sign == 0U)
91.      {
92.        while (blkCnt > 0U)
93.        {
94.          /* C = A << shiftBits */
95.    
96.          /* Shift input and store result in destination buffer. */
97.          *pDst++ = (q7_t) __SSAT(((q15_t) *pSrc++ << shiftBits), 8);
98.    
99.          /* Decrement loop counter */
100.          blkCnt--;
101.        }
102.      }
103.      else
104.      {
105.        while (blkCnt > 0U)
106.        {
107.          /* C = A >> shiftBits */
108.    
109.          /* Shift input and store result in destination buffer. */
110.          *pDst++ = (*pSrc++ >> -shiftBits);
111.    
112.          /* Decrement loop counter */
113.          blkCnt--;
114.        }
115.      }
116.    
117.    }

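Function description:

This function shifts 8-bit fixed-point (Q7) data left or right.

Function analysis:

Lines 10 to 87 process the data in groups of four (loop unrolling), which speeds up execution by reducing the time spent on while-loop overhead.
- Lines 20 to 48: if the parameter shiftBits is positive, a saturating left shift is performed.
- Lines 49 to 77: if the parameter shiftBits is negative, a right shift is performed.
- Lines 34 and 63: the function write_q7x4_ia, defined as shown below, writes four 8-bit values in a single 32-bit store and advances the destination pointer by four so that it is ready for the next write.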

__STATIC_FORCEINLINE void write_q7x4_ia (
  q7_t ** pQ7,
  q31_t   value)
{
  q31_t val = value;

  memcpy (*pQ7, &val, 4);
  *pQ7 += 4;
}

The macro __PACKq7 packs four 8-bit values into one 32-bit word. It is implemented as follows:

 #define __PACKq7(v0,v1,v2,v3) ( (((int32_t)(v0) <<  0) & (int32_t)0x000000FF) | \
                                  (((int32_t)(v1) <<  8) & (int32_t)0x0000FF00) | \
                                  (((int32_t)(v2) << 16) & (int32_t)0x00FF0000) | \
                                  (((int32_t)(v3) << 24) & (int32_t)0xFF000000)  )
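As a small illustration of how the pack and the store work together, the sketch below packs four Q7 values and writes them with one memcpy. The names pack_q7x4 and store_q7x4 are hypothetical stand-ins for __PACKq7 and write_q7x4_ia; unsigned arithmetic is used throughout so that no negative value is shifted.

```c
#include <stdint.h>
#include <string.h>

/* Pack four Q7 values into one 32-bit word, v0 in the lowest byte.
   Mirrors the __PACKq7 macro; pack_q7x4 is a hypothetical name. */
static uint32_t pack_q7x4(int8_t v0, int8_t v1, int8_t v2, int8_t v3)
{
    return  (uint32_t)(uint8_t)v0
         | ((uint32_t)(uint8_t)v1 << 8)
         | ((uint32_t)(uint8_t)v2 << 16)
         | ((uint32_t)(uint8_t)v3 << 24);
}

/* Store a packed word and advance the destination by four elements,
   in the manner of write_q7x4_ia; store_q7x4 is a hypothetical name. */
static void store_q7x4(int8_t **pDst, uint32_t word)
{
    memcpy(*pDst, &word, 4);
    *pDst += 4;
}
```

On a little-endian target v0 ends up at the lowest address, so the four outputs land in memory in sample order.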

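Lines 90 to 115 handle the samples left over after the groups of four, or all of the samples when loop unrolling is not used.

Function parameters:

The 1st parameter is the source data address.
The 2nd parameter is the number of bits to shift; a positive value shifts left and a negative value shifts right.
The 3rd parameter is the address where the shifted data is stored.
The 4th parameter is the number of fixed-point samples, i.e. the number of shift operations performed.

12.5.4 Usage example

Program design: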

/*
*********************************************************************************************************
*    Function name: DSP_Shift
*    Description:   Shift
*    Parameters:    None
*    Return value:  None
*********************************************************************************************************
*/
static void DSP_Shift(void)
{
    q31_t  pSrcA1 = 0x88886666;  
    q31_t  pDst1;  

    q15_t  pSrcA2 = 0x8866;  
    q15_t  pDst2; 

    q7_t  pSrcA3 = 0x86; 
    q7_t  pDst3;  


    /* Shift *********************************/
    arm_shift_q31(&pSrcA1, 3, &pDst1, 1);
    printf("arm_shift_q31 = %8x\r\n", pDst1);

    arm_shift_q15(&pSrcA2, -3, &pDst2, 1);
    printf("arm_shift_q15 = %4x\r\n", pDst2);

    arm_shift_q7(&pSrcA3, 3, &pDst3, 1);
    printf("arm_shift_q7 = %2x\r\n", pDst3);
    printf("***********************************\r\n");
}

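Experimental results:

Pay particular attention to the Q31 and Q7 results: both inputs are negative, and the left shift saturates them to the minimum value (0x80000000 for Q31 and 0x80 for Q7). Also note that for a negative number a right shift fills the vacated high-order (left) bits with 1, i.e. the sign is extended, while a left shift fills the vacated low-order (right) bits with 0.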
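The results of the example can be checked with a small portable reference. This is only a sketch: shift_ref is a hypothetical helper, not a CMSIS-DSP function, and it uses multiplication and floor division in 64 bits rather than the C shift operators so that negative inputs stay well defined. The magnitude of shift is assumed to be smaller than the sample width.

```c
#include <stdint.h>

/* Reference shift for a signed sample that is `bits` wide (8, 16 or 32).
   Positive shift: left shift with saturation.
   Negative shift: arithmetic right shift (sign is preserved).
   Hypothetical helper; |shift| is assumed to be smaller than `bits`. */
static int32_t shift_ref(int32_t in, int shift, int bits)
{
    int64_t max = ((int64_t)1 << (bits - 1)) - 1;
    int64_t min = -max - 1;
    int64_t v = in;

    if (shift >= 0) {
        v *= ((int64_t)1 << shift);          /* left shift as a multiply */
        if (v > max) v = max;                /* saturate to the format   */
        if (v < min) v = min;
    } else {
        int64_t d = (int64_t)1 << -shift;
        int64_t q = v / d;
        if (v % d != 0 && v < 0) q -= 1;     /* round toward minus infinity */
        v = q;
    }
    return (int32_t)v;
}
```

With the inputs used in DSP_Shift, the Q31 value 0x88886666 shifted left by 3 saturates to 0x80000000, the Q15 value 0x8866 shifted right by 3 gives the bit pattern 0xF10C, and the Q7 value 0x86 shifted left by 3 saturates to 0x80.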

12.6 Subtraction (Vector Sub)

These functions perform element-wise subtraction, described by the following formula:

pDst[n] = pSrcA[n] - pSrcB[n], 0 <= n < blockSize.

12.6.1 Function arm_sub_f32

Function prototype:

1.    void arm_sub_f32(
2.      const float32_t * pSrcA,
3.      const float32_t * pSrcB,
4.            float32_t * pDst,
5.            uint32_t blockSize)
6.    {
7.            uint32_t blkCnt;                               /* Loop counter */
8.    
9.    #if defined(ARM_MATH_NEON)
10.        float32x4_t vec1;
11.        float32x4_t vec2;
12.        float32x4_t res;
13.    
14.        /* Compute 4 outputs at a time */
15.        blkCnt = blockSize >> 2U;
16.    
17.        while (blkCnt > 0U)
18.        {
19.            /* C = A - B */
20.    
21.            /* Subtract and then store the results in the destination buffer. */
22.            vec1 = vld1q_f32(pSrcA);
23.            vec2 = vld1q_f32(pSrcB);
24.            res = vsubq_f32(vec1, vec2);
25.            vst1q_f32(pDst, res);
26.    
27.            /* Increment pointers */
28.            pSrcA += 4;
29.            pSrcB += 4; 
30.            pDst += 4;
31.            
32.            /* Decrement the loop counter */
33.            blkCnt--;
34.        }
35.    
36.        /* Tail */
37.        blkCnt = blockSize & 0x3;
38.    
39.    #else
40.    #if defined (ARM_MATH_LOOPUNROLL)
41.    
42.      /* Loop unrolling: Compute 4 outputs at a time */
43.      blkCnt = blockSize >> 2U;
44.    
45.      while (blkCnt > 0U)
46.      {
47.        /* C = A - B */
48.    
49.        /* Subtract and store result in destination buffer. */
50.        *pDst++ = (*pSrcA++) - (*pSrcB++);
51.    
52.        *pDst++ = (*pSrcA++) - (*pSrcB++);
53.    
54.        *pDst++ = (*pSrcA++) - (*pSrcB++);
55.    
56.        *pDst++ = (*pSrcA++) - (*pSrcB++);
57.    
58.        /* Decrement loop counter */
59.        blkCnt--;
60.      }
61.    
62.      /* Loop unrolling: Compute remaining outputs */
63.      blkCnt = blockSize % 0x4U;
64.    
65.    #else
66.    
67.      /* Initialize blkCnt with number of samples */
68.      blkCnt = blockSize;
69.    
70.    #endif /* #if defined (ARM_MATH_LOOPUNROLL) */
71.    #endif /* #if defined(ARM_MATH_NEON) */
72.    
73.      while (blkCnt > 0U)
74.      {
75.        /* C = A - B */
76.    
77.        /* Subtract and store result in destination buffer. */
78.        *pDst++ = (*pSrcA++) - (*pSrcB++);
79.    
80.        /* Decrement loop counter */
81.        blkCnt--;
82.      }
83.    
84.    }

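Function description:

This function subtracts 32-bit floating-point data.

Function analysis:

Lines 9 to 39 are for the NEON instruction set, which the current Cortex-M cores do not support.
Lines 40 to 65 process the data in groups of four (loop unrolling), which speeds up execution by reducing the time spent on while-loop overhead.
Lines 73 to 82 handle the samples left over after the groups of four, or all of the samples when loop unrolling is not used.

Function parameters:

The 1st parameter is the address of the minuend (pSrcA).
The 2nd parameter is the address of the subtrahend (pSrcB).
The 3rd parameter is the result address.
The 4th parameter is the block size, i.e. the number of subtractions performed.

12.6.2 Function arm_sub_q31

Function prototype: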

1.    void arm_sub_q31(
2.      const q31_t * pSrcA,
3.      const q31_t * pSrcB,
4.            q31_t * pDst,
5.            uint32_t blockSize)
6.    {
7.            uint32_t blkCnt;                               /* Loop counter */
8.    
9.    #if defined (ARM_MATH_LOOPUNROLL)
10.    
11.      /* Loop unrolling: Compute 4 outputs at a time */
12.      blkCnt = blockSize >> 2U;
13.    
14.      while (blkCnt > 0U)
15.      {
16.        /* C = A - B */
17.    
18.        /* Subtract and store result in destination buffer. */
19.        *pDst++ = __QSUB(*pSrcA++, *pSrcB++);
20.    
21.        *pDst++ = __QSUB(*pSrcA++, *pSrcB++);
22.    
23.        *pDst++ = __QSUB(*pSrcA++, *pSrcB++);
24.    
25.        *pDst++ = __QSUB(*pSrcA++, *pSrcB++);
26.    
27.        /* Decrement loop counter */
28.        blkCnt--;
29.      }
30.    
31.      /* Loop unrolling: Compute remaining outputs */
32.      blkCnt = blockSize % 0x4U;
33.    
34.    #else
35.    
36.      /* Initialize blkCnt with number of samples */
37.      blkCnt = blockSize;
38.    
39.    #endif /* #if defined (ARM_MATH_LOOPUNROLL) */
40.    
41.      while (blkCnt > 0U)
42.      {
43.        /* C = A - B */
44.    
45.        /* Subtract and store result in destination buffer. */
46.        *pDst++ = __QSUB(*pSrcA++, *pSrcB++);
47.    
48.        /* Decrement loop counter */
49.        blkCnt--;
50.      }
51.    
52.    }

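Function description:

This function subtracts 32-bit fixed-point (Q31) data.

Function analysis:

This function uses the saturating subtraction __QSUB, so the result is in Q31 format, within the range [0x80000000, 0x7FFFFFFF].
Lines 9 to 34 process the data in groups of four (loop unrolling), which speeds up execution by reducing the time spent on while-loop overhead.
Lines 41 to 50 handle the samples left over after the groups of four, or all of the samples when loop unrolling is not used.

Function parameters:

The 1st parameter is the address of the minuend (pSrcA).
The 2nd parameter is the address of the subtrahend (pSrcB).
The 3rd parameter is the result address.
The 4th parameter is the block size, i.e. the number of subtractions performed.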
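What "saturating subtraction" means for Q31 can be shown with a few lines of portable C. This is a minimal sketch of the behaviour, not the actual instruction or the CMSIS fallback; qsub32_ref is a hypothetical name.

```c
#include <stdint.h>

/* Saturating 32-bit subtraction a - b, the behaviour provided by __QSUB.
   qsub32_ref is a hypothetical name used only for this sketch. */
static int32_t qsub32_ref(int32_t a, int32_t b)
{
    int64_t d = (int64_t)a - (int64_t)b;   /* exact difference in 64 bits */

    if (d > INT32_MAX) return INT32_MAX;   /* clamp to 0x7FFFFFFF */
    if (d < INT32_MIN) return INT32_MIN;   /* clamp to 0x80000000 */
    return (int32_t)d;
}
```

Without the clamp, 0x80000000 - 1 would wrap around to 0x7FFFFFFF, turning the most negative value into the most positive one.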

12.6.3 Function arm_sub_q15

Function prototype:

1.    void arm_sub_q15(
2.      const q15_t * pSrcA,
3.      const q15_t * pSrcB,
4.            q15_t * pDst,
5.            uint32_t blockSize)
6.    {
7.            uint32_t blkCnt;                               /* Loop counter */
8.    
9.    #if defined (ARM_MATH_LOOPUNROLL)
10.    
11.    #if defined (ARM_MATH_DSP)
12.      q31_t inA1, inA2;
13.      q31_t inB1, inB2;
14.    #endif
15.    
16.      /* Loop unrolling: Compute 4 outputs at a time */
17.      blkCnt = blockSize >> 2U;
18.    
19.      while (blkCnt > 0U)
20.      {
21.        /* C = A - B */
22.    
23.    #if defined (ARM_MATH_DSP)
24.        /* read 2 times 2 samples at a time from sourceA */
25.        inA1 = read_q15x2_ia ((q15_t **) &pSrcA);
26.        inA2 = read_q15x2_ia ((q15_t **) &pSrcA);
27.        /* read 2 times 2 samples at a time from sourceB */
28.        inB1 = read_q15x2_ia ((q15_t **) &pSrcB);
29.        inB2 = read_q15x2_ia ((q15_t **) &pSrcB);
30.    
31.        /* Subtract and store 2 times 2 samples at a time */
32.        write_q15x2_ia (&pDst, __QSUB16(inA1, inB1));
33.        write_q15x2_ia (&pDst, __QSUB16(inA2, inB2));
34.    #else
35.        *pDst++ = (q15_t) __SSAT(((q31_t) *pSrcA++ - *pSrcB++), 16);
36.        *pDst++ = (q15_t) __SSAT(((q31_t) *pSrcA++ - *pSrcB++), 16);
37.        *pDst++ = (q15_t) __SSAT(((q31_t) *pSrcA++ - *pSrcB++), 16);
38.        *pDst++ = (q15_t) __SSAT(((q31_t) *pSrcA++ - *pSrcB++), 16);
39.    #endif
40.    
41.        /* Decrement loop counter */
42.        blkCnt--;
43.      }
44.    
45.      /* Loop unrolling: Compute remaining outputs */
46.      blkCnt = blockSize % 0x4U;
47.    
48.    #else
49.    
50.      /* Initialize blkCnt with number of samples */
51.      blkCnt = blockSize;
52.    
53.    #endif /* #if defined (ARM_MATH_LOOPUNROLL) */
54.    
55.      while (blkCnt > 0U)
56.      {
57.        /* C = A - B */
58.    
59.        /* Subtract and store result in destination buffer. */
60.    #if defined (ARM_MATH_DSP)
61.        *pDst++ = (q15_t) __QSUB16(*pSrcA++, *pSrcB++);
62.    #else
63.        *pDst++ = (q15_t) __SSAT(((q31_t) *pSrcA++ - *pSrcB++), 16);
64.    #endif
65.    
66.        /* Decrement loop counter */
67.        blkCnt--;
68.      }
69.    
70.    }

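Function description:

This function subtracts 16-bit fixed-point (Q15) data.

Function analysis:

Lines 9 to 48 process the data in groups of four (loop unrolling), which speeds up execution by reducing the time spent on while-loop overhead.
- Line 25: the function read_q15x2_ia reads two Q15 values at once, packed into one 32-bit (q31_t) word.
- Line 32: the function write_q15x2_ia writes two Q15 values at once, taken from one 32-bit (q31_t) word.
- Line 32: the function __QSUB16 performs two 16-bit saturating subtractions in a single operation.
Lines 55 to 68 handle the samples left over after the groups of four, or all of the samples when loop unrolling is not used.

Function parameters:

The 1st parameter is the address of the minuend (pSrcA).
The 2nd parameter is the address of the subtrahend (pSrcB).
The 3rd parameter is the result address.
The 4th parameter is the block size, i.e. the number of subtractions performed.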
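The lane-wise behaviour of __QSUB16 can be emulated in portable C as follows. This is a sketch for understanding only; qsub16_ref and sat_q15 are hypothetical names, and the conversions back to int16_t assume two's-complement wrap-around.

```c
#include <stdint.h>

/* Clamp to the signed 16-bit range. */
static int16_t sat_q15(int32_t x)
{
    if (x > 32767)  return 32767;
    if (x < -32768) return -32768;
    return (int16_t)x;
}

/* Emulate __QSUB16: subtract the two 16-bit lanes of b from those of a,
   saturating each lane independently. Hypothetical helper name. */
static uint32_t qsub16_ref(uint32_t a, uint32_t b)
{
    int16_t aLo = (int16_t)(uint16_t)(a & 0xFFFFu);
    int16_t aHi = (int16_t)(uint16_t)(a >> 16);
    int16_t bLo = (int16_t)(uint16_t)(b & 0xFFFFu);
    int16_t bHi = (int16_t)(uint16_t)(b >> 16);

    int16_t lo = sat_q15((int32_t)aLo - bLo);
    int16_t hi = sat_q15((int32_t)aHi - bHi);

    return (uint32_t)(uint16_t)lo | ((uint32_t)(uint16_t)hi << 16);
}
```

The two lanes never interact: a borrow or a saturation in the low halfword has no effect on the high halfword.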

12.6.4 Function arm_sub_q7

Function prototype:

1.    void arm_sub_q7(
2.      const q7_t * pSrcA,
3.      const q7_t * pSrcB,
4.            q7_t * pDst,
5.            uint32_t blockSize)
6.    {
7.            uint32_t blkCnt;                               /* Loop counter */
8.    
9.    #if defined (ARM_MATH_LOOPUNROLL)
10.    
11.      /* Loop unrolling: Compute 4 outputs at a time */
12.      blkCnt = blockSize >> 2U;
13.    
14.      while (blkCnt > 0U)
15.      {
16.        /* C = A - B */
17.    
18.    #if defined (ARM_MATH_DSP)
19.        /* Subtract and store result in destination buffer (4 samples at a time). */
20.        write_q7x4_ia (&pDst, __QSUB8(read_q7x4_ia ((q7_t **) &pSrcA), read_q7x4_ia ((q7_t **) &pSrcB)));
21.    #else
22.        *pDst++ = (q7_t) __SSAT((q15_t) *pSrcA++ - *pSrcB++, 8);
23.        *pDst++ = (q7_t) __SSAT((q15_t) *pSrcA++ - *pSrcB++, 8);
24.        *pDst++ = (q7_t) __SSAT((q15_t) *pSrcA++ - *pSrcB++, 8);
25.        *pDst++ = (q7_t) __SSAT((q15_t) *pSrcA++ - *pSrcB++, 8);
26.    #endif
27.    
28.        /* Decrement loop counter */
29.        blkCnt--;
30.      }
31.    
32.      /* Loop unrolling: Compute remaining outputs */
33.      blkCnt = blockSize % 0x4U;
34.    
35.    #else
36.    
37.      /* Initialize blkCnt with number of samples */
38.      blkCnt = blockSize;
39.    
40.    #endif /* #if defined (ARM_MATH_LOOPUNROLL) */
41.    
42.      while (blkCnt > 0U)
43.      {
44.        /* C = A - B */
45.    
46.        /* Subtract and store result in destination buffer. */
47.        *pDst++ = (q7_t) __SSAT((q15_t) *pSrcA++ - *pSrcB++, 8);
48.    
49.        /* Decrement loop counter */
50.        blkCnt--;
51.      }
52.    
53.    }

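Function description:

This function subtracts 8-bit fixed-point (Q7) data.

Function analysis:

Lines 9 to 35 process the data in groups of four (loop unrolling), which speeds up execution by reducing the time spent on while-loop overhead.
- Line 20: the function write_q7x4_ia writes four Q7 values at once, taken from one 32-bit (q31_t) word.

The function __QSUB8 performs four Q7 saturating subtractions in a single operation.

Lines 42 to 51 handle the samples left over after the groups of four, or all of the samples when loop unrolling is not used.

Function parameters:

The 1st parameter is the address of the minuend (pSrcA).
The 2nd parameter is the address of the subtrahend (pSrcB).
The 3rd parameter is the result address.
The 4th parameter is the block size, i.e. the number of subtractions performed.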
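In the same spirit, the four-lane behaviour of __QSUB8 can be emulated with a short loop. This is an illustrative sketch, not the real intrinsic; qsub8_ref is a hypothetical name, and the conversions to int8_t assume two's-complement wrap-around.

```c
#include <stdint.h>

/* Emulate __QSUB8: four independent saturating 8-bit subtractions on the
   byte lanes of two packed 32-bit words. Hypothetical helper name. */
static uint32_t qsub8_ref(uint32_t a, uint32_t b)
{
    uint32_t out = 0;

    for (int lane = 0; lane < 4; lane++) {
        int8_t x = (int8_t)(uint8_t)(a >> (8 * lane));
        int8_t y = (int8_t)(uint8_t)(b >> (8 * lane));
        int32_t d = (int32_t)x - y;

        if (d > 127)  d = 127;      /* saturate to the Q7 range */
        if (d < -128) d = -128;

        out |= (uint32_t)(uint8_t)(int8_t)d << (8 * lane);
    }
    return out;
}
```

For instance, a lane holding 0x71 minus a lane holding 0x7F gives -14 (0xF2), while -128 minus 1 stays at -128 instead of wrapping to 127.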

12.6.5 Usage example

Program design:

/*
*********************************************************************************************************
*    Function name: DSP_Sub
*    Description:   Subtraction
*    Parameters:    None
*    Return value:  None
*********************************************************************************************************
*/
static void DSP_Sub(void)
{
    float32_t   pSrcA[5] = {1.0f,1.0f,1.0f,1.0f,1.0f};
    float32_t   pSrcB[5] = {1.0f,1.0f,1.0f,1.0f,1.0f};  
    float32_t   pDst[5];  
    
    q31_t  pSrcA1[5] = {1,1,1,1,1};  
    q31_t  pSrcB1[5] = {1,1,1,1,1};  
    q31_t  pDst1[5];   

    q15_t  pSrcA2[5] = {1,1,1,1,1};  
    q15_t  pSrcB2[5] = {1,1,1,1,1};  
    q15_t  pDst2[5];   

    q7_t  pSrcA3[5] = {0x70,1,1,1,1}; 
    q7_t  pSrcB3[5] = {0x7f,1,1,1,1};  
    q7_t  pDst3[5];  


    /* Subtraction *********************************/
    pSrcA[0] += 1.1f;
    arm_sub_f32(pSrcA, pSrcB, pDst, 5);
    printf("arm_sub_f32 = %f\r\n", pDst[0]);
    
    pSrcA1[0] += 1;
    arm_sub_q31(pSrcA1, pSrcB1, pDst1, 5);
    printf("arm_sub_q31 = %d\r\n", pDst1[0]);

    pSrcA2[0] += 1;
    arm_sub_q15(pSrcA2, pSrcB2, pDst2, 5);
    printf("arm_sub_q15 = %d\r\n", pDst2[0]);

    pSrcA3[0] += 1;
    arm_sub_q7(pSrcA3, pSrcB3, pDst3, 5);
    printf("arm_sub_q7 = %d\r\n", pDst3[0]);
    printf("***********************************\r\n");
}

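Experimental results:

12.7 Scaling (Vector Scale)

These functions scale data up or down. For floating-point data the formula is:

pDst[n] = pSrc[n] * scale, 0 <= n < blockSize.

For data in Q31, Q15 or Q7 format the formula is:

pDst[n] = (pSrc[n] * scaleFract) << shift, 0 <= n < blockSize.

In this case the scale factor is:

scale = scaleFract * 2^shift.

Note that these functions support the destination pointer and the source pointer pointing to the same buffer.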
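Since a fixed-point fraction cannot represent a gain of 1.0 or more, a desired gain has to be split into scaleFract and shift before calling the fixed-point functions. One possible way to do that for Q15 is sketched below. split_scale_q15 is a hypothetical helper that is not part of CMSIS-DSP, and it assumes a finite, positive gain whose exponent fits in an int8_t.

```c
#include <stdint.h>

/* Split a positive gain into a Q15 fraction and a shift so that
   gain ~= (scaleFract / 32768) * 2^shift.
   Hypothetical helper; gain is assumed to be finite and > 0. */
static void split_scale_q15(float gain, int16_t *scaleFract, int8_t *shift)
{
    int e = 0;
    float m = gain;

    while (m >= 1.0f) { m *= 0.5f; e++; }    /* bring the mantissa below 1.0 */
    while (m <  0.5f) { m *= 2.0f; e--; }    /* and up to at least 0.5       */

    *scaleFract = (int16_t)(m * 32768.0f);   /* m in [0.5, 1) -> 16384..32767 */
    *shift = (int8_t)e;
}
```

A gain of 1.5 becomes scaleFract = 24576 (0.75 in Q15) with shift = 1, and a gain of 0.25 becomes scaleFract = 16384 (0.5 in Q15) with shift = -1. Keeping the fraction in [0.5, 1) preserves as many significant bits as the format allows.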

12.7.1 Function arm_scale_f32

Function prototype:

1.    void arm_scale_f32(
2.      const float32_t *pSrc,
3.            float32_t scale,
4.            float32_t *pDst,
5.            uint32_t blockSize)
6.    {
7.      uint32_t blkCnt;                               /* Loop counter */
8.    #if defined(ARM_MATH_NEON_EXPERIMENTAL)
9.        float32x4_t vec1;
10.        float32x4_t res;
11.    
12.        /* Compute 4 outputs at a time */
13.        blkCnt = blockSize >> 2U;
14.    
15.        while (blkCnt > 0U)
16.        {
17.            /* C = A * scale */
18.    
19.            /* Scale the input and then store the results in the destination buffer. */
20.            vec1 = vld1q_f32(pSrc);
21.            res = vmulq_f32(vec1, vdupq_n_f32(scale));
22.            vst1q_f32(pDst, res);
23.    
24.            /* Increment pointers */
25.            pSrc += 4; 
26.            pDst += 4;
27.            
28.            /* Decrement the loop counter */
29.            blkCnt--;
30.        }
31.    
32.        /* Tail */
33.        blkCnt = blockSize & 0x3;
34.    
35.    #else
36.    #if defined (ARM_MATH_LOOPUNROLL)
37.    
38.      /* Loop unrolling: Compute 4 outputs at a time */
39.      blkCnt = blockSize >> 2U;
40.    
41.      while (blkCnt > 0U)
42.      {
43.        /* C = A * scale */
44.    
45.        /* Scale input and store result in destination buffer. */
46.        *pDst++ = (*pSrc++) * scale;
47.    
48.        *pDst++ = (*pSrc++) * scale;
49.    
50.        *pDst++ = (*pSrc++) * scale;
51.    
52.        *pDst++ = (*pSrc++) * scale;
53.    
54.        /* Decrement loop counter */
55.        blkCnt--;
56.      }
57.    
58.      /* Loop unrolling: Compute remaining outputs */
59.      blkCnt = blockSize % 0x4U;
60.    
61.    #else
62.    
63.      /* Initialize blkCnt with number of samples */
64.      blkCnt = blockSize;
65.    
66.    #endif /* #if defined (ARM_MATH_LOOPUNROLL) */
67.    #endif /* #if defined(ARM_MATH_NEON_EXPERIMENTAL) */
68.    
69.      while (blkCnt > 0U)
70.      {
71.        /* C = A * scale */
72.    
73.        /* Scale input and store result in destination buffer. */
74.        *pDst++ = (*pSrc++) * scale;
75.    
76.        /* Decrement loop counter */
77.        blkCnt--;
78.      }
79.    
80.    }

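Function description:

This function scales 32-bit floating-point data.

Function analysis:

Lines 8 to 35 are for the NEON instruction set, which the current Cortex-M cores do not support.
Lines 36 to 61 process the data in groups of four (loop unrolling), which speeds up execution by reducing the time spent on while-loop overhead.
Lines 69 to 78 handle the samples left over after the groups of four, or all of the samples when loop unrolling is not used.

Function parameters:

The 1st parameter is the source data address.
The 2nd parameter is the scale factor.
The 3rd parameter is the result address.
The 4th parameter is the block size, i.e. the number of scaling operations performed.

12.7.2 Function arm_scale_q31

Function prototype: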

1.    void arm_scale_q31(
2.      const q31_t *pSrc,
3.            q31_t scaleFract,
4.            int8_t shift,
5.            q31_t *pDst,
6.            uint32_t blockSize)
7.    {
8.            uint32_t blkCnt;                               /* Loop counter */
9.            q31_t in, out;                                 /* Temporary variables */
10.            int8_t kShift = shift + 1;                     /* Shift to apply after scaling */
11.            int8_t sign = (kShift & 0x80);
12.    
13.    #if defined (ARM_MATH_LOOPUNROLL)
14.    
15.      /* Loop unrolling: Compute 4 outputs at a time */
16.      blkCnt = blockSize >> 2U;
17.    
18.      if (sign == 0U)
19.      {
20.        while (blkCnt > 0U)
21.        {
22.          /* C = A * scale */
23.    
24.          /* Scale input and store result in destination buffer. */
25.          in = *pSrc++;                                /* read input from source */
26.          in = ((q63_t) in * scaleFract) >> 32;        /* multiply input with scaler value */
27.          out = in << kShift;                          /* apply shifting */
28.          if (in != (out >> kShift))                   /* saturate the result */
29.            out = 0x7FFFFFFF ^ (in >> 31);
30.          *pDst++ = out;                               /* Store result destination */
31.    
32.          in = *pSrc++;
33.          in = ((q63_t) in * scaleFract) >> 32;
34.          out = in << kShift;
35.          if (in != (out >> kShift))
36.            out = 0x7FFFFFFF ^ (in >> 31);
37.          *pDst++ = out;
38.    
39.          in = *pSrc++;
40.          in = ((q63_t) in * scaleFract) >> 32;
41.          out = in << kShift;
42.          if (in != (out >> kShift))
43.            out = 0x7FFFFFFF ^ (in >> 31);
44.          *pDst++ = out;
45.    
46.          in = *pSrc++;
47.          in = ((q63_t) in * scaleFract) >> 32;
48.          out = in << kShift;
49.          if (in != (out >> kShift))
50.            out = 0x7FFFFFFF ^ (in >> 31);
51.          *pDst++ = out;
52.    
53.          /* Decrement loop counter */
54.          blkCnt--;
55.        }
56.      }
57.      else
58.      {
59.        while (blkCnt > 0U)
60.        {
61.          /* C = A * scale */
62.    
63.          /* Scale input and store result in destination buffer. */
64.          in = *pSrc++;                                /* read four inputs from source */
65.          in = ((q63_t) in * scaleFract) >> 32;        /* multiply input with scaler value */
66.          out = in >> -kShift;                         /* apply shifting */
67.          *pDst++ = out;                               /* Store result destination */
68.    
69.          in = *pSrc++;
70.          in = ((q63_t) in * scaleFract) >> 32;
71.          out = in >> -kShift;
72.          *pDst++ = out;
73.    
74.          in = *pSrc++;
75.          in = ((q63_t) in * scaleFract) >> 32;
76.          out = in >> -kShift;
77.          *pDst++ = out;
78.    
79.          in = *pSrc++;
80.          in = ((q63_t) in * scaleFract) >> 32;
81.          out = in >> -kShift;
82.          *pDst++ = out;
83.    
84.          /* Decrement loop counter */
85.          blkCnt--;
86.        }
87.      }
88.    
89.      /* Loop unrolling: Compute remaining outputs */
90.      blkCnt = blockSize % 0x4U;
91.    
92.    #else
93.    
94.      /* Initialize blkCnt with number of samples */
95.      blkCnt = blockSize;
96.    
97.    #endif /* #if defined (ARM_MATH_LOOPUNROLL) */
98.    
99.      if (sign == 0U)
100.      {
101.        while (blkCnt > 0U)
102.        {
103.          /* C = A * scale */
104.    
105.          /* Scale input and store result in destination buffer. */
106.          in = *pSrc++;
107.          in = ((q63_t) in * scaleFract) >> 32;
108.          out = in << kShift;
109.          if (in != (out >> kShift))
110.              out = 0x7FFFFFFF ^ (in >> 31);
111.          *pDst++ = out;
112.    
113.          /* Decrement loop counter */
114.          blkCnt--;
115.        }
116.      }
117.      else
118.      {
119.        while (blkCnt > 0U)
120.        {
121.          /* C = A * scale */
122.    
123.          /* Scale input and store result in destination buffer. */
124.          in = *pSrc++;
125.          in = ((q63_t) in * scaleFract) >> 32;
126.          out = in >> -kShift;
127.          *pDst++ = out;
128.    
129.          /* Decrement loop counter */
130.          blkCnt--;
131.        }
132.      }
133.    
134.    }

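Function description:

This function scales 32-bit fixed-point (Q31) data.

Function analysis:

Lines 13 to 92 process the data in groups of four (loop unrolling), which speeds up execution by reducing the time spent on while-loop overhead.
- Lines 18 to 56: if kShift = shift + 1 is non-negative (sign == 0), a left shift is performed.
- Lines 57 to 87: if kShift is negative, a right shift is performed.
- Note in particular that the product of two Q31 numbers is in 2.62 format, whereas the result of the function must be in Q31 format, so the code handles this explicitly.

Line 26: the product is shifted right by 32 bits, which leaves a result in 2.30 format.

Line 27: kShift = shift + 1, that is, out = in << (shift + 1), so one extra left shift is performed.

This is equivalent to converting the 2.30 format to 2.31 format.

- Lines 28 to 29: Q31 saturation is applied, i.e. the 2.31 format is converted to 1.31.

The left shift is only valid if shifting the result back to the right by the same number of bits reproduces the original value. If this condition is not met, the output must be saturated. There are two cases:

out = 0x7FFFFFFF ^ (in >> 31)   (in is positive)

= 0x7FFFFFFF ^ 0x00000000

= 0x7FFFFFFF

out = 0x7FFFFFFF ^ (in >> 31)   (in is negative)

= 0x7FFFFFFF ^ 0xFFFFFFFF

= 0x80000000

Lines 99 to 132 handle the samples left over after the groups of four, or all of the samples when loop unrolling is not used.

Function parameters:

The 1st parameter is the source data address.
The 2nd parameter is the fractional scale factor (scaleFract).
The 3rd parameter is the shift amount; a positive value shifts left and a negative value shifts right.
The 4th parameter is the result address.
The 5th parameter is the block size, i.e. the number of scaling operations performed.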
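The steps described in the analysis (take the high word of the 64-bit product, shift by kShift = shift + 1, saturate) can be sketched for a single sample in portable C. scale_q31_ref is a hypothetical name, shift is assumed to lie in [-31, 30], and floor division stands in for the arithmetic right shifts so that negative values stay well defined. Clamping in 64 bits is used here in place of the shift-back comparison of the listing; for these shift ranges the two give the same result.

```c
#include <stdint.h>

/* Scale one Q31 sample by scaleFract * 2^shift with saturation.
   Hypothetical helper; shift is assumed to lie in [-31, 30]. */
static int32_t scale_q31_ref(int32_t in, int32_t scaleFract, int8_t shift)
{
    int kShift = shift + 1;
    int64_t prod = (int64_t)in * scaleFract;      /* 2.62 format */

    /* floor(prod / 2^32): the high word of the product, 2.30 format */
    int64_t hi = prod / 4294967296LL;
    if (prod % 4294967296LL != 0 && prod < 0) hi -= 1;

    if (kShift >= 0) {
        int64_t out = hi * ((int64_t)1 << kShift);
        if (out > INT32_MAX) return INT32_MAX;    /* 0x7FFFFFFF */
        if (out < INT32_MIN) return INT32_MIN;    /* 0x80000000 */
        return (int32_t)out;
    }

    /* negative kShift: arithmetic right shift by -kShift */
    int64_t d = (int64_t)1 << -kShift;
    int64_t q = hi / d;
    if (hi % d != 0 && hi < 0) q -= 1;
    return (int32_t)q;
}
```

For example, 0.5 (0x40000000) scaled by 0.5 with shift = 0 gives 0.25 (0x20000000), while the same inputs with shift = 2 would give 1.0, which is not representable in Q31 and therefore saturates to 0x7FFFFFFF.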

12.7.3 Function arm_scale_q15

Function prototype:

1.    void arm_shift_q15(
2.      const q15_t * pSrc,
3.            int8_t shiftBits,
4.            q15_t * pDst,
5.            uint32_t blockSize)
6.    {
7.            uint32_t blkCnt;                               /* Loop counter */
8.            uint8_t sign = (shiftBits & 0x80);             /* Sign of shiftBits */
9.    
10.    #if defined (ARM_MATH_LOOPUNROLL)
11.    
12.    #if defined (ARM_MATH_DSP)
13.      q15_t in1, in2;                                /* Temporary input variables */
14.    #endif
15.    
16.      /* Loop unrolling: Compute 4 outputs at a time */
17.      blkCnt = blockSize >> 2U;
18.    
19.      /* If the shift value is positive then do left shift else right shift */
20.      if (sign == 0U)
21.      {
22.        while (blkCnt > 0U)
23.        {
24.          /* C = A << shiftBits */
25.    
26.    #if defined (ARM_MATH_DSP)
27.          /* read 2 samples from source */
28.          in1 = *pSrc++;
29.          in2 = *pSrc++;
30.    
31.          /* Shift the inputs and then store the results in the destination buffer. */
32.    #ifndef ARM_MATH_BIG_ENDIAN
33.          write_q15x2_ia (&pDst, __PKHBT(__SSAT((in1 << shiftBits), 16),
34.                                         __SSAT((in2 << shiftBits), 16), 16));
35.    #else
36.          write_q15x2_ia (&pDst, __PKHBT(__SSAT((in2 << shiftBits), 16),
37.                                          __SSAT((in1 << shiftBits), 16), 16));
38.    #endif /* #ifndef ARM_MATH_BIG_ENDIAN */
39.    
40.          /* read 2 samples from source */
41.          in1 = *pSrc++;
42.          in2 = *pSrc++;
43.    
44.    #ifndef ARM_MATH_BIG_ENDIAN
45.          write_q15x2_ia (&pDst, __PKHBT(__SSAT((in1 << shiftBits), 16),
46.                                         __SSAT((in2 << shiftBits), 16), 16));
47.    #else
48.          write_q15x2_ia (&pDst, __PKHBT(__SSAT((in2 << shiftBits), 16),
49.                                         __SSAT((in1 << shiftBits), 16), 16));
50.    #endif /* #ifndef ARM_MATH_BIG_ENDIAN */
51.    
52.    #else
53.          *pDst++ = __SSAT(((q31_t) *pSrc++ << shiftBits), 16);
54.          *pDst++ = __SSAT(((q31_t) *pSrc++ << shiftBits), 16);
55.          *pDst++ = __SSAT(((q31_t) *pSrc++ << shiftBits), 16);
56.          *pDst++ = __SSAT(((q31_t) *pSrc++ << shiftBits), 16);
57.    #endif
58.    
59.          /* Decrement loop counter */
60.          blkCnt--;
61.        }
62.      }
63.      else
64.      {
65.        while (blkCnt > 0U)
66.        {
67.          /* C = A >> shiftBits */
68.    
69.    #if defined (ARM_MATH_DSP)
70.          /* read 2 samples from source */
71.          in1 = *pSrc++;
72.          in2 = *pSrc++;
73.    
74.          /* Shift the inputs and then store the results in the destination buffer. */
75.    #ifndef ARM_MATH_BIG_ENDIAN
76.          write_q15x2_ia (&pDst, __PKHBT((in1 >> -shiftBits),
77.                                         (in2 >> -shiftBits), 16));
78.    #else
79.          write_q15x2_ia (&pDst, __PKHBT((in2 >> -shiftBits),
80.                                         (in1 >> -shiftBits), 16));
81.    #endif /* #ifndef ARM_MATH_BIG_ENDIAN */
82.    
83.          /* read 2 samples from source */
84.          in1 = *pSrc++;
85.          in2 = *pSrc++;
86.    
87.    #ifndef ARM_MATH_BIG_ENDIAN
88.          write_q15x2_ia (&pDst, __PKHBT((in1 >> -shiftBits),
89.                                         (in2 >> -shiftBits), 16));
90.    #else
91.          write_q15x2_ia (&pDst, __PKHBT((in2 >> -shiftBits),
92.                                         (in1 >> -shiftBits), 16));
93.    #endif /* #ifndef ARM_MATH_BIG_ENDIAN */
94.    
95.    #else
96.          *pDst++ = (*pSrc++ >> -shiftBits);
97.          *pDst++ = (*pSrc++ >> -shiftBits);
98.          *pDst++ = (*pSrc++ >> -shiftBits);
99.          *pDst++ = (*pSrc++ >> -shiftBits);
100.    #endif
101.    
102.          /* Decrement loop counter */
103.          blkCnt--;
104.        }
105.      }
106.    
107.      /* Loop unrolling: Compute remaining outputs */
108.      blkCnt = blockSize % 0x4U;
109.    
110.    #else
111.    
112.      /* Initialize blkCnt with number of samples */
113.      blkCnt = blockSize;
114.    
115.    #endif /* #if defined (ARM_MATH_LOOPUNROLL) */
116.    
117.      /* If the shift value is positive then do left shift else right shift */
118.      if (sign == 0U)
119.      {
120.        while (blkCnt > 0U)
121.        {
122.          /* C = A << shiftBits */
123.    
124.          /* Shift input and store result in destination buffer. */
125.          *pDst++ = __SSAT(((q31_t) *pSrc++ << shiftBits), 16);
126.    
127.          /* Decrement loop counter */
128.          blkCnt--;
129.        }
130.      }
131.      else
132.      {
133.        while (blkCnt > 0U)
134.        {
135.          /* C = A >> shiftBits */
136.    
137.          /* Shift input and store result in destination buffer. */
138.          *pDst++ = (*pSrc++ >> -shiftBits);
139.    
140.          /* Decrement loop counter */
141.          blkCnt--;
142.        }
143.      }
144.    
145.    }

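Function description:

This function scales 16-bit fixed-point (Q15) data. Note that the listing reproduced above is the source of arm_shift_q15, and the line references in the analysis follow that listing; arm_scale_q15 itself takes five parameters: pSrc, scaleFract, shift, pDst and blockSize.

Function analysis:

Lines 10 to 110 process the data in groups of four (loop unrolling), which speeds up execution by reducing the time spent on while-loop overhead.
- Lines 20 to 62: if the shift parameter shiftBits is positive, a left shift is performed.
- Lines 63 to 105: if the shift parameter shiftBits is negative, a right shift is performed.
- Line 33: __PKHBT is also a SIMD instruction; it packs two 16-bit values into one 32-bit word. An equivalent C implementation is: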

  #define __PKHBT(ARG1, ARG2, ARG3) ( (((int32_t)(ARG1) <<    0) & (int32_t)0x0000FFFF) | \
                                      (((int32_t)(ARG2) << ARG3) & (int32_t)0xFFFF0000)  )

The function write_q15x2_ia is defined as follows:

__STATIC_FORCEINLINE void write_q15x2_ia (
  q15_t ** pQ15,
  q31_t    value)
{
  q31_t val = value;

  memcpy (*pQ15, &val, 4);
  *pQ15 += 2;
}

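It writes two Q15 values at once, packed in one 32-bit (q31_t) word, and advances the destination pointer so that it is ready for the next write.

Lines 118 to 143 handle the samples left over after the groups of four, or all of the samples when loop unrolling is not used.

Function parameters:

The 1st parameter is the source data address.
The 2nd parameter is the fractional scale factor (scaleFract).
The 3rd parameter is the shift amount; a positive value shifts left and a negative value shifts right.
The 4th parameter is the result address.
The 5th parameter is the block size, i.e. the number of scaling operations performed.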
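For reference, the Q15 scaling formula pDst[n] = (pSrc[n] * scaleFract) << shift can be sketched for a single sample as follows. This is an assumption-laden illustration rather than the CMSIS-DSP source: scale_q15_ref is a hypothetical name, the single right shift by 15 - shift is borrowed from the kShift idea used in the Q7 version, and shift is assumed to lie in [-15, 15].

```c
#include <stdint.h>

/* Scale one Q15 sample. The product in * scaleFract is in 2.30 format;
   dividing by 2^(15 - shift) renormalises to Q15 and applies 2^shift in
   one step. Hypothetical helper; shift is assumed to lie in [-15, 15]. */
static int16_t scale_q15_ref(int16_t in, int16_t scaleFract, int8_t shift)
{
    int kShift = 15 - shift;                   /* 0..30 */
    int32_t prod = (int32_t)in * scaleFract;
    int32_t d = (int32_t)1 << kShift;
    int32_t q = prod / d;

    if (prod % d != 0 && prod < 0) q -= 1;     /* floor, like an arithmetic >> */

    if (q > 32767)  return 32767;              /* saturate to Q15 */
    if (q < -32768) return -32768;
    return (int16_t)q;
}
```

With in = 0x4000 (0.5), scaleFract = 0x6000 (0.75) and shift = 1, the effective gain is 1.5 and the result is 0x6000 (0.75).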

12.7.4 Function arm_scale_q7

Function prototype:

1.    void arm_scale_q7(
2.      const q7_t * pSrc,
3.            q7_t scaleFract,
4.            int8_t shift,
5.            q7_t * pDst,
6.            uint32_t blockSize)
7.    {
8.            uint32_t blkCnt;                               /* Loop counter */
9.            int8_t kShift = 7 - shift;                     /* Shift to apply after scaling */
10.    
11.    #if defined (ARM_MATH_LOOPUNROLL)
12.    
13.    #if defined (ARM_MATH_DSP)
14.      q7_t in1,  in2,  in3,  in4;                    /* Temporary input variables */
15.      q7_t out1, out2, out3, out4;                   /* Temporary output variables */
16.    #endif
17.    
18.      /* Loop unrolling: Compute 4 outputs at a time */
19.      blkCnt = blockSize >> 2U;
20.    
21.      while (blkCnt > 0U)
22.      {
23.        /* C = A * scale */
24.    
25.    #if defined (ARM_MATH_DSP)
26.        /* Reading 4 inputs from memory */
27.        in1 = *pSrc++;
28.        in2 = *pSrc++;
29.        in3 = *pSrc++;
30.        in4 = *pSrc++;
31.    
32.        /* Scale inputs and store result in the temporary variable. */
33.        out1 = (q7_t) (__SSAT(((in1) * scaleFract) >> kShift, 8));
34.        out2 = (q7_t) (__SSAT(((in2) * scaleFract) >> kShift, 8));
35.        out3 = (q7_t) (__SSAT(((in3) * scaleFract) >> kShift, 8));
36.        out4 = (q7_t) (__SSAT(((in4) * scaleFract) >> kShift, 8));
37.    
38.        /* Pack and store result in destination buffer (in single write) */
39.        write_q7x4_ia (&pDst, __PACKq7(out1, out2, out3, out4));
40.    #else
41.        *pDst++ = (q7_t) (__SSAT((((q15_t) *pSrc++ * scaleFract) >> kShift), 8));
42.        *pDst++ = (q7_t) (__SSAT((((q15_t) *pSrc++ * scaleFract) >> kShift), 8));
43.        *pDst++ = (q7_t) (__SSAT((((q15_t) *pSrc++ * scaleFract) >> kShift), 8));
44.        *pDst++ = (q7_t) (__SSAT((((q15_t) *pSrc++ * scaleFract) >> kShift), 8));
45.    #endif
46.    
47.        /* Decrement loop counter */
48.        blkCnt--;
49.      }
50.    
51.      /* Loop unrolling: Compute remaining outputs */
52.      blkCnt = blockSize % 0x4U;
53.    
54.    #else
55.    
56.      /* Initialize blkCnt with number of samples */
57.      blkCnt = blockSize;
58.    
59.    #endif /* #if defined (ARM_MATH_LOOPUNROLL) */
60.    
61.      while (blkCnt > 0U)
62.      {
63.        /* C = A * scale */
64.    
65.        /* Scale input and store result in destination buffer. */
66.        *pDst++ = (q7_t) (__SSAT((((q15_t) *pSrc++ * scaleFract) >> kShift), 8));
67.    
68.        /* Decrement loop counter */
69.        blkCnt--;
70.      }
71.    
72.    }

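Function description:

This function scales a vector of 8-bit fixed-point (Q7) values by a scale factor.

Function analysis:

Line 9: the variable kShift = 7 - shift is a neat design choice. A positive shift (left shift) and a negative shift (right shift) can then both be handled with a single right shift by kShift.
Lines 11 to 54: process the data in groups of four. This speeds up execution because the while loop overhead is paid once per four samples.
- Lines 33 to 36: scale each input and saturate the result to 8 bits. For example:

(in1 * scaleFract) >> kShift

= (in1 * scaleFract) * 2^(shift - 7)

= ((in1 * scaleFract) >> 7) * 2^shift

The source value in1 is in Q7 format and the scale factor scaleFract is also in Q7 format, so their product is in 2.14 format. Shifting it right by 7 bits gives 2.7 format.

If shift is positive, this intermediate result is shifted left by shift bits; if shift is negative, it is shifted right by |shift| bits. The final result is saturated by __SSAT.

Lines 61 to 70: handle the samples left over after the groups of four, or all of the samples when processing in groups of four is not enabled.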
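The analysis above (kShift, groups of four, saturation and the tail loop) can be summarised in a portable model. This is a sketch under stated assumptions rather than the library source: sat8 and scale_q7_ref are hypothetical names, shift is assumed to be at most 7 so that kShift is not negative, and right shifts of negative values are assumed to be arithmetic.

```c
#include <stdint.h>

/* Saturate to the signed 8-bit range, like __SSAT(x, 8). */
static int8_t sat8(int32_t x)
{
    if (x > INT8_MAX) return INT8_MAX;
    if (x < INT8_MIN) return INT8_MIN;
    return (int8_t)x;
}

/* Portable model of the Q7 scaling loop; assumes shift <= 7. */
static void scale_q7_ref(const int8_t *src, int8_t scaleFract, int8_t shift,
                         int8_t *dst, uint32_t blockSize)
{
    int      kShift = 7 - shift;        /* one right shift covers both directions */
    uint32_t blkCnt = blockSize >> 2U;  /* number of complete groups of four */

    while (blkCnt > 0U)
    {
        dst[0] = sat8(((int32_t)src[0] * scaleFract) >> kShift);
        dst[1] = sat8(((int32_t)src[1] * scaleFract) >> kShift);
        dst[2] = sat8(((int32_t)src[2] * scaleFract) >> kShift);
        dst[3] = sat8(((int32_t)src[3] * scaleFract) >> kShift);
        src += 4;
        dst += 4;
        blkCnt--;
    }

    blkCnt = blockSize % 4U;            /* 0 to 3 leftover samples */

    while (blkCnt > 0U)
    {
        *dst++ = sat8(((int32_t)*src++ * scaleFract) >> kShift);
        blkCnt--;
    }
}
```

For five samples the first loop runs once and the tail loop handles the fifth sample. With src[0] = 0x40, scaleFract = 0x40 and shift = 2, the unsaturated value would be 128, so the output is clamped to 127.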

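Function parameters:

The 1st parameter is the source data address.
The 2nd parameter is the scale factor (scaleFract, in Q7 format).
The 3rd parameter is the shift amount: a positive value shifts the result left, a negative value shifts it right.
The 4th parameter is the destination address.
The 5th parameter is the block size, i.e. the number of samples to scale.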
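Since the overall gain applied is scaleFract * 2^shift, a real-valued gain has to be split into these two parameters before calling the function. The helper below is a hypothetical sketch of one way to do this for Q7; gain_to_q7 is not part of CMSIS. It assumes a finite gain whose magnitude is small enough for the resulting shift to stay in the range the function accepts.

```c
#include <stdint.h>

/* Hypothetical helper: split `gain` into a Q7 fraction and a power-of-two shift
   so that gain is approximately (scaleFract / 128) * 2^shift. */
static void gain_to_q7(float gain, int8_t *scaleFract, int8_t *shift)
{
    int   s   = 0;
    float mag = (gain < 0.0f) ? -gain : gain;

    /* Normalise |gain| into [0.5, 1) by pulling out powers of two. */
    while (mag >= 1.0f)              { mag *= 0.5f; s++; }
    while (mag > 0.0f && mag < 0.5f) { mag *= 2.0f; s--; }

    int32_t q = (int32_t)(mag * 128.0f + 0.5f);  /* round to the nearest Q7 step */
    if (q > 127) q = 127;                        /* Q7 cannot represent +1.0 */

    *scaleFract = (int8_t)((gain < 0.0f) ? -q : q);
    *shift      = (int8_t)s;
}
```

For example, a gain of 1.5 becomes scaleFract = 96 (0.75 in Q7) with shift = 1, and a gain of 0.25 becomes scaleFract = 64 (0.5 in Q7) with shift = -1. Keeping the fraction in [0.5, 1) uses as many of the 7 fractional bits as possible.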

12.7.5 Usage example

Program design:

/*
*********************************************************************************************************
*    Function name: DSP_Scale
*    Description:   Scaling (multiply a vector by a scale factor)
*    Parameters:    None
*    Return value:  None
*********************************************************************************************************
*/
static void DSP_Scale(void)
{
    float32_t   pSrcA[5] = {1.0f,1.0f,1.0f,1.0f,1.0f};
    float32_t   scale = 0.0f;  
    float32_t   pDst[5];  
    
    q31_t  pSrcA1[5] = {0x6fffffff,1,1,1,1};  
    q31_t  scale1 = 0x6fffffff;  
    q31_t  pDst1[5];   

    q15_t  pSrcA2[5] = {0x6fff,1,1,1,1};  
    q15_t  scale2 = 0x6fff;  
    q15_t  pDst2[5];   

    q7_t  pSrcA3[5] = {0x70,1,1,1,1}; 
    q7_t  scale3 = 0x6f;  
    q7_t pDst3[5];  

    /* Scale factor computation *********************************/
    scale += 0.1f;
    arm_scale_f32(pSrcA, scale, pDst, 5);
    printf("arm_scale_f32 = %f\r\n", pDst[0]);
    
    scale1 += 1;
    arm_scale_q31(pSrcA1, scale1, 0, pDst1, 5);
    printf("arm_scale_q31 = %x\r\n", pDst1[0]);

    scale2 += 1;
    arm_scale_q15(pSrcA2, scale2, 0, pDst2, 5);
    printf("arm_scale_q15 = %x\r\n", pDst2[0]);

    scale3 += 1;
    arm_scale_q7(pSrcA3, scale3, 0, pDst3, 5);
    printf("arm_scale_q7 = %x\r\n", pDst3[0]);
    printf("***********************************\r\n");
}

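Experimental results:

12.8 Example project description (MDK)

Accompanying example:

V7-207_DSP基礎運算（相反數，偏移，移位，減法和比例因子） (DSP basic operations: negate, offset, shift, subtraction and scaling)

Purpose of the experiment:

Learn the basic operations (negate, offset, shift, subtraction and scaling).

Content of the experiment:

Start an auto-reload software timer that toggles LED2 every 100 ms.
Press key K1 to run the DSP negate operation.
Press key K2 to run the DSP offset operation.
Press key K3 to run the DSP shift operation.
Press the joystick OK key to run the DSP subtraction operation.
Press the joystick up key to run the DSP scaling operation.

Notes on using AC6

Pay special attention to the issue described in Appendix C.

Information printed over the serial port after power-up:

Baud rate 115200, 8 data bits, no parity, 1 stop bit.

For the printed results, see sections 12.3.5, 12.4.5, 12.5.4, 12.6.5 and 12.7.5 of this chapter.

Program design:

System stack size allocation:

RAM space uses DTCM:

Hardware peripheral initialization

Hardware peripheral initialization is implemented in bsp.c: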

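/*
*********************************************************************************************************
*    Function name: bsp_Init
*    Description:   Initializes all hardware devices. Configures the CPU and peripheral registers and
*                   initializes some global variables. Only needs to be called once.
*    Parameters:    None
*    Return value:  None
*********************************************************************************************************
*/
void bsp_Init(void)
{
    /* Configure the MPU */
    MPU_Config();
    
    /* Enable the L1 cache */
    CPU_CACHE_Enable();

    /* 
       STM32H7xx HAL library initialization. At this point the system still runs from the H7's internal 64 MHz HSI clock:
       - Calls HAL_InitTick to set up the 1 ms tick interrupt.
       - Sets the NVIC priority grouping to 4.
     */
    HAL_Init();

    /* 
       Configure the system clock to 400 MHz
       - Switch to HSE.
       - This function updates the global variable SystemCoreClock and reconfigures HAL_InitTick.
    */
    SystemClock_Config();

    /* 
       Event Recorder:
       - Can be used to measure code execution time. Supported by MDK 5.25 and later only; IAR does not support it.
       - Disabled by default. Before enabling it, be sure to read chapter 8 of the V7 development board user manual.
    */    
#if Enable_EventRecorder == 1  
    /* Initialize and start the Event Recorder */
    EventRecorderInitialize(EventRecordAll, 1U);
    EventRecorderStart();
#endif
    
    bsp_InitKey();      /* Key initialization; must come before the tick timer, because keys are scanned from the tick timer */
    bsp_InitTimer();    /* Initialize the tick timer */
    bsp_InitUart();     /* Initialize the serial port */
    bsp_InitExtIO();    /* Initialize the 74HC574 expansion IO on the FMC bus. Must run before bsp_InitLed() */
    bsp_InitLed();      /* Initialize the LEDs */
}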

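MPU and Cache configuration:

Both the data cache and the instruction cache are enabled. The MPU is configured for the AXI SRAM region (this example does not use AXI SRAM) and for the FMC expansion IO region.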

/*
*********************************************************************************************************
*    Function name: MPU_Config
*    Description:   Configures the MPU
*    Parameters:    None
*    Return value:  None
*********************************************************************************************************
*/
static void MPU_Config( void )
{
    MPU_Region_InitTypeDef MPU_InitStruct;

    /* Disable the MPU */
    HAL_MPU_Disable();

    /* Configure the MPU attributes of AXI SRAM as write-back, read allocate, write allocate */
    MPU_InitStruct.Enable           = MPU_REGION_ENABLE;
    MPU_InitStruct.BaseAddress      = 0x24000000;
    MPU_InitStruct.Size             = MPU_REGION_SIZE_512KB;
    MPU_InitStruct.AccessPermission = MPU_REGION_FULL_ACCESS;
    MPU_InitStruct.IsBufferable     = MPU_ACCESS_BUFFERABLE;
    MPU_InitStruct.IsCacheable      = MPU_ACCESS_CACHEABLE;
    MPU_InitStruct.IsShareable      = MPU_ACCESS_NOT_SHAREABLE;
    MPU_InitStruct.Number           = MPU_REGION_NUMBER0;
    MPU_InitStruct.TypeExtField     = MPU_TEX_LEVEL1;
    MPU_InitStruct.SubRegionDisable = 0x00;
    MPU_InitStruct.DisableExec      = MPU_INSTRUCTION_ACCESS_ENABLE;

    HAL_MPU_ConfigRegion(&MPU_InitStruct);
    
    
    /* Configure the MPU attributes of the FMC expansion IO as Device or Strongly Ordered */
    MPU_InitStruct.Enable           = MPU_REGION_ENABLE;
    MPU_InitStruct.BaseAddress      = 0x60000000;
    MPU_InitStruct.Size             = ARM_MPU_REGION_SIZE_64KB;    
    MPU_InitStruct.AccessPermission = MPU_REGION_FULL_ACCESS;
    MPU_InitStruct.IsBufferable     = MPU_ACCESS_BUFFERABLE;
    MPU_InitStruct.IsCacheable      = MPU_ACCESS_NOT_CACHEABLE;    
    MPU_InitStruct.IsShareable      = MPU_ACCESS_NOT_SHAREABLE;
    MPU_InitStruct.Number           = MPU_REGION_NUMBER1;
    MPU_InitStruct.TypeExtField     = MPU_TEX_LEVEL0;
    MPU_InitStruct.SubRegionDisable = 0x00;
    MPU_InitStruct.DisableExec      = MPU_INSTRUCTION_ACCESS_ENABLE;
    
    HAL_MPU_ConfigRegion(&MPU_InitStruct);

    /* Enable the MPU */
    HAL_MPU_Enable(MPU_PRIVILEGED_DEFAULT);
}
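The two regions above differ only in their TypeExtField (TEX), IsCacheable (C) and IsBufferable (B) settings. The lookup below is a minimal sketch of how some TEX/C/B combinations map to memory types on an ARMv7-M MPU. The function mpu_memory_type is hypothetical and covers only a few common encodings, so check the architecture manual before relying on it.

```c
#include <stdint.h>

/* Hypothetical helper: describe the memory type selected by the TEX, C and B bits.
   Only a handful of ARMv7-M encodings are listed here. */
static const char *mpu_memory_type(uint8_t tex, uint8_t c, uint8_t b)
{
    if (tex == 0U && c == 0U && b == 0U) return "Strongly-ordered";
    if (tex == 0U && c == 0U && b == 1U) return "Device (shareable)";
    if (tex == 0U && c == 1U && b == 0U) return "Normal, write-through, no write allocate";
    if (tex == 0U && c == 1U && b == 1U) return "Normal, write-back, no write allocate";
    if (tex == 1U && c == 0U && b == 0U) return "Normal, non-cacheable";
    if (tex == 1U && c == 1U && b == 1U) return "Normal, write-back, read and write allocate";
    if (tex == 2U && c == 0U && b == 0U) return "Device (non-shareable)";
    return "other (not covered by this sketch)";
}
```

Region 0 (TEX level 1, cacheable, bufferable) corresponds to mpu_memory_type(1, 1, 1), which matches the write-back, read allocate, write allocate comment in the code. Region 1 (TEX level 0, not cacheable, bufferable) corresponds to mpu_memory_type(0, 0, 1), i.e. Device memory.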

/*
*********************************************************************************************************
*    Function name: CPU_CACHE_Enable
*    Description:   Enables the L1 cache
*    Parameters:    None
*    Return value:  None
*********************************************************************************************************
*/
static void CPU_CACHE_Enable(void)
{
    /* Enable the I-Cache */
    SCB_EnableICache();

    /* Enable the D-Cache */
    SCB_EnableDCache();
}

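Main function:

The main program does the following:

Start an auto-reload software timer that toggles LED2 every 100 ms.
Press key K1 to run the DSP negate operation.
Press key K2 to run the DSP offset operation.
Press key K3 to run the DSP shift operation.
Press the joystick OK key to run the DSP subtraction operation.
Press the joystick up key to run the DSP scaling operation.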

/*
*********************************************************************************************************
*    Function name: main
*    Description:   C program entry point
*    Parameters:    None
*    Return value:  Error code (does not need to be handled)
*********************************************************************************************************
*/
int main(void)
{
    uint8_t ucKeyCode;        /* Key code */
    

    bsp_Init();        /* Hardware initialization */
    PrintfLogo();    /* Print the example information to serial port 1 */

    PrintfHelp();    /* Print the operating hints */
    

    bsp_StartAutoTimer(0, 100);    /* Start one auto-reload timer with a 100 ms period */

    /* Enter the main loop */
    while (1)
    {
        bsp_Idle();        /* Defined in bsp.c. Users can modify it to put the CPU to sleep and feed the watchdog */

        /* Check whether the timer has expired */
        if (bsp_CheckTimer(0))    
        {
            /* Entered once every 100 ms */  
            bsp_LedToggle(2);
        }

        ucKeyCode = bsp_GetKey();    /* Read the key code; returns KEY_NONE = 0 when no key is pressed */
        if (ucKeyCode != KEY_NONE)
        {
            switch (ucKeyCode)
            {
                case KEY_DOWN_K1:            /* K1 pressed: negate */
                    DSP_Negate();
                    break;

                case KEY_DOWN_K2:            /* K2 pressed: offset */
                    DSP_Offset();
                    break;

                case KEY_DOWN_K3:            /* K3 pressed: shift */
                    DSP_Shift();
                    break;
    
                case JOY_DOWN_OK:            /* Joystick OK pressed: subtraction */
                    DSP_Sub();
                    break;
                
                case JOY_DOWN_U:            /* Joystick up pressed: scaling */
                    DSP_Scale();
                    break;

                default:
                    /* Other key codes are ignored */
                    break;
            }
        }
    }
}

12.9 Example project description (IAR)

Accompanying example:

V7-207_DSP基礎運算（相反數，偏移，移位，減法和比例因子） (DSP basic operations: negate, offset, shift, subtraction and scaling)

The purpose and content of the experiment, the serial port settings and the program design of the IAR project (bsp_Init, MPU_Config, CPU_CACHE_Enable and main) are identical to those of the MDK project; see section 12.8.

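12.10 Summary

That covers the basic DSP functions in this chapter. Beginners are encouraged to practice with them and to use them in their own future projects, where they can save a good deal of effort.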

[STM32H7 DSP Tutorial] Chapter 12: DSP basic functions - negate, offset, shift, subtraction and scaling
