ffmpeg Simple Analysis Series ---- Audio

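Audio has several important parameters: the sample rate (sample_rate, in Hz), the number of channels (channels), and the sample format (sample_fmt, see AVSampleFormat).
In ffmpeg, audio data is stored either planar or packed. Planar means each channel's data is stored in its own buffer; packed means the channels are interleaved in a single buffer. An AVSampleFormat whose name ends in P is a planar format.
Take stereo as an example, with L for the left channel and R for the right channel: packed storage is LRLRLRLRLRLRLRLR in one buffer (data[0]), while planar storage puts LLLLLLLL in data[0] and RRRRRRRR in data[1].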
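A minimal sketch in plain C (no FFmpeg headers needed) of where sample `i` of channel `ch` sits in each layout. `packed_offset()` and `planar_offset()` are hypothetical helpers for illustration only; in an FFmpeg AVFrame, packed audio lives entirely in `data[0]`, while planar audio keeps channel `ch` in `data[ch]`.

```c
#include <stddef.h>

/* Packed (interleaved): one buffer laid out as L R L R ...
 * Sample i of channel ch comes after i complete sample frames
 * (one sample per channel each) plus ch samples of frame i. */
size_t packed_offset(size_t i, size_t ch, size_t nb_channels, size_t bytes_per_sample)
{
    return (i * nb_channels + ch) * bytes_per_sample;
}

/* Planar: one buffer per channel (data[ch]), L L L ... and R R R ...
 * Inside plane ch, sample i is simply the i-th element. */
size_t planar_offset(size_t i, size_t bytes_per_sample)
{
    return i * bytes_per_sample;
}
```

For S16 stereo (2 bytes per sample), sample 3 of the left channel is at byte 12 of `data[0]` when packed, and at byte 6 of `data[0]` when planar.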

Sample formats

Sample formats are defined in libavutil/samplefmt.h

enum AVSampleFormat {
    AV_SAMPLE_FMT_NONE = -1,
    AV_SAMPLE_FMT_U8,          ///< unsigned 8 bits
    AV_SAMPLE_FMT_S16,         ///< signed 16 bits
    AV_SAMPLE_FMT_S32,         ///< signed 32 bits
    AV_SAMPLE_FMT_FLT,         ///< float
    AV_SAMPLE_FMT_DBL,         ///< double

    AV_SAMPLE_FMT_U8P,         ///< unsigned 8 bits, planar
    AV_SAMPLE_FMT_S16P,        ///< signed 16 bits, planar
    AV_SAMPLE_FMT_S32P,        ///< signed 32 bits, planar
    AV_SAMPLE_FMT_FLTP,        ///< float, planar
    AV_SAMPLE_FMT_DBLP,        ///< double, planar
    AV_SAMPLE_FMT_S64,         ///< signed 64 bits
    AV_SAMPLE_FMT_S64P,        ///< signed 64 bits, planar

    AV_SAMPLE_FMT_NB           ///< Number of sample formats. DO NOT USE if linking dynamically
};
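In real code the helpers for this enum are `av_get_bytes_per_sample()` and `av_sample_fmt_is_planar()` from libavutil/samplefmt.h. The plain-C sketch below only mirrors the table above to show the idea; `demo_sample_fmt` and the `demo_*` functions are hypothetical names. Note that the S64 formats were appended after the planar group, so "is planar" cannot be tested with a single range.

```c
#include <stdbool.h>

/* Hypothetical mirror of the AVSampleFormat order above (illustration only;
 * real code should include <libavutil/samplefmt.h> and use AVSampleFormat). */
enum demo_sample_fmt {
    DEMO_FMT_U8, DEMO_FMT_S16, DEMO_FMT_S32, DEMO_FMT_FLT, DEMO_FMT_DBL,
    DEMO_FMT_U8P, DEMO_FMT_S16P, DEMO_FMT_S32P, DEMO_FMT_FLTP, DEMO_FMT_DBLP,
    DEMO_FMT_S64, DEMO_FMT_S64P,
    DEMO_FMT_NB
};

/* Bytes used by one sample of one channel, indexed by the enum above. */
static const int demo_bytes[DEMO_FMT_NB] = {
    1, 2, 4, 4, 8,   /* U8  S16  S32  FLT  DBL  */
    1, 2, 4, 4, 8,   /* U8P S16P S32P FLTP DBLP */
    8, 8             /* S64 S64P */
};

int demo_bytes_per_sample(enum demo_sample_fmt f)
{
    return ((int)f >= 0 && (int)f < DEMO_FMT_NB) ? demo_bytes[f] : -1;
}

/* The formats ending in P are the planar ones. */
bool demo_is_planar(enum demo_sample_fmt f)
{
    return (f >= DEMO_FMT_U8P && f <= DEMO_FMT_DBLP) || f == DEMO_FMT_S64P;
}
```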

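Channel layout

AVCodecContext has two fields, channel_layout and request_channel_layout, both of type uint64_t (an unsigned 64-bit integer).
Many people don't know where to start with these fields, because it is not obvious what they mean or how to use them.
request_channel_layout is the layout you would like the decoder to produce, while channel_layout is the actual layout; when decoding, channel_layout may be set or overwritten by the decoder.
Written in binary, channel_layout has one bit set per channel, so the number of 1 bits equals the number of channels.
channel_layout.h defines masks for the individual channels, and combining them yields the various layouts, with the channels ordered from the lowest set bit to the highest. For example, AV_CH_LAYOUT_STEREO is stereo (2 channels) in the order LEFT | RIGHT, and AV_CH_LAYOUT_4POINT0 is 4 channels in the order
LEFT | RIGHT | FRONT-CENTER | BACK-CENTER
With channel_layout we know the channel order, so we can pick out the data of any channel we want.
Note that SDL does not support planar audio, so audio played through SDL must first be converted to packed.
Below is an excerpt from channel_layout.h: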

/**
 * @defgroup channel_masks Audio channel masks
 *
 * A channel layout is a 64-bits integer with a bit set for every channel.
 * The number of bits set must be equal to the number of channels.
 * The value 0 means that the channel layout is not known.
 * @note this data structure is not powerful enough to handle channels
 * combinations that have the same channel multiple times, such as
 * dual-mono.
 *
 * @{
 */
#define AV_CH_FRONT_LEFT             0x00000001
#define AV_CH_FRONT_RIGHT            0x00000002
#define AV_CH_FRONT_CENTER           0x00000004
#define AV_CH_LOW_FREQUENCY          0x00000008
#define AV_CH_BACK_LEFT              0x00000010
#define AV_CH_BACK_RIGHT             0x00000020
#define AV_CH_FRONT_LEFT_OF_CENTER   0x00000040
#define AV_CH_FRONT_RIGHT_OF_CENTER  0x00000080
#define AV_CH_BACK_CENTER            0x00000100
#define AV_CH_SIDE_LEFT              0x00000200
#define AV_CH_SIDE_RIGHT             0x00000400
#define AV_CH_TOP_CENTER             0x00000800
#define AV_CH_TOP_FRONT_LEFT         0x00001000
#define AV_CH_TOP_FRONT_CENTER       0x00002000
#define AV_CH_TOP_FRONT_RIGHT        0x00004000
#define AV_CH_TOP_BACK_LEFT          0x00008000
#define AV_CH_TOP_BACK_CENTER        0x00010000
#define AV_CH_TOP_BACK_RIGHT         0x00020000
#define AV_CH_STEREO_LEFT            0x20000000  ///< Stereo downmix.
#define AV_CH_STEREO_RIGHT           0x40000000  ///< See AV_CH_STEREO_LEFT.
#define AV_CH_WIDE_LEFT              0x0000000080000000ULL
#define AV_CH_WIDE_RIGHT             0x0000000100000000ULL
#define AV_CH_SURROUND_DIRECT_LEFT   0x0000000200000000ULL
#define AV_CH_SURROUND_DIRECT_RIGHT  0x0000000400000000ULL
#define AV_CH_LOW_FREQUENCY_2        0x0000000800000000ULL

/** Channel mask value used for AVCodecContext.request_channel_layout
    to indicate that the user requests the channel order of the decoder output
    to be the native codec channel order. */
#define AV_CH_LAYOUT_NATIVE          0x8000000000000000ULL

/**
 * @}
 * @defgroup channel_mask_c Audio channel layouts
 * @{
 * */
#define AV_CH_LAYOUT_MONO              (AV_CH_FRONT_CENTER)
#define AV_CH_LAYOUT_STEREO            (AV_CH_FRONT_LEFT|AV_CH_FRONT_RIGHT)
#define AV_CH_LAYOUT_2POINT1           (AV_CH_LAYOUT_STEREO|AV_CH_LOW_FREQUENCY)
#define AV_CH_LAYOUT_2_1               (AV_CH_LAYOUT_STEREO|AV_CH_BACK_CENTER)
#define AV_CH_LAYOUT_SURROUND          (AV_CH_LAYOUT_STEREO|AV_CH_FRONT_CENTER)
#define AV_CH_LAYOUT_3POINT1           (AV_CH_LAYOUT_SURROUND|AV_CH_LOW_FREQUENCY)
#define AV_CH_LAYOUT_4POINT0           (AV_CH_LAYOUT_SURROUND|AV_CH_BACK_CENTER)
#define AV_CH_LAYOUT_4POINT1           (AV_CH_LAYOUT_4POINT0|AV_CH_LOW_FREQUENCY)
#define AV_CH_LAYOUT_2_2               (AV_CH_LAYOUT_STEREO|AV_CH_SIDE_LEFT|AV_CH_SIDE_RIGHT)
#define AV_CH_LAYOUT_QUAD              (AV_CH_LAYOUT_STEREO|AV_CH_BACK_LEFT|AV_CH_BACK_RIGHT)
#define AV_CH_LAYOUT_5POINT0           (AV_CH_LAYOUT_SURROUND|AV_CH_SIDE_LEFT|AV_CH_SIDE_RIGHT)
#define AV_CH_LAYOUT_5POINT1           (AV_CH_LAYOUT_5POINT0|AV_CH_LOW_FREQUENCY)
#define AV_CH_LAYOUT_5POINT0_BACK      (AV_CH_LAYOUT_SURROUND|AV_CH_BACK_LEFT|AV_CH_BACK_RIGHT)
#define AV_CH_LAYOUT_5POINT1_BACK      (AV_CH_LAYOUT_5POINT0_BACK|AV_CH_LOW_FREQUENCY)
#define AV_CH_LAYOUT_6POINT0           (AV_CH_LAYOUT_5POINT0|AV_CH_BACK_CENTER)
#define AV_CH_LAYOUT_6POINT0_FRONT     (AV_CH_LAYOUT_2_2|AV_CH_FRONT_LEFT_OF_CENTER|AV_CH_FRONT_RIGHT_OF_CENTER)
#define AV_CH_LAYOUT_HEXAGONAL         (AV_CH_LAYOUT_5POINT0_BACK|AV_CH_BACK_CENTER)
#define AV_CH_LAYOUT_6POINT1           (AV_CH_LAYOUT_5POINT1|AV_CH_BACK_CENTER)
#define AV_CH_LAYOUT_6POINT1_BACK      (AV_CH_LAYOUT_5POINT1_BACK|AV_CH_BACK_CENTER)
#define AV_CH_LAYOUT_6POINT1_FRONT     (AV_CH_LAYOUT_6POINT0_FRONT|AV_CH_LOW_FREQUENCY)
#define AV_CH_LAYOUT_7POINT0           (AV_CH_LAYOUT_5POINT0|AV_CH_BACK_LEFT|AV_CH_BACK_RIGHT)
#define AV_CH_LAYOUT_7POINT0_FRONT     (AV_CH_LAYOUT_5POINT0|AV_CH_FRONT_LEFT_OF_CENTER|AV_CH_FRONT_RIGHT_OF_CENTER)
#define AV_CH_LAYOUT_7POINT1           (AV_CH_LAYOUT_5POINT1|AV_CH_BACK_LEFT|AV_CH_BACK_RIGHT)
#define AV_CH_LAYOUT_7POINT1_WIDE      (AV_CH_LAYOUT_5POINT1|AV_CH_FRONT_LEFT_OF_CENTER|AV_CH_FRONT_RIGHT_OF_CENTER)
#define AV_CH_LAYOUT_7POINT1_WIDE_BACK (AV_CH_LAYOUT_5POINT1_BACK|AV_CH_FRONT_LEFT_OF_CENTER|AV_CH_FRONT_RIGHT_OF_CENTER)
#define AV_CH_LAYOUT_OCTAGONAL         (AV_CH_LAYOUT_5POINT0|AV_CH_BACK_LEFT|AV_CH_BACK_CENTER|AV_CH_BACK_RIGHT)
#define AV_CH_LAYOUT_HEXADECAGONAL     (AV_CH_LAYOUT_OCTAGONAL|AV_CH_WIDE_LEFT|AV_CH_WIDE_RIGHT|AV_CH_TOP_BACK_LEFT|AV_CH_TOP_BACK_RIGHT|AV_CH_TOP_BACK_CENTER|AV_CH_TOP_FRONT_CENTER|AV_CH_TOP_FRONT_LEFT|AV_CH_TOP_FRONT_RIGHT)
#define AV_CH_LAYOUT_STEREO_DOWNMIX    (AV_CH_STEREO_LEFT|AV_CH_STEREO_RIGHT)
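Because a layout is a bit mask and the channels are stored in ascending bit order, the channel order can be recovered by scanning the bits from lowest to highest. The plain-C sketch below assumes nothing beyond the excerpt: the AV_CH_* values are copied from it (leave these defines out if you include <libavutil/channel_layout.h>), and `layout_channels_in_order()` is a hypothetical helper, not an FFmpeg function.

```c
#include <stddef.h>
#include <stdint.h>

/* Copied from the channel_layout.h excerpt above. */
#define AV_CH_FRONT_LEFT             0x00000001
#define AV_CH_FRONT_RIGHT            0x00000002
#define AV_CH_FRONT_CENTER           0x00000004
#define AV_CH_BACK_CENTER            0x00000100
#define AV_CH_LAYOUT_STEREO            (AV_CH_FRONT_LEFT|AV_CH_FRONT_RIGHT)
#define AV_CH_LAYOUT_SURROUND          (AV_CH_LAYOUT_STEREO|AV_CH_FRONT_CENTER)
#define AV_CH_LAYOUT_4POINT0           (AV_CH_LAYOUT_SURROUND|AV_CH_BACK_CENTER)

/* Writes the single-channel mask of every channel in `layout`, in storage
 * order (lowest bit first), into out[0..max-1]. Returns the channel count,
 * which is 0 for an unknown layout. */
size_t layout_channels_in_order(uint64_t layout, uint64_t *out, size_t max)
{
    size_t n = 0;
    for (int bit = 0; bit < 64; bit++) {
        uint64_t mask = UINT64_C(1) << bit;
        if (layout & mask) {
            if (n < max)
                out[n] = mask;
            n++;
        }
    }
    return n;
}
```

For AV_CH_LAYOUT_4POINT0 this yields FRONT_LEFT, FRONT_RIGHT, FRONT_CENTER, BACK_CENTER, i.e. the LEFT | RIGHT | FRONT-CENTER | BACK-CENTER order described above.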

Common APIs

int64_t av_get_default_channel_layout(int nb_channels)

Returns the default channel_layout for a given number of channels.

int av_get_channel_layout_nb_channels(uint64_t channel_layout)

Returns the number of channels in a channel layout.

int av_get_channel_layout_channel_index(uint64_t channel_layout,uint64_t channel);

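Returns the index of a single channel within a channel layout. Note that channel must be exactly one channel; for example, the index of AV_CH_BACK_CENTER in AV_CH_LAYOUT_4POINT0 is 3.
With this index you can fetch the data of that channel.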
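Both functions boil down to bit arithmetic: the channel count is the number of set bits, and the index of a channel is the number of set bits below it. Below is a plain-C sketch of that idea; `demo_layout_nb_channels()` and `demo_layout_channel_index()` are hypothetical stand-ins for `av_get_channel_layout_nb_channels()` and `av_get_channel_layout_channel_index()`, and the AV_CH_* values are copied from the excerpt above. These uint64_t-based functions belong to the older API; newer FFmpeg releases (5.1 and later) deprecate them in favour of `AVChannelLayout`.

```c
#include <stdint.h>

/* Copied from the channel_layout.h excerpt above. */
#define AV_CH_FRONT_LEFT             0x00000001
#define AV_CH_FRONT_RIGHT            0x00000002
#define AV_CH_FRONT_CENTER           0x00000004
#define AV_CH_LOW_FREQUENCY          0x00000008
#define AV_CH_BACK_CENTER            0x00000100
#define AV_CH_SIDE_LEFT              0x00000200
#define AV_CH_SIDE_RIGHT             0x00000400
#define AV_CH_LAYOUT_STEREO            (AV_CH_FRONT_LEFT|AV_CH_FRONT_RIGHT)
#define AV_CH_LAYOUT_SURROUND          (AV_CH_LAYOUT_STEREO|AV_CH_FRONT_CENTER)
#define AV_CH_LAYOUT_4POINT0           (AV_CH_LAYOUT_SURROUND|AV_CH_BACK_CENTER)
#define AV_CH_LAYOUT_5POINT0           (AV_CH_LAYOUT_SURROUND|AV_CH_SIDE_LEFT|AV_CH_SIDE_RIGHT)
#define AV_CH_LAYOUT_5POINT1           (AV_CH_LAYOUT_5POINT0|AV_CH_LOW_FREQUENCY)

/* Number of channels = number of 1 bits. */
int demo_layout_nb_channels(uint64_t layout)
{
    int n = 0;
    while (layout) {
        layout &= layout - 1;  /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Index of a single channel = number of layout bits below its bit.
 * Returns -1 if `channel` is not exactly one bit or is not in `layout`
 * (FFmpeg returns a negative AVERROR in that case). */
int demo_layout_channel_index(uint64_t layout, uint64_t channel)
{
    if (channel == 0 || (channel & (channel - 1)) || !(layout & channel))
        return -1;
    return demo_layout_nb_channels(layout & (channel - 1));
}
```

With the index in hand, a planar frame holds that channel in `data[index]`; in a packed frame it is every `nb_channels`-th sample starting at sample position `index`.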

Audio decoding

There is an official example for this: FFmpeg: decode_audio.c
Below are the commonly used audio-related fields of AVCodecContext:

typedef struct AVCodecContext {
    /* ... */

    /* audio only */
    int sample_rate; ///< samples per second
    int channels;    ///< number of audio channels

    /**
     * audio sample format
     * - encoding: Set by user.
     * - decoding: Set by libavcodec.
     */
    enum AVSampleFormat sample_fmt;  ///< sample format

    /* The following data should not be initialized. */
    /**
     * Number of samples per channel in an audio frame.
     *
     * - encoding: set by libavcodec in avcodec_open2(). Each submitted frame
     *   except the last must contain exactly frame_size samples per channel.
     *   May be 0 when the codec has AV_CODEC_CAP_VARIABLE_FRAME_SIZE set, then the
     *   frame size is not restricted.
     * - decoding: may be set by some decoders to indicate constant frame size
     */
    int frame_size;

    /**
     * Audio cutoff bandwidth (0 means "automatic")
     * - encoding: Set by user.
     * - decoding: unused
     */
    int cutoff;

    /**
     * Audio channel layout.
     * - encoding: set by user.
     * - decoding: set by user, may be overwritten by libavcodec.
     */
    uint64_t channel_layout;

    /**
     * Request decoder to use this channel layout if it can (0 for default)
     * - encoding: unused
     * - decoding: Set by user.
     */
    uint64_t request_channel_layout;

    /**
     * Type of service that the audio stream conveys.
     * - encoding: Set by user.
     * - decoding: Set by libavcodec.
     */
    enum AVAudioServiceType audio_service_type;

    /**
     * desired sample format
     * - encoding: Not used.
     * - decoding: Set by user.
     * Decoder will decode to this format if it can.
     */
    enum AVSampleFormat request_sample_fmt;

    /* ... */
} AVCodecContext;

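Here frame_size is the number of samples per channel in one audio frame. For example, with a sample rate of 48000 and frame_size = 1152, there are 48000 samples per second and 1152 samples per frame, so one frame lasts 1152 / 48000 * 1000 = 24 ms.
channel_layout is the channel layout: it gives both the number and the order of the channels, and that order is what lets you pick out the data you need.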
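The same arithmetic as a small C sketch; `frame_duration_ms()` and `frame_pcm_bytes()` are hypothetical helpers. The byte count assumes no padding, which is what `av_samples_get_buffer_size()` should report with `align = 1`.

```c
#include <stdint.h>

/* Duration of one audio frame in milliseconds:
 * frame_size samples per channel divided by sample_rate samples per second. */
double frame_duration_ms(int frame_size, int sample_rate)
{
    return frame_size * 1000.0 / sample_rate;
}

/* Raw PCM bytes of one decoded frame across all channels, without padding. */
int64_t frame_pcm_bytes(int frame_size, int channels, int bytes_per_sample)
{
    return (int64_t)frame_size * channels * bytes_per_sample;
}
```

For the 48000 Hz, frame_size = 1152 example, one frame lasts 24 ms, and as stereo S16 it occupies 1152 * 2 * 2 = 4608 bytes.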

libswresample

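libswresample is mainly used for audio resampling and format conversion. It covers:
- Sample rate conversion: changing the sample rate of the audio, e.g. from 44100 Hz down to 8000 Hz. Converting from a higher to a lower sample rate is lossy.
- Channel layout conversion: changing the channel layout, e.g. from stereo to mono. When the input channels cannot be mapped directly onto the output, the process is lossy, because it involves different gain factors and mixing.
- Sample format conversion: changing the sample format, e.g. from s16 PCM to 8-bit (u8) or f32 PCM. It also converts between the packed and planar layouts.
When the sample rate of the audio differs from the one the player is using, the audio must be resampled to play correctly; otherwise it may play at the wrong speed.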
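To put a number on the speed problem: if audio recorded at 44100 Hz is handed unchanged to a device running at 48000 Hz, each second of audio is played in 44100 / 48000 ≈ 0.919 s, so it runs about 8.8% too fast (and sounds higher). A tiny C sketch of that arithmetic; `played_seconds()` and `speed_factor()` are hypothetical helper names.

```c
/* Seconds that n_samples (per channel) last when a device plays them
 * at play_rate samples per second. */
double played_seconds(long n_samples, int play_rate)
{
    return (double)n_samples / play_rate;
}

/* How much faster than intended the audio plays when data recorded at
 * true_rate is played at play_rate without resampling (> 1 means too fast). */
double speed_factor(int true_rate, int play_rate)
{
    return (double)play_rate / true_rate;
}
```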

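Audio conversion

Audio conversion usually means converting between planar and packed, or between channel layouts.
If the decoded audio is planar but the player only supports packed, we have to convert planar to packed; for stereo that means turning LLLLRRRR into LRLRLRLR. Once you understand this, two for loops are enough for the conversion: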

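 /* bytes per sample of one channel, e.g. 4 for AV_SAMPLE_FMT_FLTP */
 data_size = av_get_bytes_per_sample(dec_ctx->sample_fmt);
 if (data_size < 0)  /* unknown sample format */
     exit(1);
 /* frame->data[ch] is the plane of channel ch; writing sample i of every channel in turn yields L R L R ... */
 for (i = 0; i < frame->nb_samples; i++)
     for (ch = 0; ch < dec_ctx->channels; ch++)
         fwrite(frame->data[ch] + data_size*i, 1, data_size, outfile);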
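The same two loops, writing into memory instead of a file, as a self-contained C sketch for float audio. `interleave_float()` is a hypothetical helper; with a decoded AV_SAMPLE_FMT_FLTP frame, `planes` would presumably be `(const float *const *)frame->data` (for up to 8 channels; beyond that the planes are in `frame->extended_data`).

```c
#include <stddef.h>

/* Interleave planar float audio (planes[ch][i]) into one packed buffer
 * (out[i * nb_channels + ch]), i.e. LLLL... / RRRR... -> LRLR...
 * `out` must hold nb_samples * nb_channels floats. */
void interleave_float(const float *const *planes, int nb_channels,
                      int nb_samples, float *out)
{
    for (int i = 0; i < nb_samples; i++)
        for (int ch = 0; ch < nb_channels; ch++)
            out[(size_t)i * nb_channels + ch] = planes[ch][i];
}
```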

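The loops above are only meant to show how the conversion works. FFmpeg already provides an interface for it, so unless you have special requirements, it is better to use the conversion interface FFmpeg provides.
In FFmpeg, a conversion takes three steps:
1 Instantiate a SwrContext
2 Compute the number of samples after conversion
3 Call swr_convert() to do the conversion

Reference code for the conversion follows; the main APIs are declared in libswresample/swresample.h.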

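#include <libavutil/channel_layout.h>
#include <libavutil/error.h>
#include <libavutil/mathematics.h>
#include <libavutil/opt.h>
#include <libavutil/samplefmt.h>
#include <libswresample/swresample.h>

// input: in_samples samples of 5.1 FLTP audio at 48000 Hz (one plane per channel);
// output: packed stereo S16 at 44100 Hz. The function name is only illustrative.
static int convert_example(const uint8_t **input, int in_samples)
{
    uint8_t *output = NULL;  // packed S16 has a single plane, so one pointer is enough
    int out_samples, ret;

    //Option 1 for creating the SwrContext
    //SwrContext *swr = swr_alloc();
    // av_opt_set_channel_layout(swr, "in_channel_layout",  AV_CH_LAYOUT_5POINT1, 0);
    // av_opt_set_channel_layout(swr, "out_channel_layout", AV_CH_LAYOUT_STEREO,  0);
    // av_opt_set_int(swr, "in_sample_rate",     48000,                0);
    // av_opt_set_int(swr, "out_sample_rate",    44100,                0);
    // av_opt_set_sample_fmt(swr, "in_sample_fmt",  AV_SAMPLE_FMT_FLTP, 0);
    // av_opt_set_sample_fmt(swr, "out_sample_fmt", AV_SAMPLE_FMT_S32,  0);

    //Option 2, equivalent to the code above
    //(newer FFmpeg releases deprecate it in favour of swr_alloc_set_opts2())
    SwrContext *swr = swr_alloc_set_opts(NULL,  // we're allocating a new context
                             AV_CH_LAYOUT_STEREO,  // out_ch_layout
                             AV_SAMPLE_FMT_S32,    // out_sample_fmt
                             44100,                // out_sample_rate
                             AV_CH_LAYOUT_5POINT1, // in_ch_layout
                             AV_SAMPLE_FMT_FLTP,   // in_sample_fmt
                             48000,                // in_sample_rate
                             0,                    // log_offset
                             NULL);                // log_ctx
    if (!swr)
        return AVERROR(ENOMEM);

    //Once we have the SwrContext it must be initialized; if any of its parameters change, swr_init() must be called again
    if ((ret = swr_init(swr)) < 0)
        goto end;

    //Here out_sample_fmt is changed to AV_SAMPLE_FMT_S16, so swr_init() must be called again
    swr = swr_alloc_set_opts(swr,
                             AV_CH_LAYOUT_STEREO,  // out_ch_layout
                             AV_SAMPLE_FMT_S16,    // out_sample_fmt
                             44100,                // out_sample_rate
                             AV_CH_LAYOUT_5POINT1, // in_ch_layout
                             AV_SAMPLE_FMT_FLTP,   // in_sample_fmt
                             48000,                // in_sample_rate
                             0,                    // log_offset
                             NULL);                // log_ctx
    if (!swr)
        return AVERROR(EINVAL);
    if ((ret = swr_init(swr)) < 0)  //called again
        goto end;

    //Number of output samples, from in_samples*out_sample_rate = out_samples*in_sample_rate
    //av_rescale_rnd(a, b, c, rnd) computes a * b / c with the rounding chosen by the last argument
    out_samples = av_rescale_rnd(swr_get_delay(swr,    //delay of the next input sample relative to the next output sample
                                               48000)  //in input samples (base = input sample rate)
                                 + in_samples,
                                 44100,         //output sample rate
                                 48000,         //input sample rate
                                 AV_ROUND_UP);  //round up, e.g. 3/2 = 2

    //Allocate a sample buffer matching the output parameters and fill in the data pointers (and linesize, if requested)
    //Free it with av_freep(&output)
    ret = av_samples_alloc(&output,    //[out]
                           NULL,       //[out] linesize, not needed here
                           2,          //number of channels
                           out_samples,//number of samples
                           AV_SAMPLE_FMT_S16, //sample format
                           0);         //alignment: 0 = default, 1 = no alignment
    if (ret < 0)
        goto end;

    ret = swr_convert(swr,
                      &output,      //converted data
                      out_samples,
                      input,        //data to convert
                      in_samples);
    if (ret < 0)
        goto end;
    //... consume the ret converted samples in output here ...

    if (swr_get_out_samples(swr, 0) > 0) { //there is buffered data
        //Passing in = NULL and in_count = 0 drains everything still buffered. This is usually the last step;
        //without it the tail of the audio may stay in the buffer and never be converted
        ret = swr_convert(swr, &output, out_samples, NULL, 0);
        //... consume ret samples again; loop if more than out_samples were buffered ...
    }

end:
    av_freep(&output);
    swr_free(&swr); //release at the end
    return ret < 0 ? ret : 0;
}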

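If swr_convert() cannot write all converted samples because the output count (out_count) is too small, the excess is buffered inside libswresample. The output count must therefore be computed from the input sample count plus what is already buffered; otherwise the number of buffered samples keeps growing and memory use keeps rising. When both in and in_count are 0, swr_convert() outputs (flushes) everything that is buffered.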
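A toy model of why the buffer grows, in plain C. It is not libswresample's actual logic: it assumes the sample rate is unchanged and ignores the filter delay, and `buffered_after_calls()` is a hypothetical name. Each call brings `in_per_call` new samples but may write at most `out_capacity`.

```c
/* Toy model of the resampler's internal buffer (same rate, no filter delay):
 * whatever does not fit into the output of a call stays buffered. */
long buffered_after_calls(int calls, long in_per_call, long out_capacity)
{
    long buffered = 0;
    for (int c = 0; c < calls; c++) {
        long available = buffered + in_per_call;                  /* old + new */
        long written = available < out_capacity ? available : out_capacity;
        buffered = available - written;                           /* left over */
    }
    return buffered;
}
```

With 1024 samples in and room for only 1000 out, 24 samples pile up per call (240 after 10 calls); giving the output room for the whole input keeps the backlog at zero.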
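swr_get_out_samples(swr, in_samples) returns an upper bound on the number of samples the next swr_convert() call will output when given in_samples input samples. The result is not the same for the same input every time, because it depends on how many samples are already buffered internally. This is exactly what we need to keep the buffer from growing: for example, if libswresample already holds 10 buffered samples and the sample rate is unchanged, then for 100 new input samples we want an output of 110, so that all the data comes out and the buffer is emptied. swr_get_out_samples(swr, 100) gives us that number (about 110, since it is an upper bound), and swr_get_out_samples(swr, 0) gives the number of samples already buffered. In short, swr_get_out_samples() tells you, for a given input count, how many output samples to take; with an input count of 0 it tells you what is already buffered. Once you have this output count, you know how large an output buffer to allocate with av_samples_alloc().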
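swr_get_delay(swr, base): according to swresample.h, it returns the delay the next input sample will experience relative to the next output sample, in units of 1/base seconds. With base = 1 the result is in seconds, with base = 1000 in milliseconds, and with base equal to the input sample rate it is a count of input samples, which is why the code above adds swr_get_delay(swr, 48000) to in_samples before rescaling.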
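The output-size formula from the code above, `av_rescale_rnd(swr_get_delay(swr, 48000) + in_samples, 44100, 48000, AV_ROUND_UP)`, written out in plain C. `rescale_up()` and `needed_out_samples()` are hypothetical helpers; unlike `av_rescale_rnd()`, this sketch does not guard against overflow and only handles non-negative values.

```c
#include <stdint.h>

/* a * b / c rounded toward +infinity, for non-negative a, b and positive c
 * (the AV_ROUND_UP behaviour of av_rescale_rnd, minus its overflow handling). */
int64_t rescale_up(int64_t a, int64_t b, int64_t c)
{
    return (a * b + c - 1) / c;
}

/* Output capacity for one swr_convert() call: samples still inside the
 * resampler (delay, counted in input samples) plus the new input,
 * converted from the input rate to the output rate. */
int64_t needed_out_samples(int64_t delay_in_samples, int64_t in_samples,
                           int in_rate, int out_rate)
{
    return rescale_up(delay_in_samples + in_samples, out_rate, in_rate);
}
```

For 1024 new input samples at 48000 Hz with nothing buffered, the 44100 Hz output needs 941 samples (940.8 rounded up); with the rate unchanged and 10 samples already buffered, 100 new samples call for 110, matching the swr_get_out_samples() example.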

Audio resampling

https://blog.csdn.net/eydwyz/article/details/78748312 Resampling (changing the file's original sample rate) when decoding audio with ffmpeg, eydwyz's blog, CSDN

References

https://blog.csdn.net/qq_18998145/article/details/97394595 ffmpeg audio storage formats: packed and planar, LIEY, CSDN
https://www.cnblogs.com/wangguchangqing/p/5851490.html FFmpeg Learning 4: Audio Format Conversion, Brook_icv, cnblogs
https://blog.csdn.net/eydwyz/article/details/78748241 FFmpeg: explanation of nb_samples, frame_size and profile, eydwyz's blog, CSDN
https://blog.csdn.net/eydwyz/article/details/78748312 Resampling (changing the file's original sample rate) when decoding audio with ffmpeg, eydwyz's blog, CSDN
https://www.jianshu.com/p/bf5e54f553a4 FFmpeg audio resampling API (libswresample), Jianshu
