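A Quick Guide to the AAC Encoding Format

This article has two parts: an introduction to the AAC format, and a fix for the inaccurate durations that ffmpeg reports for aac files.

A note up front: if you want to study the AAC format in full detail, the best source is the standard itself, "MPEG-4 Audio: ISO/IEC 14496-3:2009".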

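Introduction to the AAC format

The first thing to know is that AAC files come in two formats, ADIF and ADTS. With ADIF (Audio Data Interchange Format), decoding must begin at a clearly defined starting point and cannot begin in the middle of the data stream. ADTS (Audio Data Transport Stream) is the opposite: it has a syncword, so decoding can begin at any position in the stream. As the name suggests, it is a format similar to a TS stream.

In ADTS every frame carries its own header, which gives the format its stream character and makes it well suited to network transmission and processing. ADIF has only a single header for the whole file, and the header layouts of the two formats also differ. ADTS is the format in mainstream use today, so this article focuses on ADTS.

An ADTS AAC file is laid out as follows: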


ADTS_header	AAC ES	ADTS_header	AAC ES	…	ADTS_header	AAC ES

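As shown, every frame has a header, the ADTS_header, which carries the sample rate, the channel count, the frame length, and so on. The ADTS header is normally 7 bytes, or 9 bytes when a CRC is present.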

The ADTS frame header is structured as follows:

No.	Field	Length (bits)	Description
1	Syncword	12	all bits must be 1
2	MPEG version	1	0 for MPEG-4, 1 for MPEG-2
3	Layer	2	always 0
4	Protection Absent	1	set to 1 if there is no CRC and 0 if there is CRC
5	Profile	2	the MPEG-4 Audio Object Type minus 1
6	MPEG-4 Sampling Frequency Index	4	MPEG-4 Sampling Frequency Index (15 is forbidden)
7	Private Stream	1	set to 0 when encoding, ignore when decoding
8	MPEG-4 Channel Configuration	3	MPEG-4 Channel Configuration (in the case of 0, the channel configuration is sent via an inband PCE)
9	Originality	1	set to 0 when encoding, ignore when decoding
10	Home	1	set to 0 when encoding, ignore when decoding
11	Copyright ID Bit	1	set to 0 when encoding, ignore when decoding
12	Copyright ID Start	1	set to 0 when encoding, ignore when decoding
13	Frame Length	13	this value must include 7 or 9 bytes of header length: FrameLength = (ProtectionAbsent == 1 ? 7 : 9) + size(AACFrame)
14	Buffer Fullness	11	buffer fullness
15	Number of AAC Frames	2	number of AAC frames (RDBs) in ADTS frame minus 1, for maximum compatibility always use 1 AAC frame per ADTS frame
16	CRC	16	CRC if protection absent is 0
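The table above can be turned into a small bit-unpacking routine. The following is a minimal sketch, not code from ffmpeg: the `AdtsHeader` struct and the `adts_parse_header` function are hypothetical names introduced for illustration, and the routine only reads the first 7 bytes (it does not verify the optional CRC).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical struct for illustration; fields follow the header table. */
typedef struct {
    unsigned mpeg_version;        /* 0 = MPEG-4, 1 = MPEG-2 */
    unsigned protection_absent;   /* 1 = no CRC, 0 = CRC present */
    unsigned profile;             /* Audio Object Type minus 1 */
    unsigned sampling_freq_index; /* index into the sample rate table */
    unsigned channel_config;      /* channel configuration */
    unsigned frame_length;        /* header + payload, in bytes */
    unsigned buffer_fullness;     /* 11-bit buffer fullness */
    unsigned num_raw_blocks;      /* stored value + 1 */
} AdtsHeader;

/* Returns 0 on success, -1 if the input is too short, the syncword
 * is missing, or the layer bits are not 0. */
int adts_parse_header(const uint8_t *p, size_t len, AdtsHeader *h)
{
    if (len < 7)
        return -1;
    if (p[0] != 0xFF || (p[1] & 0xF0) != 0xF0)  /* 12-bit syncword */
        return -1;
    if ((p[1] & 0x06) != 0)                     /* layer must be 0 */
        return -1;

    h->mpeg_version        = (p[1] >> 3) & 0x1;
    h->protection_absent   =  p[1]       & 0x1;
    h->profile             = (p[2] >> 6) & 0x3;
    h->sampling_freq_index = (p[2] >> 2) & 0xF;
    /* bit 1 of p[2] is the private bit; the 3-bit channel
     * configuration straddles p[2] and p[3]. */
    h->channel_config      = ((unsigned)(p[2] & 0x1) << 2) | ((p[3] >> 6) & 0x3);
    /* bits 5..2 of p[3] are originality, home and the two copyright
     * bits; the 13-bit frame length spans p[3], p[4] and p[5]. */
    h->frame_length        = ((unsigned)(p[3] & 0x3) << 11)
                           | ((unsigned)p[4] << 3)
                           | (p[5] >> 5);
    h->buffer_fullness     = ((unsigned)(p[5] & 0x1F) << 6) | (p[6] >> 2);
    h->num_raw_blocks      = (p[6] & 0x3) + 1u;
    return 0;
}
```

For example, the header bytes FF F1 50 80 2E 7F FC decode to MPEG-4, no CRC, profile 1, sampling frequency index 4, channel configuration 2, and a frame length of 371 bytes.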

The key fields are explained below.

profile

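Many articles say that an aac frame contains 1024 samples, which is not accurate. The correct statement is that the profile determines how many samples each aac frame contains. The mapping is: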

PROFILE	SAMPLES
HE-AAC v1/v2	2048
AAC-LC	1024
AAC-LD/AAC-ELD	480/512

LC stands for Low Complexity and HE for High Efficiency.

What do the profile values mean, then? The ffmpeg source already lists them in libavcodec/avcodec.h:

#define FF_PROFILE_AAC_MAIN 0
#define FF_PROFILE_AAC_LOW  1
#define FF_PROFILE_AAC_SSR  2
#define FF_PROFILE_AAC_LTP  3
#define FF_PROFILE_AAC_HE   4
#define FF_PROFILE_AAC_HE_V2 28
#define FF_PROFILE_AAC_LD   22
#define FF_PROFILE_AAC_ELD  38
#define FF_PROFILE_MPEG2_AAC_LOW 128
#define FF_PROFILE_MPEG2_AAC_HE  131

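So when ffmpeg reports a profile of 28, we know this is an HE-AAC v2 file with 2048 samples per frame. Note that these are the values ffmpeg exposes after parsing the stream; the Profile field in the ADTS header itself is only 2 bits wide, so it can only hold 0 to 3.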

Sampling Frequency Index

This is the index of the sample rate in use. Look the index up in the Sampling Frequencies[] array to get the sample rate. The mapping is:


0	96000 Hz
1	88200 Hz
2	64000 Hz
3	48000 Hz
4	44100 Hz
5	32000 Hz
6	24000 Hz
7	22050 Hz
8	16000 Hz
9	12000 Hz
10	11025 Hz
11	8000 Hz
12	7350 Hz
13	Reserved
14	Reserved
15	frequency is written explicitly (not allowed in an ADTS header)
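The lookup described above amounts to a 16-entry array. Here is a minimal sketch; the names `kAacSampleRates` and `aac_sample_rate_from_index` are hypothetical, and returning 0 for the reserved indices and for index 15 is a choice made for this example.

```c
#include <stdint.h>

/* Indexed by the 4-bit Sampling Frequency Index.
 * 0 marks entries that have no fixed rate (13, 14 reserved; 15 explicit). */
static const uint32_t kAacSampleRates[16] = {
    96000, 88200, 64000, 48000, 44100, 32000, 24000, 22050,
    16000, 12000, 11025,  8000,  7350,     0,     0,     0
};

/* Returns the sample rate in Hz, or 0 if the index has no fixed rate. */
uint32_t aac_sample_rate_from_index(unsigned index)
{
    return index < 16 ? kAacSampleRates[index] : 0;
}
```

With this table, index 4 maps to 44100 Hz and index 3 to 48000 Hz.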

Channel Configuration

This one is simple: it gives the channel layout, and therefore the number of channels. The mapping is:
0: Defined in AOT Specific Config
1: 1 channel: front-center
2: 2 channels: front-left, front-right
3: 3 channels: front-center, front-left, front-right
4: 4 channels: front-center, front-left, front-right, back-center
5: 5 channels: front-center, front-left, front-right, back-left, back-right
6: 6 channels: front-center, front-left, front-right, back-left, back-right, LFE-channel
7: 8 channels: front-center, front-left, front-right, side-left, side-right, back-left, back-right, LFE-channel
8-15: Reserved
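Note that the value is not always equal to the channel count: configuration 7 means 8 channels. A minimal sketch of the mapping, using the hypothetical helper name `aac_channels_from_config` and returning 0 when the count cannot be derived from the value alone (configuration 0 and the reserved values):

```c
/* Maps the Channel Configuration value to a channel count.
 * Returns 0 for configuration 0 (layout defined elsewhere)
 * and for reserved values. */
unsigned aac_channels_from_config(unsigned channel_config)
{
    static const unsigned counts[8] = {0, 1, 2, 3, 4, 5, 6, 8};
    return channel_config < 8 ? counts[channel_config] : 0;
}
```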

Fixing the inaccurate aac duration reported by ffmpeg

When we use ffmpeg to get the duration of an aac file, ffmpeg prints the following warning:

[aac @ 000000529d929ec0] Estimating duration from bitrate, this may be inaccurate

As the previous section showed, the aac file format carries no explicit duration field, so ffmpeg estimates the duration as filesize / bitrate.
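To see why this is only an estimate, here is the formula written out as a minimal sketch. This is not ffmpeg's actual code; `estimate_duration_from_bitrate` is a hypothetical helper that simply converts the file size to bits and divides by an assumed bitrate.

```c
#include <stdint.h>

/* duration (s) = file size (bits) / bitrate (bits per second).
 * Returns 0.0 when either input is not positive. */
double estimate_duration_from_bitrate(int64_t file_size_bytes,
                                      int64_t bit_rate_bps)
{
    if (file_size_bytes <= 0 || bit_rate_bps <= 0)
        return 0.0;
    return (double)file_size_bytes * 8.0 / (double)bit_rate_bps;
}
```

A 1,000,000-byte file at an assumed 128000 bps comes out as 62.5 seconds. If the real average bitrate were 96000 bps, the true duration would be about 83.3 seconds, so any error in the assumed bitrate goes straight into the result. The file size also counts the ADTS headers, not just the audio payload.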

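It takes little thought to see that this estimate is inaccurate, and in practice it is. Interestingly, ffmpeg versions before and after 3.0 even produce different estimates, which is why the developers thoughtfully print this warning.

The duration estimation logic lives in estimate_timings in libavformat/utils.c: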

if ((!strcmp(ic->iformat->name, "mpeg") ||
         !strcmp(ic->iformat->name, "mpegts")) &&
        file_size && (ic->pb->seekable & AVIO_SEEKABLE_NORMAL)) {
        /* get accurate estimate from the PTSes */
        estimate_timings_from_pts(ic, old_offset);
        ic->duration_estimation_method = AVFMT_DURATION_FROM_PTS;
    } else if (has_duration(ic)) {
        /* at least one component has timings - we use them for all
         * the components */
        fill_all_stream_timings(ic);
        /* nut demuxer estimate the duration from PTS */
        if(!strcmp(ic->iformat->name, "nut"))
            ic->duration_estimation_method = AVFMT_DURATION_FROM_PTS;
        else
            ic->duration_estimation_method = AVFMT_DURATION_FROM_STREAM;
    } else {
        // aac files end up in this branch when their duration is determined
        /* less precise: use bitrate info */
        estimate_timings_from_bit_rate(ic);
        ic->duration_estimation_method = AVFMT_DURATION_FROM_BITRATE;
    }

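So how do we get an accurate duration? Following the previous section, the answer is to walk the adts frame headers and take total frame count * duration per frame as the duration.

Concretely, getting the total frame count means scanning the whole file once. In pseudocode: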

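while (offset < total_size) {
    /* frame_length is the 13-bit Frame Length field of the header at offset */
    frame_length = get_adts_frame_length(offset);
    if (frame_length == 0)
        break;  /* corrupt header: avoid looping forever */
    offset += frame_length;
    num_frames++;
}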
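The loop above can be fleshed out for a file that has already been read into memory. This is a minimal sketch under stated assumptions: `adts_count_frames` is a hypothetical name, the buffer is assumed to start directly at an ADTS frame (a leading ID3 tag, which some files have, would need to be skipped first), and counting stops at the first invalid or truncated frame.

```c
#include <stddef.h>
#include <stdint.h>

/* Counts the ADTS frames in data[0..size). Stops at the first header
 * that fails the syncword/layer check or whose frame would run past
 * the end of the buffer. */
size_t adts_count_frames(const uint8_t *data, size_t size)
{
    size_t offset = 0;
    size_t num_frames = 0;

    /* offset never exceeds size, so size - offset cannot underflow */
    while (size - offset >= 7) {
        const uint8_t *p = data + offset;

        /* 12-bit syncword all ones, 2-bit layer all zeros */
        if (p[0] != 0xFF || (p[1] & 0xF6) != 0xF0)
            break;

        /* 13-bit Frame Length: includes the header itself */
        size_t frame_length = ((size_t)(p[3] & 0x03) << 11)
                            | ((size_t)p[4] << 3)
                            | (size_t)(p[5] >> 5);

        if (frame_length < 7 || frame_length > size - offset)
            break;

        offset += frame_length;
        num_frames++;
    }
    return num_frames;
}
```

Rejecting a frame length below 7 matters for robustness: a corrupt header with a length of 0 would otherwise keep the loop at the same offset forever.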

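Getting the duration of one frame is even simpler: ffmpeg correctly reads each frame's nb_samples and the overall sample_rate, so dividing nb_samples by sample_rate gives the duration of one frame.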
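Putting the two pieces together, the arithmetic looks like this. A minimal sketch: `aac_duration_seconds` is a hypothetical helper, and it assumes every frame in the file holds the same number of samples.

```c
#include <stdint.h>

/* duration (s) = num_frames * samples_per_frame / sample_rate.
 * Returns 0.0 if the sample rate is 0. */
double aac_duration_seconds(uint64_t num_frames,
                            unsigned samples_per_frame,
                            unsigned sample_rate_hz)
{
    if (sample_rate_hz == 0)
        return 0.0;
    return (double)num_frames * (double)samples_per_frame
         / (double)sample_rate_hz;
}
```

For instance, one 1024-sample frame at 44100 Hz lasts about 23.2 ms, and 1875 such frames at 48000 Hz add up to exactly 40 seconds.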

The complete duration-parsing code is available by following my WeChat official account 灰度五十 and replying with "解析aac時長".

refs
https://wiki.multimedia.cx/index.php?title=MPEG-4_Audio
