Sending raw MP3 and AAC streams over RTMP

Overview

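Flash Video (FLV) is a popular container format for streaming audio and video. Most video-sharing sites, in China and abroad, currently use this format.

RTMP is a protocol defined by Adobe for transporting audio and video.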

FLV file overview

Viewed as a whole, an FLV file consists of the FLV header and the FLV file body.

1.The FLV header

Field	Type	Comment
Signature	UI8	Signature byte always 'F' (0x46)
Signature	UI8	Signature byte always 'L' (0x4C)
Signature	UI8	Signature byte always 'V' (0x56)
Version	UI8	File version (for example, 0x01 for FLV version 1)
TypeFlagsReserved	UB [5]	Shall be 0
TypeFlagsAudio	UB [1]	1 = Audio tags are present
TypeFlagsReserved	UB [1]	Shall be 0
TypeFlagsVideo	UB [1]	1 = Video tags are present
DataOffset	UI32	The length of this header in bytes

Signature: the first 3 bytes of an FLV file are always 'F', 'L', 'V' and identify the file as FLV. When ffmpeg probes the format, it treats a file whose first 3 bytes are "FLV" as FLV.

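Version: the 4th byte is the FLV version number.

Flags: in the 5th byte, bit 0 and bit 2 indicate whether video and audio are present, respectively (1 = present, 0 = absent).

DataOffset: the last 4 bytes give the length of the FLV header (9 for FLV version 1).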
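The header layout above can be read with a single struct call. This is a minimal sketch rather than a reference parser; the function name parse_flv_header and the dictionary keys are my own choices for illustration.

```python
import struct


def parse_flv_header(buf: bytes) -> dict:
    """Parse the 9-byte FLV header (signature, version, flags, DataOffset)."""
    if len(buf) < 9:
        raise ValueError("need at least 9 bytes for the FLV header")
    # 3-byte signature, UI8 version, UI8 flags, UI32 DataOffset (big-endian)
    signature, version, flags, data_offset = struct.unpack(">3sBBI", buf[:9])
    if signature != b"FLV":
        raise ValueError("not an FLV file")
    return {
        "version": version,
        "has_audio": bool(flags & 0x04),  # bit 2: TypeFlagsAudio
        "has_video": bool(flags & 0x01),  # bit 0: TypeFlagsVideo
        "data_offset": data_offset,
    }


# Hand-built header: version 1, audio + video present, header length 9
header = parse_flv_header(b"FLV\x01\x05\x00\x00\x00\x09")
print(header)
```

The file body starts at data_offset, so a reader should seek there instead of assuming the header is always 9 bytes.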

2.The FLV File Body

Field	Type	Comment
PreviousTagSize0	UI32	Always 0
Tag1	FLVTAG	First tag
PreviousTagSize1	UI32	Size of previous tag, including its header, in bytes. For FLV version1, this value is 11 plus the DataSize of the previous tag.
Tag2	FLVTAG	Second tag
...	...	...
PreviousTagSizeN-1	UI32	Size of second-to-last tag, including its header, in bytes.
TagN	FLVTAG	Last tag
PreviousTagSizeN	UI32	Size of last tag, including its header, in bytes

The FLV file body follows the FLV header.

The FLV file body is a sequence of back-pointers and tags. Each back-pointer is a 4-byte value giving the size of the preceding tag.
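To make the back-pointer layout concrete, here is a rough sketch that walks a file body held in memory and checks each PreviousTagSize against the tag before it. It assumes FLV version 1 (11-byte tag header); iter_tags and make_tag are hypothetical helpers, and make_tag exists only to build sample data.

```python
import struct

TAG_HEADER_SIZE = 11  # FLV version 1: tag size = 11 + DataSize


def iter_tags(body: bytes):
    """Yield (tag_type, payload) for each tag; body starts at PreviousTagSize0."""
    (first_back_pointer,) = struct.unpack_from(">I", body, 0)
    if first_back_pointer != 0:
        raise ValueError("PreviousTagSize0 must be 0")
    pos = 4
    while pos + TAG_HEADER_SIZE <= len(body):
        tag_type = body[pos] & 0x1F
        data_size = int.from_bytes(body[pos + 1:pos + 4], "big")
        tag_len = TAG_HEADER_SIZE + data_size
        payload = body[pos + TAG_HEADER_SIZE:pos + tag_len]
        pos += tag_len
        # The 4 bytes after every tag repeat that tag's total size
        (back_pointer,) = struct.unpack_from(">I", body, pos)
        if back_pointer != tag_len:
            raise ValueError("back-pointer does not match tag size")
        pos += 4
        yield tag_type, payload


def make_tag(tag_type: int, payload: bytes) -> bytes:
    """Build a tag with zero timestamp and StreamID, plus its back-pointer."""
    header = bytes([tag_type]) + len(payload).to_bytes(3, "big") + bytes(7)
    tag = header + payload
    return tag + len(tag).to_bytes(4, "big")


sample_body = bytes(4) + make_tag(8, b"\xaa\xbb") + make_tag(9, b"\x01")
tags = list(iter_tags(sample_body))
print(tags)
```

Because each back-pointer holds the size of the tag before it, the same structure can also be walked backwards from the end of the file.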

FLV Tag Definition

All data in an FLV file is carried in tags. A tag's payload may be video, audio, or script data.

The table below shows the tag structure:

1.FLVTAG

Field	Type	Comment
Reserved	UB [2]	Reserved for FMS, should be 0
Filter	UB [1]	Indicates if packets are filtered. 0 = No pre-processing required. 1 = Pre-processing (such as decryption) of the packet is required before it can be rendered. Shall be 0 in unencrypted files, and 1 for encrypted tags. See Annex F. FLV Encryption for the use of filters.
TagType	UB [5]	Type of contents in this tag. The following types are defined: 8 = audio 9 = video 18 = script data
DataSize	UI24	Length of the message. Number of bytes after StreamID to end of tag (Equal to length of the tag – 11)
Timestamp	UI24	Time in milliseconds at which the data in this tag applies. This value is relative to the first tag in the FLV file, which always has a timestamp of 0.
TimestampExtended	UI8	Extension of the Timestamp field to form a SI32 value. This field represents the upper 8 bits, while the previous Timestamp field represents the lower 24 bits of the time in milliseconds.
StreamID	UI24	Always 0.
AudioTagHeader	IF TagType == 8 AudioTagHeader
VideoTagHeader	IF TagType == 9 VideoTagHeader
EncryptionHeader	IF Filter == 1 EncryptionTagHeader
FilterParams	IF Filter == 1 FilterParams
Data	IF TagType == 8 AUDIODATA IF TagType == 9 VIDEODATA IF TagType == 18 SCRIPTDATA	Data specific for each media type.

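TagType: the low 5 bits of the tag's first byte give the type of data the tag carries: 8 = audio, 9 = video, 18 = script data.

DataSize: the length of the data that follows StreamID.

Timestamp and TimestampExtended together form the PTS of the tag's data. When I first wrote an FLV demuxer I ignored TimestampExtended and treated Timestamp alone as the PTS, and the picture skipped frames as a result. Only after rereading the specification did I see that the real PTS is PTS = Timestamp | TimestampExtended << 24.

What follows StreamID depends on the tag type. Each type is described in detail below.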
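The 11 fixed bytes of a tag, including the PTS combination, can be decoded as follows. This is a sketch under two assumptions of mine: the function and key names are illustrative, and the combined timestamp is treated as unsigned even though the specification types it as SI32.

```python
def parse_tag_header(buf: bytes) -> dict:
    """Decode the fixed 11-byte header at the start of an FLV tag."""
    if len(buf) < 11:
        raise ValueError("need at least 11 bytes for a tag header")
    first = buf[0]
    timestamp = int.from_bytes(buf[4:7], "big")  # lower 24 bits
    timestamp_extended = buf[7]                  # upper 8 bits
    return {
        "filter": (first >> 5) & 0x01,
        "tag_type": first & 0x1F,  # 8 = audio, 9 = video, 18 = script data
        "data_size": int.from_bytes(buf[1:4], "big"),
        "pts_ms": timestamp | (timestamp_extended << 24),
        "stream_id": int.from_bytes(buf[8:11], "big"),
    }


# Video tag, DataSize 5, Timestamp 1, TimestampExtended 1, StreamID 0
raw = bytes([0x09, 0x00, 0x00, 0x05, 0x00, 0x00, 0x01, 0x01, 0x00, 0x00, 0x00])
tag = parse_tag_header(raw)
print(tag["pts_ms"])  # 16777217, not 1: the extended byte matters
```

Dropping TimestampExtended only goes wrong once a stream runs past 2^24 ms (about 4.66 hours), which is why the bug is easy to miss on short files.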

Audio Tags

When TagType == 8, the tag carries audio.

The data after StreamID is the AudioTagHeader, with the following structure:

Field	Type	Comment
SoundFormat	UB [4]	Format of SoundData. The following values are defined: 0 = Linear PCM, platform endian 1 = ADPCM 2 = MP3 3 = Linear PCM, little endian 4 = Nellymoser 16 kHz mono 5 = Nellymoser 8 kHz mono 6 = Nellymoser 7 = G.711 A-law logarithmic PCM 8 = G.711 mu-law logarithmic PCM 9 = reserved 10 = AAC 11 = Speex 14 = MP3 8 kHz 15 = Device-specific sound. Formats 7, 8, 14, and 15 are reserved. AAC is supported in Flash Player 9,0,115,0 and higher. Speex is supported in Flash Player 10 and higher.
SoundRate	UB [2]	Sampling rate. The following values are defined: 0 = 5.5 kHz 1 = 11 kHz 2 = 22 kHz 3 = 44 kHz
SoundSize	UB [1]	Size of each audio sample. This parameter only pertains to uncompressed formats. Compressed formats always decode to 16 bits internally. 0 = 8-bit samples 1 = 16-bit samples
SoundType	UB [1]	Mono or stereo sound 0 = Mono sound 1 = Stereo sound
AACPacketType	IF SoundFormat == 10 UI8	The following values are defined: 0 = AAC sequence header 1 = AAC raw

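The first byte of the AudioTagHeader, i.e. the byte immediately after StreamID, holds basic information such as the audio format and sampling rate, as listed in the table.

AUDIODATA, the audio payload, follows the AudioTagHeader. There is one special case: if SoundFormat is 10 (AAC), the AudioTagHeader contains one extra byte, AACPacketType, which gives the type of the AACAUDIODATA: 0 = AAC sequence header, 1 = AAC raw.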
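The bit fields of the AudioTagHeader, including the extra AAC byte, can be unpacked as in this sketch. The names are mine, and header_size is a convenience value I added so the caller knows where the payload starts.

```python
SOUND_RATES_KHZ = (5.5, 11, 22, 44)  # indexed by the 2-bit SoundRate field


def parse_audio_tag_header(data: bytes) -> dict:
    """Decode the AudioTagHeader found right after StreamID."""
    first = data[0]
    info = {
        "sound_format": first >> 4,  # 2 = MP3, 10 = AAC, ...
        "sound_rate_khz": SOUND_RATES_KHZ[(first >> 2) & 0x03],
        "sample_bits": 16 if first & 0x02 else 8,
        "stereo": bool(first & 0x01),
        "header_size": 1,
    }
    if info["sound_format"] == 10:
        # AAC only: one more byte says sequence header (0) or raw frame (1)
        info["aac_packet_type"] = data[1]
        info["header_size"] = 2
    return info


aac = parse_audio_tag_header(bytes([0xAF, 0x01]))  # AAC raw frame
mp3 = parse_audio_tag_header(bytes([0x2F]))        # MP3, no extra byte
print(aac, mp3)
```

The audio payload for a tag is then data[header_size:].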

Field	Type	Comment
Data	IF AACPacketType ==0 AudioSpecificConfig	The AudioSpecificConfig is defined in ISO14496-3. Note that this is not the same as the contents of the esds box from an MP4/F4V file.
	ELSE IF AACPacketType == 1 Raw AAC frame data in UI8 [ ]	audio payload

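The AAC sequence header carries an AudioSpecificConfig structure, which is described in "ISO-14496-3 Audio". The full definition of AudioSpecificConfig is very complicated, so I simplify it here by fixing the audio format in advance: AAC-LC as the codec and 44100 Hz as the sampling rate. With those choices AudioSpecificConfig reduces to audioObjectType (5 bits), samplingFrequencyIndex (4 bits), and channelConfiguration (4 bits), followed by the three GASpecificConfig flag bits.

ffmpeg also has a function that parses AudioSpecificConfig, ff_mpeg4audio_get_config(). Comparing against it helps deepen your understanding.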

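An AAC raw packet contains the audio elementary stream (ES), i.e. the audio payload.

In an FLV file the AAC sequence header packet normally appears only once, as the first audio tag. I mention this tag because, when I was writing the FLV demuxer, AAC audio required a 7-byte ADTS header in front of every frame of the AAC ES. (ADTS is covered in detail in the notes on the ADTS audio format.) ADTS is the format decoders generally accept: the raw AAC ES has to be packed into ADTS-format AAC before a decoder can play it. Building the ADTS header requires samplingFrequencyIndex, and the most reliable source for it is AudioSpecificConfig, so I parsed AudioSpecificConfig to obtain samplingFrequencyIndex.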
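The two steps described above, reading samplingFrequencyIndex from AudioSpecificConfig and building the 7-byte ADTS header, might look like the sketch below. It only covers the simple case discussed here (object types 1 to 4, no escape values, no CRC), and the function names are illustrative rather than taken from any library.

```python
SAMPLE_RATES_HZ = (96000, 88200, 64000, 48000, 44100, 32000, 24000,
                   22050, 16000, 12000, 11025, 8000, 7350)


def parse_audio_specific_config(asc: bytes) -> dict:
    """Read the first 13 bits of AudioSpecificConfig (simple case only)."""
    if len(asc) < 2:
        raise ValueError("AudioSpecificConfig needs at least 2 bytes")
    audio_object_type = asc[0] >> 3                   # 5 bits
    sf_index = ((asc[0] & 0x07) << 1) | (asc[1] >> 7)  # 4 bits
    channel_configuration = (asc[1] >> 3) & 0x0F       # 4 bits
    if audio_object_type == 31 or sf_index >= len(SAMPLE_RATES_HZ):
        raise ValueError("escape values are not handled in this sketch")
    return {
        "audio_object_type": audio_object_type,
        "sampling_frequency_index": sf_index,
        "sample_rate_hz": SAMPLE_RATES_HZ[sf_index],
        "channel_configuration": channel_configuration,
    }


def adts_header(config: dict, payload_len: int) -> bytes:
    """Build a 7-byte ADTS header (MPEG-4, no CRC) for one raw AAC frame."""
    aot = config["audio_object_type"]
    channels = config["channel_configuration"]
    frame_len = payload_len + 7  # aac_frame_length includes the header
    if not 1 <= aot <= 4 or channels > 7 or frame_len > 0x1FFF:
        raise ValueError("values do not fit in an ADTS header")
    profile = aot - 1  # ADTS stores the object type minus 1
    sf_index = config["sampling_frequency_index"]
    return bytes([
        0xFF,                                            # syncword, high 8 bits
        0xF1,                                            # syncword low 4, ID=0, layer=0, no CRC
        (profile << 6) | (sf_index << 2) | (channels >> 2),
        ((channels & 0x03) << 6) | (frame_len >> 11),
        (frame_len >> 3) & 0xFF,
        ((frame_len & 0x07) << 5) | 0x1F,                # buffer fullness 0x7FF, high bits
        0xFC,                                            # buffer fullness low bits, 1 raw block
    ])


# 0x12 0x10 is the usual AudioSpecificConfig for AAC-LC, 44100 Hz, stereo
cfg = parse_audio_specific_config(bytes([0x12, 0x10]))
hdr = adts_header(cfg, 100)
print(cfg["sample_rate_hz"], hdr.hex())
```

Each AAC raw payload is then written out as adts_header(cfg, len(payload)) + payload.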

At this point you can extract the audio information and data from an FLV file and feed them to an audio decoder for normal playback.

Video Tags

When TagType == 9, the tag carries video.

The data after StreamID is the VideoTagHeader, with the following structure:

Field	Type	Comment
Frame Type	UB [4]	Type of video frame. The following values are defined: 1 = key frame (for AVC, a seekable frame) 2 = inter frame (for AVC, a non-seekable frame) 3 = disposable inter frame (H.263 only) 4 = generated key frame (reserved for server use only) 5 = video info/command frame
CodecID	UB [4]	Codec Identifier. The following values are defined: 2 = Sorenson H.263 3 = Screen video 4 = On2 VP6 5 = On2 VP6 with alpha channel 6 = Screen video version 2 7 = AVC
AVCPacketType	IF CodecID == 7 UI8	The following values are defined: 0 = AVC sequence header 1 = AVC NALU 2 = AVC end of sequence (lower level NALU sequence ender is not required or supported)
CompositionTime	IF CodecID == 7 SI24	IF AVCPacketType == 1 Composition time offset ELSE 0 See ISO 14496-12, 8.15.3 for an explanation of composition times. The offset in an FLV file is always in milliseconds.

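The first byte of the VideoTagHeader, i.e. the byte immediately after StreamID, holds the video frame type and the CodecID, as listed in the table.

VIDEODATA, the video payload, follows the VideoTagHeader. As with AAC audio, there is a special case: if the video format is AVC (H.264), the VideoTagHeader contains 4 extra bytes, AVCPacketType (1 byte) and CompositionTime (3 bytes). AVCPacketType describes the content of the VIDEODATA (AVCVIDEOPACKET) that follows: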

IF AVCPacketType == 0 AVCDecoderConfigurationRecord (AVC sequence header)
IF AVCPacketType == 1 One or more NALUs (Full frames are required)
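The VideoTagHeader can be decoded in the same way as the audio one. This sketch uses names of my own, and header_size is again a convenience value showing where the payload starts.

```python
def parse_video_tag_header(data: bytes) -> dict:
    """Decode the VideoTagHeader found right after StreamID."""
    first = data[0]
    info = {
        "frame_type": first >> 4,  # 1 = key frame, 2 = inter frame, ...
        "codec_id": first & 0x0F,  # 7 = AVC
        "header_size": 1,
    }
    if info["codec_id"] == 7:
        # AVC only: UI8 AVCPacketType followed by SI24 CompositionTime
        info["avc_packet_type"] = data[1]
        info["composition_time_ms"] = int.from_bytes(data[2:5], "big", signed=True)
        info["header_size"] = 5
    return info


key_frame = parse_video_tag_header(bytes([0x17, 0x01, 0x00, 0x00, 0x28]))
negative_cts = parse_video_tag_header(bytes([0x27, 0x01, 0xFF, 0xFF, 0xD8]))
print(key_frame, negative_cts["composition_time_ms"])
```

CompositionTime is signed, so it has to be read as a 24-bit two's-complement value.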

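AVCDecoderConfigurationRecord contains the SPS and PPS, which are essential for H.264 decoding. The SPS and PPS must be sent to the AVC decoder before any stream data; otherwise the decoder cannot decode correctly. They also have to be sent again whenever the decoder is restarted after a stop, for example on seek or when switching between fast-forward and rewind. AVCDecoderConfigurationRecord normally appears only once in an FLV file, as the first video tag.

AVCDecoderConfigurationRecord is defined in ISO 14496-15, 5.2.4.1 and is not reproduced here.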

SCRIPTDATA

When TagType == 18, the tag carries script data.

The SCRIPTDATA structure is quite complex. It defines many value types, each with its own layout.

Field	Type	Comment
Type	UI8	Type of the ScriptDataValue. The following types are defined: 0 = Number 1 = Boolean 2 = String 3 = Object 4 = MovieClip (reserved, not supported) 5 = Null 6 = Undefined 7 = Reference 8 = ECMA array 9 = Object end marker 10 = Strict array 11 = Date 12 = Long string
ScriptDataValue	IF Type == 0 DOUBLE IF Type == 1 UI8 IF Type == 2 SCRIPTDATASTRING IF Type == 3 SCRIPTDATAOBJECT IF Type == 7 UI16 IF Type == 8 SCRIPTDATAECMAARRAY IF Type == 10 SCRIPTDATASTRICTARRAY IF Type == 11 SCRIPTDATADATE IF Type == 12 SCRIPTDATALONGSTRING	Script data value. The Boolean value is (ScriptDataValue ≠ 0).

All of these types are described in detail in the official FLV specification.

onMetaData

onMetaData is the part of SCRIPTDATA that matters most to us. Its structure is shown in the table below:

Property Name	Type	Comment
audiocodecid	Number	Audio codec ID used in the file (see E.4.2.1 for available SoundFormat values)
audiodatarate	Number	Audio bit rate in kilobits per second
audiodelay	Number	Delay introduced by the audio codec in seconds
audiosamplerate	Number	Frequency at which the audio stream is replayed
audiosamplesize	Number	Resolution of a single audio sample
canSeekToEnd	Boolean	Indicating the last video frame is a key frame
creationdate	String	Creation date and time
duration	Number	Total duration of the file in seconds
filesize	Number	Total size of the file in bytes
framerate	Number	Number of frames per second
height	Number	Height of the video in pixels
stereo	Boolean	Indicating stereo audio
videocodecid	Number	Video codec ID used in the file (see E.4.3.1 for available CodecID values)
videodatarate	Number	Video bit rate in kilobits per second
width	Number	Width of the video in pixels

Of these, duration, filesize, and the video width and height are especially useful.
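As a rough illustration of how these properties are read, the sketch below decodes a script tag payload: a String holding the name "onMetaData" followed by an ECMA array of properties. It handles only the Number, Boolean, String, Object, ECMA array, and Strict array types, and it assumes every object ends with the 00 00 09 end marker, which some real files omit. All function names are my own.

```python
import struct

OBJECT_END = b"\x00\x00\x09"  # empty property name + object end marker


def read_string(buf: bytes, pos: int):
    """SCRIPTDATASTRING: UI16 length followed by that many bytes."""
    (length,) = struct.unpack_from(">H", buf, pos)
    pos += 2
    return buf[pos:pos + length].decode("utf-8"), pos + length


def read_properties(buf: bytes, pos: int):
    """Read name/value pairs until the object end marker."""
    props = {}
    while buf[pos:pos + 3] != OBJECT_END:
        name, pos = read_string(buf, pos)
        props[name], pos = read_value(buf, pos)
    return props, pos + 3


def read_value(buf: bytes, pos: int):
    """Decode one ScriptDataValue; return (value, next position)."""
    kind = buf[pos]
    pos += 1
    if kind == 0:  # Number: 8-byte big-endian double
        return struct.unpack_from(">d", buf, pos)[0], pos + 8
    if kind == 1:  # Boolean
        return buf[pos] != 0, pos + 1
    if kind == 2:  # String
        return read_string(buf, pos)
    if kind == 3:  # Object
        return read_properties(buf, pos)
    if kind == 8:  # ECMA array: skip the UI32 approximate length
        return read_properties(buf, pos + 4)
    if kind == 10:  # Strict array: UI32 count, then that many values
        (count,) = struct.unpack_from(">I", buf, pos)
        pos += 4
        items = []
        for _ in range(count):
            item, pos = read_value(buf, pos)
            items.append(item)
        return items, pos
    raise ValueError(f"type {kind} is not handled in this sketch")


# Hand-built payload: "onMetaData" + ECMA array {duration: 12.5, stereo: true}
payload = (b"\x02\x00\x0aonMetaData" + b"\x08\x00\x00\x00\x02"
           + b"\x00\x08duration\x00" + struct.pack(">d", 12.5)
           + b"\x00\x06stereo\x01\x01" + OBJECT_END)
name, pos = read_value(payload, 0)
meta, pos = read_value(payload, pos)
print(name, meta)
```

The same reader also covers a keyframes object, since it is an Object containing two Strict arrays of Numbers.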

keyframes

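When I was writing the FLV demuxer, I found that the official specification does not describe a keyframes index. Unlike TS, FLV tags have no sync header, so without a keyframes index, seeking, fast-forward, and rewind perform very poorly because tags have to be read sequentially, one at a time. After some searching online I found that keyframes information is tucked away in SCRIPTDATA.

keyframes is essentially an unofficial, community standard. It is now hard to find an FLV video online whose metadata lacks a keyframes entry. Two commonly used metadata tools, flvtool2 and FLVMDI, both write keyframes as a default metadata item. The FLVMDI home page (http://www.buraks.com/flvmdi/) describes it: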

keyframes: (Object) This object is added only if you specify the /k switch. 'keyframes' is known to FLVMDI and if /k switch is not specified, 'keyframes' object will be deleted.
'keyframes' object has 2 arrays: 'filepositions' and 'times'. Both arrays have the same number of elements, which is equal to the number of key frames in the FLV. Values in times array are in 'seconds'. Each correspond to the timestamp of the n'th key frame. Values in filepositions array are in 'bytes'. Each correspond to the fileposition of the nth key frame video tag (which starts with byte tag type 9).

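In other words, keyframes holds two arrays, 'filepositions' and 'times', which give the file position and the PTS of each key frame. From keyframes you can build your own index and then, on seek, fast-forward, or rewind, jump quickly and efficiently to the key frame you want.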
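A seek against such an index comes down to a binary search over the times array. The sketch below assumes the keyframes object has already been decoded into a dictionary and that times is sorted in ascending order; seek_position is a hypothetical helper name.

```python
from bisect import bisect_right


def seek_position(keyframes: dict, target_s: float) -> int:
    """Return the file offset of the last key frame at or before target_s."""
    times = keyframes["times"]
    positions = keyframes["filepositions"]
    index = bisect_right(times, target_s) - 1
    # Before the first key frame: fall back to the first one
    return int(positions[max(index, 0)])


# Made-up index: three key frames at 0 s, 2 s and 4 s
keyframes = {
    "times": [0.0, 2.0, 4.0],
    "filepositions": [13.0, 5000.0, 9800.0],
}
print(seek_position(keyframes, 3.1))  # 5000
```

Landing on the key frame at or before the target means decoding can start immediately, at the cost of showing a frame slightly earlier than requested.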
