How to send raw MP3 and AAC streams over RTMP

Overview

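Flash Video (FLV) is a popular container format for streaming audio and video. At the time of writing, most video-sharing sites, in China and elsewhere, used this format.

RTMP is a protocol defined by Adobe for transporting audio and video.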

FLV file overview

Taken as a whole, an FLV file consists of the FLV header followed by the FLV file body.

1.The FLV header

Field | Type | Comment
--- | --- | ---
Signature | UI8 | Signature byte always 'F' (0x46)
Signature | UI8 | Signature byte always 'L' (0x4C)
Signature | UI8 | Signature byte always 'V' (0x56)
Version | UI8 | File version (for example, 0x01 for FLV version 1)
TypeFlagsReserved | UB [5] | Shall be 0
TypeFlagsAudio | UB [1] | 1 = Audio tags are present
TypeFlagsReserved | UB [1] | Shall be 0
TypeFlagsVideo | UB [1] | 1 = Video tags are present
DataOffset | UI32 | The length of this header in bytes

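Signature: The first 3 bytes of an FLV file are always 'F', 'L', 'V' and identify the file as FLV. When ffmpeg probes the format, it treats a file whose first 3 bytes are "FLV" as FLV.

Version: The 4th byte is the FLV version number.

Flags: In the 5th byte, bit 0 and bit 2 (counting from the least significant bit) indicate whether video and audio are present, respectively (1 = present, 0 = absent).

DataOffset: The last 4 bytes give the length of the FLV header, which is usually 9 for FLV version 1.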
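As a minimal sketch of the header layout above, the 9 bytes can be unpacked in one step. The function name `parse_flv_header` and the sample bytes are made up for illustration; only the field layout comes from the table.

```python
import struct

def parse_flv_header(buf: bytes) -> dict:
    """Parse the 9-byte FLV header: signature, version, flags, DataOffset."""
    if len(buf) < 9:
        raise ValueError("FLV header needs at least 9 bytes")
    # Big-endian: 3-byte signature, UI8 version, UI8 flags, UI32 DataOffset.
    signature, version, flags, data_offset = struct.unpack(">3sBBI", buf[:9])
    if signature != b"FLV":
        raise ValueError("not an FLV file")
    return {
        "version": version,
        "has_video": bool(flags & 0x01),  # bit 0: TypeFlagsVideo
        "has_audio": bool(flags & 0x04),  # bit 2: TypeFlagsAudio
        "data_offset": data_offset,
    }

# Hand-written sample header: version 1, audio + video, header length 9.
header = parse_flv_header(b"FLV\x01\x05\x00\x00\x00\x09")
print(header)
```

The same signature check is all a simple format probe needs before it goes on to read tags.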

2.The FLV File Body

Field | Type | Comment
--- | --- | ---
PreviousTagSize0 | UI32 | Always 0
Tag1 | FLVTAG | First tag
PreviousTagSize1 | UI32 | Size of previous tag, including its header, in bytes. For FLV version 1, this value is 11 plus the DataSize of the previous tag.
Tag2 | FLVTAG | Second tag
... | ... | ...
PreviousTagSizeN-1 | UI32 | Size of second-to-last tag, including its header, in bytes.
TagN | FLVTAG | Last tag
PreviousTagSizeN | UI32 | Size of last tag, including its header, in bytes

The FLV file body follows the FLV header.

The FLV file body is a sequence of back-pointers and tags. Each back-pointer is a 4-byte value giving the size of the preceding tag.
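The back-pointer/tag alternation can be walked with a short loop. This is a sketch under assumptions: `iter_tags` and `make_tag` are hypothetical helpers, the input is an in-memory body that starts at PreviousTagSize0, and the back-pointer check is stricter than some real-world files would tolerate.

```python
def iter_tags(body: bytes):
    """Yield (tag_type, payload) for each tag in an FLV file body."""
    pos = 4  # skip PreviousTagSize0, which is always 0
    while pos + 11 <= len(body):
        tag_type = body[pos] & 0x1F
        data_size = int.from_bytes(body[pos + 1:pos + 4], "big")
        end = pos + 11 + data_size
        if end + 4 > len(body):
            break  # truncated tag or missing back-pointer
        prev_size = int.from_bytes(body[end:end + 4], "big")
        if prev_size != 11 + data_size:
            raise ValueError("back-pointer does not match tag size")
        yield tag_type, body[pos + 11:end]
        pos = end + 4

def make_tag(tag_type: int, payload: bytes) -> bytes:
    """Build a tag with zero timestamp and StreamID, plus its back-pointer."""
    header = bytes([tag_type]) + len(payload).to_bytes(3, "big") + bytes(7)
    return header + payload + (11 + len(payload)).to_bytes(4, "big")

# Made-up body: one script tag and one audio tag with dummy payloads.
body = bytes(4) + make_tag(18, b"meta") + make_tag(8, b"\x2f\xff")
tags = list(iter_tags(body))
print(tags)
```

Because each back-pointer repeats the size of the tag before it, the same data also allows walking the file backwards from the end.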

 

FLV Tag Definition

All data in an FLV file is carried in tags. A tag's payload may be video, audio, or script data.

The table below shows the structure of a tag:

1.FLVTAG

Field | Type | Comment
--- | --- | ---
Reserved | UB [2] | Reserved for FMS, should be 0
Filter | UB [1] | Indicates if packets are filtered. 0 = No pre-processing required. 1 = Pre-processing (such as decryption) of the packet is required before it can be rendered. Shall be 0 in unencrypted files, and 1 for encrypted tags. See Annex F. FLV Encryption for the use of filters.
TagType | UB [5] | Type of contents in this tag. The following types are defined: 8 = audio; 9 = video; 18 = script data
DataSize | UI24 | Length of the message. Number of bytes after StreamID to end of tag (Equal to length of the tag - 11)
Timestamp | UI24 | Time in milliseconds at which the data in this tag applies. This value is relative to the first tag in the FLV file, which always has a timestamp of 0.
TimestampExtended | UI8 | Extension of the Timestamp field to form a SI32 value. This field represents the upper 8 bits, while the previous Timestamp field represents the lower 24 bits of the time in milliseconds.
StreamID | UI24 | Always 0.
AudioTagHeader | IF TagType == 8: AudioTagHeader |
VideoTagHeader | IF TagType == 9: VideoTagHeader |
EncryptionHeader | IF Filter == 1: EncryptionTagHeader |
FilterParams | IF Filter == 1: FilterParams |
Data | IF TagType == 8: AUDIODATA; IF TagType == 9: VIDEODATA; IF TagType == 18: SCRIPTDATA | Data specific for each media type.

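TagType: The low 5 bits of the first byte of the tag give the type of data it carries: 8 = audio, 9 = video, 18 = script data.

DataSize: The length of the data that follows StreamID.

Timestamp and TimestampExtended together form the PTS of the tag's data. When I first wrote an FLV demuxer I ignored TimestampExtended and used Timestamp alone as the PTS, and the picture skipped frames (the 24-bit field alone wraps after 2^24 ms, about 4.66 hours). Only after rereading the spec did I find that the real value is PTS = Timestamp | TimestampExtended << 24.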
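To make the timestamp combination concrete, here is a small sketch that decodes the fixed 11 bytes of a tag. The name `parse_tag_header` and the sample bytes are illustrative assumptions; the sketch treats the combined value as unsigned and does not handle the Filter == 1 headers.

```python
def parse_tag_header(hdr: bytes) -> dict:
    """Decode the fixed 11-byte part of an FLVTAG (Reserved .. StreamID)."""
    if len(hdr) < 11:
        raise ValueError("FLVTAG header needs 11 bytes")
    ts_low = int.from_bytes(hdr[4:7], "big")  # Timestamp: lower 24 bits
    ts_high = hdr[7]                          # TimestampExtended: upper 8 bits
    return {
        "filter": (hdr[0] >> 5) & 0x01,
        "tag_type": hdr[0] & 0x1F,
        "data_size": int.from_bytes(hdr[1:4], "big"),
        "timestamp_ms": (ts_high << 24) | ts_low,
        "stream_id": int.from_bytes(hdr[8:11], "big"),
    }

# Made-up video tag header: DataSize 5, Timestamp 0x000010, TimestampExtended 0x01.
sample = bytes([0x09, 0x00, 0x00, 0x05, 0x00, 0x00, 0x10, 0x01, 0x00, 0x00, 0x00])
tag = parse_tag_header(sample)
print(tag["tag_type"], tag["data_size"], tag["timestamp_ms"])
```

Reading only the 24-bit Timestamp field of this sample would give 16 ms instead of 16777232 ms, which is the kind of jump that shows up as skipped frames.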

What follows StreamID depends on the tag type; each type is described in detail below.

Audio Tags

When TagType == 8, the tag carries audio.

The data after StreamID starts with the AudioTagHeader, whose structure is as follows:

Field | Type | Comment
--- | --- | ---
SoundFormat | UB [4] | Format of SoundData. The following values are defined: 0 = Linear PCM, platform endian; 1 = ADPCM; 2 = MP3; 3 = Linear PCM, little endian; 4 = Nellymoser 16 kHz mono; 5 = Nellymoser 8 kHz mono; 6 = Nellymoser; 7 = G.711 A-law logarithmic PCM; 8 = G.711 mu-law logarithmic PCM; 9 = reserved; 10 = AAC; 11 = Speex; 14 = MP3 8 kHz; 15 = Device-specific sound. Formats 7, 8, 14, and 15 are reserved. AAC is supported in Flash Player 9,0,115,0 and higher. Speex is supported in Flash Player 10 and higher.
SoundRate | UB [2] | Sampling rate. The following values are defined: 0 = 5.5 kHz; 1 = 11 kHz; 2 = 22 kHz; 3 = 44 kHz
SoundSize | UB [1] | Size of each audio sample. This parameter only pertains to uncompressed formats. Compressed formats always decode to 16 bits internally. 0 = 8-bit samples; 1 = 16-bit samples
SoundType | UB [1] | Mono or stereo sound. 0 = Mono sound; 1 = Stereo sound
AACPacketType | IF SoundFormat == 10: UI8 | The following values are defined: 0 = AAC sequence header; 1 = AAC raw

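The first byte of the AudioTagHeader, i.e. the byte immediately after StreamID, holds the basic audio information such as the format and sampling rate, as listed in the table.

The AudioTagHeader is followed by the AUDIODATA, i.e. the audio payload. There is one special case: when SoundFormat is 10 (AAC), the AudioTagHeader has one extra byte, AACPacketType, which gives the type of the AACAUDIODATA: 0 = AAC sequence header, 1 = AAC raw.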
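The bit layout of that first byte, and the extra AAC byte, can be sketched as follows. `parse_audio_tag_header` is a hypothetical name, and the sample payloads are hand-written; the rates are the nominal kHz labels from the table.

```python
SOUND_RATES_KHZ = (5.5, 11, 22, 44)  # nominal labels from the SoundRate table

def parse_audio_tag_header(payload: bytes) -> dict:
    """Decode the AudioTagHeader at the start of an audio tag's data."""
    first = payload[0]
    info = {
        "sound_format": first >> 4,                       # upper 4 bits
        "sound_rate_khz": SOUND_RATES_KHZ[(first >> 2) & 0x03],
        "sample_bits": 16 if (first >> 1) & 0x01 else 8,  # SoundSize
        "stereo": bool(first & 0x01),                     # SoundType
        "header_len": 1,
    }
    if info["sound_format"] == 10:  # AAC carries one extra byte
        info["aac_packet_type"] = payload[1]
        info["header_len"] = 2
    return info

# 0xAF = AAC, 44 kHz, 16-bit, stereo; next byte 0x00 = AAC sequence header.
aac = parse_audio_tag_header(b"\xaf\x00\x12\x10")
# 0x2F = MP3, 44 kHz, 16-bit, stereo; MP3 has no extra header byte.
mp3 = parse_audio_tag_header(b"\x2f\xff\xfb")
print(aac)
print(mp3)
```

The audio payload starts at `payload[info["header_len"]:]`, so an MP3 frame begins right after the first byte while an AAC frame begins after the second.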

Field | Type | Comment
--- | --- | ---
Data | IF AACPacketType == 0: AudioSpecificConfig | The AudioSpecificConfig is defined in ISO 14496-3. Note that this is not the same as the contents of the esds box from an MP4/F4V file.
 | ELSE IF AACPacketType == 1: Raw AAC frame data in UI8 [ ] | audio payload

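An AAC sequence header carries an AudioSpecificConfig structure, which is described in ISO 14496-3 (Audio). The full definition of AudioSpecificConfig is very complex, so I simplify it here by fixing the audio format in advance: with AAC-LC as the codec and a sampling rate of 44100 Hz, AudioSpecificConfig reduces to a short, fixed layout.

ffmpeg also has a function that parses AudioSpecificConfig, ff_mpeg4audio_get_config(); reading it alongside the spec gives a deeper understanding.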
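For the simplified AAC-LC case, the leading fields of AudioSpecificConfig fit in two bytes: audioObjectType (5 bits), samplingFrequencyIndex (4 bits), channelConfiguration (4 bits), then 3 bits that are zero in this basic form. The sketch below covers only that case; the function names are hypothetical, and the escape values (object type 31, frequency index 15) and any extension data are deliberately not handled.

```python
SAMPLE_RATES = (96000, 88200, 64000, 48000, 44100, 32000,
                24000, 22050, 16000, 12000, 11025, 8000, 7350)

def parse_audio_specific_config(asc: bytes) -> dict:
    """Decode the first two bytes of a basic AudioSpecificConfig."""
    if len(asc) < 2:
        raise ValueError("need at least 2 bytes")
    bits = int.from_bytes(asc[:2], "big")
    object_type = bits >> 11              # 5 bits
    freq_index = (bits >> 7) & 0x0F       # 4 bits
    channel_config = (bits >> 3) & 0x0F   # 4 bits
    if object_type == 31 or freq_index >= len(SAMPLE_RATES):
        raise ValueError("escape or reserved value, not handled in this sketch")
    return {
        "audio_object_type": object_type,
        "sampling_frequency_index": freq_index,
        "sample_rate": SAMPLE_RATES[freq_index],
        "channel_configuration": channel_config,
    }

def build_audio_specific_config(object_type: int, freq_index: int,
                                channel_config: int) -> bytes:
    """Pack the same three fields back into two bytes (trailing bits zero)."""
    value = (object_type << 11) | (freq_index << 7) | (channel_config << 3)
    return value.to_bytes(2, "big")

# AAC-LC (object type 2), 44100 Hz (index 4), stereo (2 channels).
config = parse_audio_specific_config(b"\x12\x10")
print(config)
print(build_audio_specific_config(2, 4, 2).hex())
```

When sending AAC, these two bytes are what would follow the AudioTagHeader in the AAC sequence header packet for this particular format.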

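An AAC raw packet carries the audio elementary stream (ES), i.e. the audio payload.

In an FLV file the AAC sequence header packet normally appears only once, as the first audio tag. I mention it because, when writing the FLV demuxer, I had to prepend a 7-byte ADTS header to every AAC ES frame (ADTS is covered in detail in the notes on the ADTS audio format). ADTS is the framing that decoders commonly accept: the bare AAC ES has to be packed as ADTS before a decoder can play it. Building the ADTS header requires samplingFrequencyIndex, and the most reliable source for it is AudioSpecificConfig, so I parsed AudioSpecificConfig to obtain samplingFrequencyIndex.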
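A sketch of the 7-byte ADTS header (the variant without CRC) built from the values found in AudioSpecificConfig. `adts_header` is a hypothetical helper; it assumes MPEG-4 AAC, one raw data block per frame, and a buffer fullness field of 0x7FF, which is a common choice rather than something the original text specifies.

```python
def adts_header(object_type: int, freq_index: int,
                channel_config: int, payload_len: int) -> bytes:
    """Build a 7-byte ADTS header for one raw AAC frame."""
    frame_len = payload_len + 7  # ADTS frame length includes the header
    if not 1 <= object_type <= 4 or channel_config > 7 or frame_len >= 1 << 13:
        raise ValueError("value does not fit the ADTS header fields")
    profile = object_type - 1  # ADTS stores audioObjectType minus 1 in 2 bits
    return bytes([
        0xFF,                                               # syncword, high 8 bits
        0xF1,                                               # syncword low 4 bits, MPEG-4, layer 0, no CRC
        (profile << 6) | (freq_index << 2) | (channel_config >> 2),
        ((channel_config & 0x03) << 6) | (frame_len >> 11),  # top 2 bits of frame length
        (frame_len >> 3) & 0xFF,                             # middle 8 bits
        ((frame_len & 0x07) << 5) | 0x1F,                    # low 3 bits + buffer fullness high bits
        0xFC,                                                # buffer fullness low bits, 1 raw block
    ])

# AAC-LC, 44100 Hz (index 4), stereo, with a dummy 100-byte frame.
raw_frame = bytes(100)
adts_frame = adts_header(2, 4, 2, len(raw_frame)) + raw_frame
print(adts_frame[:7].hex())
```

Each AAC raw payload extracted from an audio tag would get its own header this way before being written to an .aac file or handed to a decoder that expects ADTS.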

At this point you can extract the audio information and data from an FLV file and hand them to an audio decoder for normal playback.

Video Tags

When TagType == 9, the tag carries video.

The data after StreamID starts with the VideoTagHeader, whose structure is as follows:

Field | Type | Comment
--- | --- | ---
Frame Type | UB [4] | Type of video frame. The following values are defined: 1 = key frame (for AVC, a seekable frame); 2 = inter frame (for AVC, a non-seekable frame); 3 = disposable inter frame (H.263 only); 4 = generated key frame (reserved for server use only); 5 = video info/command frame
CodecID | UB [4] | Codec Identifier. The following values are defined: 2 = Sorenson H.263; 3 = Screen video; 4 = On2 VP6; 5 = On2 VP6 with alpha channel; 6 = Screen video version 2; 7 = AVC
AVCPacketType | IF CodecID == 7: UI8 | The following values are defined: 0 = AVC sequence header; 1 = AVC NALU; 2 = AVC end of sequence (lower level NALU sequence ender is not required or supported)
CompositionTime | IF CodecID == 7: SI24 | IF AVCPacketType == 1: Composition time offset; ELSE: 0. See ISO 14496-12, 8.15.3 for an explanation of composition times. The offset in an FLV file is always in milliseconds.

The first byte of the VideoTagHeader, i.e. the byte immediately after StreamID, holds the frame type and the CodecID, as listed in the table.

The VideoTagHeader is followed by the VIDEODATA, i.e. the video payload. As with AAC audio, there is a special case: when the video format is AVC (H.264), the VideoTagHeader has 4 extra bytes: AVCPacketType (1 byte) and CompositionTime (3 bytes). AVCPacketType indicates what the following VIDEODATA (AVCVIDEOPACKET) contains:

IF AVCPacketType == 0 AVCDecoderConfigurationRecord (AVC sequence header)
IF AVCPacketType == 1 One or more NALUs (Full frames are required)
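The VideoTagHeader can be decoded in the same way as the audio one. This is a sketch with a hypothetical function name and hand-written sample bytes; the one detail worth noting is that CompositionTime is a signed 24-bit value.

```python
def parse_video_tag_header(payload: bytes) -> dict:
    """Decode the VideoTagHeader at the start of a video tag's data."""
    first = payload[0]
    info = {
        "frame_type": first >> 4,   # upper 4 bits
        "codec_id": first & 0x0F,   # lower 4 bits
        "header_len": 1,
    }
    if info["codec_id"] == 7:  # AVC adds AVCPacketType + CompositionTime
        if len(payload) < 5:
            raise ValueError("AVC VideoTagHeader needs 5 bytes")
        info["avc_packet_type"] = payload[1]
        # SI24: signed, big-endian, in milliseconds.
        info["composition_time_ms"] = int.from_bytes(payload[2:5], "big", signed=True)
        info["header_len"] = 5
    return info

# 0x17 = key frame + AVC, packet type 0 = AVC sequence header.
seq_header = parse_video_tag_header(b"\x17\x00\x00\x00\x00")
# 0x27 = inter frame + AVC, packet type 1 = NALU, CompositionTime 40 ms.
nalu = parse_video_tag_header(b"\x27\x01\x00\x00\x28")
print(seq_header)
print(nalu)
```

For AVC, the composition time offset is added to the tag timestamp to get the presentation time of the frame, so it matters for streams that contain B-frames.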

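The AVCDecoderConfigurationRecord contains the SPS and PPS, which are essential for H.264 decoding. The SPS and PPS must be sent to the AVC decoder before any stream data; otherwise the decoder cannot decode correctly. They also have to be resent whenever the decoder is restarted after a stop, for example on seek or when switching between fast-forward and rewind. Like the AAC sequence header, the AVCDecoderConfigurationRecord normally appears once in an FLV file, as the first video tag.

The AVCDecoderConfigurationRecord is defined in ISO 14496-15, 5.2.4.1, so it is not reproduced in detail here.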
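As a rough sketch of how the SPS and PPS can be pulled out of the record, the code below follows the basic layout: four profile/level bytes, the NALU length size, then length-prefixed SPS and PPS lists. `parse_avc_config` is a hypothetical name, the sample SPS/PPS bytes are truncated placeholders rather than valid parameter sets, and any trailing profile-specific extension fields are ignored.

```python
def parse_avc_config(record: bytes) -> dict:
    """Extract SPS/PPS NAL units from an AVCDecoderConfigurationRecord."""
    def read_units(pos: int, count: int):
        units = []
        for _ in range(count):
            size = int.from_bytes(record[pos:pos + 2], "big")  # UI16 length prefix
            units.append(record[pos + 2:pos + 2 + size])
            pos += 2 + size
        return units, pos

    if len(record) < 7 or record[0] != 1:
        raise ValueError("unexpected configurationVersion or short record")
    sps, pos = read_units(6, record[5] & 0x1F)   # low 5 bits: number of SPS
    pps, pos = read_units(pos + 1, record[pos])  # one byte: number of PPS
    return {
        "profile": record[1],
        "level": record[3],
        "nalu_length_size": (record[4] & 0x03) + 1,  # lengthSizeMinusOne + 1
        "sps": sps,
        "pps": pps,
    }

# Made-up record: Baseline profile (66), level 3.0 (30), 4-byte NALU lengths.
record = (bytes([1, 0x42, 0x00, 0x1E, 0xFF, 0xE1])
          + (4).to_bytes(2, "big") + b"\x67\x42\x00\x1e"
          + bytes([1]) + (2).to_bytes(2, "big") + b"\x68\xce")
cfg = parse_avc_config(record)
print(cfg)
```

`nalu_length_size` tells the demuxer how many bytes prefix each NALU in the AVCPacketType == 1 packets, which is needed when converting them to start-code format for a decoder.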

SCRIPTDATA

When TagType == 18, the tag carries script data.

The SCRIPTDATA structure is quite complex: it defines many value types, each with its own structure.

Field | Type | Comment
--- | --- | ---
Type | UI8 | Type of the ScriptDataValue. The following types are defined: 0 = Number; 1 = Boolean; 2 = String; 3 = Object; 4 = MovieClip (reserved, not supported); 5 = Null; 6 = Undefined; 7 = Reference; 8 = ECMA array; 9 = Object end marker; 10 = Strict array; 11 = Date; 12 = Long string
ScriptDataValue | IF Type == 0: DOUBLE; IF Type == 1: UI8; IF Type == 2: SCRIPTDATASTRING; IF Type == 3: SCRIPTDATAOBJECT; IF Type == 7: UI16; IF Type == 8: SCRIPTDATAECMAARRAY; IF Type == 10: SCRIPTDATASTRICTARRAY; IF Type == 11: SCRIPTDATADATE; IF Type == 12: SCRIPTDATALONGSTRING | Script data value. The Boolean value is (ScriptDataValue ≠ 0).

Each type is described in detail in the official FLV specification.

onMetaData

onMetaData is the part of SCRIPTDATA that matters most to us. Its structure is shown in the table below:

Property Name | Type | Comment
--- | --- | ---
audiocodecid | Number | Audio codec ID used in the file (see E.4.2.1 for available SoundFormat values)
audiodatarate | Number | Audio bit rate in kilobits per second
audiodelay | Number | Delay introduced by the audio codec in seconds
audiosamplerate | Number | Frequency at which the audio stream is replayed
audiosamplesize | Number | Resolution of a single audio sample
canSeekToEnd | Boolean | Indicating the last video frame is a key frame
creationdate | String | Creation date and time
duration | Number | Total duration of the file in seconds
filesize | Number | Total size of the file in bytes
framerate | Number | Number of frames per second
height | Number | Height of the video in pixels
stereo | Boolean | Indicating stereo audio
videocodecid | Number | Video codec ID used in the file (see E.4.3.1 for available CodecID values)
videodatarate | Number | Video bit rate in kilobits per second
width | Number | Width of the video in pixels

Fields such as duration, filesize, and the video width and height are particularly useful.
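A script tag's data typically holds a String value "onMetaData" followed by an ECMA array of properties. The sketch below decodes just enough of the value types to read such a tag: Number, Boolean, String, Object, ECMA array, and Strict array. `read_value`, `read_string`, and `amf_string` are hypothetical helpers, and the sample payload is built by hand.

```python
import struct

def read_string(buf: bytes, pos: int):
    """Read a SCRIPTDATASTRING (UI16 length + bytes); return (text, next_pos)."""
    size = int.from_bytes(buf[pos:pos + 2], "big")
    return buf[pos + 2:pos + 2 + size].decode("utf-8"), pos + 2 + size

def read_value(buf: bytes, pos: int):
    """Decode one typed script data value at buf[pos]; return (value, next_pos)."""
    kind = buf[pos]
    pos += 1
    if kind == 0:                       # Number: big-endian DOUBLE
        return struct.unpack(">d", buf[pos:pos + 8])[0], pos + 8
    if kind == 1:                       # Boolean: UI8, true if nonzero
        return buf[pos] != 0, pos + 1
    if kind == 2:                       # String
        return read_string(buf, pos)
    if kind in (3, 8):                  # Object or ECMA array
        if kind == 8:
            pos += 4                    # skip the approximate element count
        props = {}
        while buf[pos:pos + 3] != b"\x00\x00\x09":  # object end marker
            name, pos = read_string(buf, pos)
            value, pos = read_value(buf, pos)
            props[name] = value
        return props, pos + 3
    if kind == 10:                      # Strict array: UI32 count + values
        count = int.from_bytes(buf[pos:pos + 4], "big")
        pos += 4
        items = []
        for _ in range(count):
            item, pos = read_value(buf, pos)
            items.append(item)
        return items, pos
    raise ValueError(f"type {kind} is not handled in this sketch")

def amf_string(text: str) -> bytes:
    data = text.encode("utf-8")
    return len(data).to_bytes(2, "big") + data

# Hand-built script data: "onMetaData" + ECMA array with two properties.
payload = (b"\x02" + amf_string("onMetaData") + b"\x08" + (2).to_bytes(4, "big")
           + amf_string("duration") + b"\x00" + struct.pack(">d", 12.5)
           + amf_string("stereo") + b"\x01\x01"
           + b"\x00\x00\x09")
name, pos = read_value(payload, 0)
meta, pos = read_value(payload, pos)
print(name, meta)
```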

keyframes

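When I wrote the FLV demuxer, I found that the official specification does not describe a keyframes index. Unlike TS, FLV tags have no sync byte, so without a keyframes index, seeking and fast-forward/rewind perform very poorly because tags have to be read sequentially, one by one. After some searching online, I found that keyframes information is tucked into SCRIPTDATA.

keyframes is essentially an unofficial, de facto standard. It is now hard to find an FLV file online whose metadata lacks a keyframes entry. Two commonly used metadata tools, flvtool2 and FLVMDI, both treat keyframes as a standard metadata entry. The FLVMDI home page (http://www.buraks.com/flvmdi/) describes it as follows: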

keyframes: (Object) This object is added only if you specify the /k switch. 'keyframes' is known to FLVMDI and if /k switch is not specified, 'keyframes' object will be deleted.
'keyframes' object has 2 arrays: 'filepositions' and 'times'. Both arrays have the same number of elements, which is equal to the number of key frames in the FLV. Values in times array are in 'seconds'. Each correspond to the timestamp of the n'th key frame. Values in filepositions array are in 'bytes'. Each correspond to the fileposition of the nth key frame video tag (which starts with byte tag type 9).

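In other words, keyframes contains two arrays, 'filepositions' and 'times', which hold the file position and the PTS of each key frame. With keyframes you can build your own index and then, when seeking or fast-forwarding/rewinding, jump quickly to the key frame you want.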
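Once the keyframes object has been decoded, a seek becomes a binary search over 'times'. A minimal sketch, assuming the two arrays are sorted in ascending order and have equal length; `seek_position` and the sample values are made up for illustration.

```python
import bisect

def seek_position(keyframes: dict, target_seconds: float) -> int:
    """Return the file offset of the last key frame at or before the target."""
    times = keyframes["times"]
    # Index of the last entry <= target; clamp to the first key frame.
    index = max(bisect.bisect_right(times, target_seconds) - 1, 0)
    return int(keyframes["filepositions"][index])

# Made-up index: values are stored as Numbers (doubles) in the metadata.
keyframes = {
    "times": [0.0, 2.0, 4.0, 6.0],
    "filepositions": [1024.0, 50210.0, 99877.0, 151002.0],
}
print(seek_position(keyframes, 5.1))
```

The returned offset points at a video tag (tag type 9), so the demuxer can start reading tags from there without scanning the file from the beginning.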
