Analysis of the OGG Audio Format

I. Overview of the OGG Audio Format

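Ogg is a free, open-standard container format maintained by the Xiph.Org Foundation. The format is not restricted by software patents and is designed for efficient streaming and processing of high-quality digital multimedia.

"Ogg" denotes a file format that can carry a wide range of free and open-source codecs, covering audio, video, text (such as subtitles), and metadata.

Within the Ogg multimedia framework, Theora provides the lossy video layer, while the music-oriented Vorbis codec usually serves as the audio layer. The speech codec Speex, the lossless audio codec FLAC, and OggPCM can also serve as the audio layer.

The term "Ogg" is often used to mean the Ogg Vorbis audio file format, that is, Vorbis-encoded audio stored in an Ogg container. The .ogg extension was formerly used for content in any format Ogg supports; in 2007, however, the Xiph.Org Foundation asked, for the sake of backward compatibility, that .ogg be reserved for Vorbis. The foundation introduced new extensions and media types for the other kinds of content: .oga for audio-only files, .ogv for video with or without sound (including Theora), and .ogx for applications.

Ogg Vorbis is an audio compression format comparable to music formats such as MP3. It is completely free, open, and unencumbered by patents. Ogg Vorbis files use the .ogg extension. The format is designed so that file size and audio quality can keep improving without affecting existing encoders or players. Ogg Vorbis also supports multichannel audio.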

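II. Anatomy of the OGG Audio Format

1. Organization of an OGG file

OGG organizes and links logical streams in units of pages; every page consists of a page header and page data, as shown in Figure 1: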

| A* | B* | C* | .. | A# |    | B# | C# | D* |    |    | D# |
 bos  bos  bos       eos       eos  eos  bos            eos

Figure 1. Organization of an OGG file

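The file in the figure chains two physical streams: logical streams A, B, and C together form the first physical stream, and logical stream D alone forms the second. The bos pages of all logical streams in one physical stream must be physically adjacent, as the bos pages A*, B*, and C* are in Figure 1.

bos: beginning of stream;

eos: end of stream.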

   

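2. OGG page structure

Pages are independent of one another, and each carries all the information it needs. Page size is variable, typically 4 KB to 8 KB, and can never exceed 65307 bytes (27 + 255 + 255*255 = 65307). The page header layout is shown in Figure 2:

Offset (bytes)     Size (bytes)     Field
0                  4                capture pattern "OggS"
4                  1                Version
5                  1                Header_type
6                  8                Granule_position
14                 4                Serial_number
18                 4                Page_sequence
22                 4                CRC_checksum
26                 1                Num_segments
27                 Num_segments     Segment_table
27 + Num_segments  variable         payload

Figure 2. OGG page header structure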

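1) Capture pattern: the ASCII characters 0x4f 'O', 0x67 'g', 0x67 'g', 0x53 'S', 4 bytes in total. It marks the start of a page and lets a decoder recognize page boundaries when recovering the media data from the Ogg container.

2) Version: 1 byte; the current version is always 0.

3) Header_type: 1 byte of flags describing the page:

0x01: the page begins with data that continues a packet from the previous page of the same logical stream; if this bit is clear, the page begins with a new packet;

0x02: bos flag, meaning the page is the first page of a logical stream; if this bit is clear, it is not the first page;

0x04: eos flag, meaning the page is the last page of a logical stream; if this bit is clear, it is not the last page.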

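4) Granule_position: 8 bytes, little-endian; a position value whose meaning depends on the codec. For an audio stream it holds the number of PCM samples the logical stream has output up to this page, from which a timestamp can be computed. For a video stream it typically encodes the number of video frames up to this page (the exact mapping is defined by the codec). A value of -1 means that no packet ends on this page.

5) Serial_number: 4 bytes, little-endian; the id of the logical stream this page belongs to. It distinguishes the page's logical stream from the others, so pages can be sorted into streams by this value.

6) Page_sequence: 4 bytes; the sequence number of the page within its logical stream. An OGG decoder uses it to detect lost pages.

7) CRC_checksum: 4 bytes; a 32-bit CRC computed over the whole page, header and page data, with the CRC field itself set to zero during the calculation. The generator polynomial is 0x04c11db7.

8) Num_segments: 1 byte; the number of segment entries in the segment_table field of this page. Its maximum is 255, so the largest possible page is 65307 bytes, just under 64 KB.

9) Segment_table: a table giving the length of each segment, one byte per entry, with values from 0 to 255.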

segment可以得到packet的值,每個packet的大小是以最後一個不等於255segment結束的,從頁頭中的segment_table可以得到每個packet長度,舉例:如果一組segment依次順序爲FF 45 FF FF FF 40 FF 5 FF FF FF66,那麼第一個packet的長度爲255+69 = 324,第二個packet大小829,同理。

The page header consists of the fields above, which gives the length of the header and of the whole page:

header_size = 27 + Num_segments (bytes)

page_size = header_size + sum of the segment sizes listed in segment_table
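As a rough sketch of how these fields and sizes fit together, the Python below unpacks the 27 fixed header bytes, reads the segment table, applies the two size formulas, and checks the CRC. The names `ogg_crc` and `parse_page` and the dictionary keys are made up for this example. The CRC routine assumes an initial value of 0, most-significant-bit-first processing, and no final XOR; the text above gives only the polynomial.

```python
import struct


def ogg_crc(data):
    """CRC-32 with polynomial 0x04c11db7, initial value 0, MSB first."""
    crc = 0
    for byte in data:
        crc ^= byte << 24
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ 0x04C11DB7) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
    return crc


def parse_page(buf, offset=0):
    """Parse the page starting at buf[offset] and return its fields."""
    if len(buf) - offset < 27:
        raise ValueError("truncated page header")
    (magic, version, header_type, granule, serial,
     sequence, crc, num_segments) = struct.unpack_from("<4sBBqIIIB", buf, offset)
    if magic != b"OggS":
        raise ValueError("capture pattern 'OggS' not found")
    segment_table = bytes(buf[offset + 27:offset + 27 + num_segments])
    header_size = 27 + num_segments
    page_size = header_size + sum(segment_table)
    page = bytearray(buf[offset:offset + page_size])
    if len(page) < page_size:
        raise ValueError("truncated page")
    page[22:26] = bytes(4)        # the CRC field counts as zero in the check
    return {
        "version": version,
        "continued": bool(header_type & 0x01),
        "bos": bool(header_type & 0x02),
        "eos": bool(header_type & 0x04),
        "granule_position": granule,
        "serial_number": serial,
        "page_sequence": sequence,
        "segment_table": segment_table,
        "header_size": header_size,
        "page_size": page_size,
        "crc_ok": ogg_crc(page) == crc,
    }


# Hand-built one-segment bos page, used only to exercise the sketch.
header = struct.pack("<4sBBqIIIB", b"OggS", 0, 0x02, 0, 1234, 0, 0, 1)
page = bytearray(header + bytes([3]) + b"abc")
page[22:26] = struct.pack("<I", ogg_crc(page))
info = parse_page(bytes(page))
print(info["header_size"], info["page_size"], info["crc_ok"])  # 28 31 True
```

To walk a whole file, one would call `parse_page` again with `offset` advanced by the returned `page_size`.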

 

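3. OGG encapsulation process (supplementary)

1) Before being handed to the Ogg layer, encoded audio or video is presented as "packets" with packet boundaries; where those boundaries fall depends on the codec. See Figure 3.

2) Each packet of the logical stream is split into segments (segmentation). Every segment is 255 bytes long except the last segment of a packet, which is shorter than 255 bytes (0 bytes when the packet length is an exact multiple of 255). The packet itself can be of any length, as decided by the media encoder.

3) The segments are then wrapped into pages. Every page gets a page header, and page lengths may differ from page to page. The segment_table field of the header records the "lacing value" (length) of each segment; a value below 255, possibly 0, marks the last segment of a packet. Packets are processed one at a time, and a packet is wrapped into one or more pages (page length is capped, commonly at about 4 kB). In this simplified description the next packet starts on a new page, although Ogg also allows several packets to share one page; whether a page starts with a fresh packet or continues the previous one is signaled by the header_type flag in the page header.

Multiple logical streams that have been wrapped in pages (speech, text, images, audio, video, and so on) are then multiplexed into one physical stream in the timing order the application requires.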

 logical bitstream with packet boundaries
 -----------------------------------------------------------------
 > |       packet_1             | packet_2         | packet_3 |  <
 -----------------------------------------------------------------

                     | segmentation (logically only)
                     v

      packet_1 (5 segments)          packet_2 (4 segs)    p_3 (2 segs)
     ------------------------------ -------------------- ------------
 ..  |seg_1|seg_2|seg_3|seg_4|s_5 | |seg_1|seg_2|seg_3|| |seg_1|s_2 | ..
     ------------------------------ -------------------- ------------

                     | page encapsulation
                     v

 page_1 (packet_1 data)   page_2 (pket_1 data)   page_3 (packet_2 data)
------------------------  ----------------  ------------------------
|H|------------------- |  |H|----------- |  |H|------------------- |
|D||seg_1|seg_2|seg_3| |  |D|seg_4|s_5 | |  |D||seg_1|seg_2|seg_3| | ...
|R|------------------- |  |R|----------- |  |R|------------------- |
------------------------  ----------------  ------------------------

                    |
pages of            |
other    --------|  |
logical         -------
bitstreams      | MUX |
                -------
                   |
                   v

              page_1  page_2          page_3
      ------  ------  -------  -----  -------
 ...  ||   |  ||   |  ||    |  ||  |  ||    |  ...
      ------  ------  -------  -----  -------
              physical Ogg bitstream

Figure 3. OGG encapsulation process
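The segmentation step in the figure can be sketched as follows. This is a minimal illustration that assumes one packet is handled at a time, as in the description above; `lacing_values` and `split_into_pages` are made-up names. The sketch only produces the segment tables. It does not build page headers, and it does not try to keep pages near the usual 4 kB target.

```python
def lacing_values(packet_len):
    """Lacing values for one packet: full 255-byte segments followed by a
    final segment of 0..254 bytes that marks the end of the packet."""
    full, rest = divmod(packet_len, 255)
    return [255] * full + [rest]


def split_into_pages(lacing, max_segments=255):
    """Group lacing values into per-page segment tables; a page header
    can describe at most 255 segments."""
    return [lacing[i:i + max_segments]
            for i in range(0, len(lacing), max_segments)]


print(lacing_values(324))   # [255, 69]
print(lacing_values(510))   # [255, 255, 0]

# A 70000-byte packet needs 275 segments, so it spans two pages.
tables = split_into_pages(lacing_values(70000))
print([len(t) for t in tables])   # [255, 20]
```

The second page of such a packet would carry the 0x01 continuation flag in its Header_type field.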

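4. OGG Vorbis bitstream structure

A Vorbis bitstream begins with three header packets. In order, they are the identification header, the comment header, and the setup header. All three are needed to decode a Vorbis audio file.

1) Common header structure

Each header packet begins with the same fields:

- [packet_type] : 8 bit value

- 0x76, 0x6f, 0x72, 0x62, 0x69, 0x73: the characters 'v','o','r','b','i','s' as six octets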

2) The identification header

The identification header identifies the bitstream as Vorbis, gives the Vorbis version, and states the simple audio characteristics of the stream, such as sample rate and number of channels.

- [vorbis_version] = read 32 bits as unsigned integer

- [audio_channels] = read 8 bit integer as unsigned (must be greater than 0)

- [audio_sample_rate] = read 32 bits as unsigned integer (must be greater than 0)

- [bitrate_maximum] = read 32 bits as signed integer

- [bitrate_nominal] = read 32 bits as signed integer

- [bitrate_minimum] = read 32 bits as signed integer

- [blocksize_0] = 2 exponent (read 4 bits as unsigned integer) (must be less than or equal to [blocksize_1])

- [blocksize_1] = 2 exponent (read 4 bits as unsigned integer)

- [framing_flag] = read one bit (must be nonzero)
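The field list above maps directly onto a fixed 30-byte layout, which the following Python sketch unpacks. The function name and dictionary keys are made up for this example. Two details are not stated in the text above and come from the Vorbis I specification: the identification header uses packet type 1, and because Vorbis packs bits least-significant first, [blocksize_0] sits in the low nibble of its byte and [blocksize_1] in the high nibble.

```python
import struct


def parse_identification_header(packet):
    """Decode a Vorbis identification header packet into a dict."""
    if len(packet) < 30:
        raise ValueError("identification header needs 30 bytes")
    (packet_type, magic, version, channels, sample_rate,
     bitrate_max, bitrate_nominal, bitrate_min,
     blocksizes, framing) = struct.unpack_from("<B6sIBIiiiBB", packet)
    if packet_type != 1 or magic != b"vorbis":
        raise ValueError("not a Vorbis identification header")
    blocksize_0 = 1 << (blocksizes & 0x0F)   # first 4 bits read: low nibble
    blocksize_1 = 1 << (blocksizes >> 4)     # next 4 bits read: high nibble
    if channels == 0 or sample_rate == 0:
        raise ValueError("channels and sample rate must be greater than 0")
    if blocksize_0 > blocksize_1:
        raise ValueError("blocksize_0 must not exceed blocksize_1")
    if not framing & 1:
        raise ValueError("framing flag must be set")
    return {
        "vorbis_version": version,
        "audio_channels": channels,
        "audio_sample_rate": sample_rate,
        "bitrate_maximum": bitrate_max,
        "bitrate_nominal": bitrate_nominal,
        "bitrate_minimum": bitrate_min,
        "blocksize_0": blocksize_0,
        "blocksize_1": blocksize_1,
    }


# Hand-built header: stereo, 44100 Hz, nominal 128000 bit/s, blocks 256/2048.
sample = struct.pack("<B6sIBIiiiBB", 1, b"vorbis", 0, 2, 44100,
                     0, 128000, 0, 0xB8, 1)
print(parse_identification_header(sample))
```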

 

The bitrate fields above are used only as hints. The nominal bitrate field especially may be considerably off in purely VBR streams. The fields are meaningful only when greater than zero.

a) All three fields set to the same value implies a fixed rate, or tightly bounded, nearly fixed-rate bitstream

b) Only nominal set implies a VBR or ABR stream that averages the nominal bitrate

c) Maximum and/or minimum set implies a VBR bitstream that obeys the bitrate limits

d) None set indicates the encoder does not care to speculate.
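One possible reading of rules a) through d) as code is sketched below. The function name and the returned strings are made up, and a field counts as "set" only when it is greater than zero, as stated above. The case where all three fields are set to different values is not covered by the list; this sketch treats it like case c).

```python
def describe_bitrate_hints(maximum, nominal, minimum):
    """Classify the three bitrate hint fields of the identification header."""
    def is_set(value):
        return value > 0

    if is_set(maximum) and maximum == nominal == minimum:
        return "fixed or nearly fixed rate"                  # case a)
    if is_set(nominal) and not is_set(maximum) and not is_set(minimum):
        return "VBR or ABR averaging the nominal bitrate"    # case b)
    if is_set(maximum) or is_set(minimum):
        return "VBR within the stated limits"                # case c)
    return "no hint given"                                   # case d)


print(describe_bitrate_hints(128000, 128000, 128000))
print(describe_bitrate_hints(0, 128000, 0))
print(describe_bitrate_hints(160000, 0, 96000))
print(describe_bitrate_hints(0, 0, 0))
```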

3) The comment header

The comment header includes user text comments ("tags") and a vendor string for the application/library that produced the bitstream.

The comment header is logically a list of eight-bit-clean vectors; the number of vectors is bounded to 2^32 - 1 and the length of each vector is limited to 2^32 - 1 bytes. The vector length is encoded; the vector contents themselves are not null terminated. In addition to the vector list, there is a single vector for vendor name (also 8 bit clean, length encoded in 32 bits). For example, the 1.0 release of libvorbis set the vendor string to "Xiph.Org libVorbis I 20020717".

The vector lengths and number of vectors are stored lsb first, according to the bit packing conventions of the vorbis codec. However, since data in the comment header is octet-aligned, they can simply be read as unaligned 32 bit little endian unsigned integers.
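Because every length is a plain little-endian 32-bit integer, the comment header can be decoded with a handful of reads, as in the sketch below. The function name is made up. The sketch assumes the vendor vector comes first, followed by the vector count and then the vectors, and that the comment header uses packet type 3 (both per the Vorbis I specification). It ignores the framing bit that follows the last vector.

```python
import struct


def parse_comment_header(packet):
    """Return (vendor, comments) decoded from a Vorbis comment header."""
    if packet[:7] != b"\x03vorbis":
        raise ValueError("not a Vorbis comment header")
    pos = 7

    def read_u32():
        nonlocal pos
        (value,) = struct.unpack_from("<I", packet, pos)
        pos += 4
        return value

    def read_vector():
        nonlocal pos
        length = read_u32()
        data = packet[pos:pos + length]
        if len(data) != length:
            raise ValueError("truncated vector")
        pos += length
        return data.decode("utf-8")   # vectors are not null terminated

    vendor = read_vector()
    count = read_u32()
    comments = [read_vector() for _ in range(count)]
    return vendor, comments


def _vector(text):
    """Helper for the demo: length-prefix one UTF-8 string."""
    raw = text.encode("utf-8")
    return struct.pack("<I", len(raw)) + raw


# Hand-built packet used only to exercise the sketch.
demo = (b"\x03vorbis" + _vector("Xiph.Org libVorbis I 20020717")
        + struct.pack("<I", 2) + _vector("ARTIST=me")
        + _vector("TITLE=the sound of Vorbis") + b"\x01")
print(parse_comment_header(demo))
```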

 

The comment vectors are structured similarly to a UNIX environment variable. That is, comment fields consist of a field name and a corresponding value and look like:

comment[0]="ARTIST=me";

comment[1]="TITLE=the sound of Vorbis";

The field name is case-insensitive and may consist of ASCII 0x20 through 0x7D, 0x3D ('=') excluded. ASCII 0x41 through 0x5A inclusive (characters A-Z) is to be considered equivalent to ASCII 0x61 through 0x7A inclusive (characters a-z). The field name is immediately followed by ASCII 0x3D ('='); this equals sign is used to terminate the field name. 0x3D is followed by the 8 bit clean UTF-8 encoded value of the field contents to the end of the field.

Field names

Below is a proposed, minimal list of standard field names with a description of intended use. No single or group of field names is mandatory; a comment header may contain one, all or none of the names in this list.

 

- TITLE Track/Work name

- VERSION The version field may be used to differentiate multiple versions of the same track title in a single collection. (e.g. remix info)

- ALBUM The collection name to which this track belongs

- TRACKNUMBER The track number of this piece if part of a specific larger collection or album

- ARTIST The artist generally considered responsible for the work. In popular music this is usually the performing band or singer. For classical music it would be the composer. For an audio book it would be the author of the original text.

- PERFORMER The artist(s) who performed the work. In classical music this would be the conductor, orchestra, soloists. In an audio book it would be the actor who did the reading. In popular music this is typically the same as the ARTIST and is omitted.

- COPYRIGHT Copyright attribution.

- LICENSE License information, e.g. 'All Rights Reserved', 'Any Use Permitted'.

- ORGANIZATION Name of the organization producing the track (i.e. the 'record label')

- DESCRIPTION A short text description of the contents

- GENRE A short text indication of music genre

- DATE Date the track was recorded

- LOCATION Location where track was recorded

- CONTACT Contact information for the creators or distributors of the track. This could be a URL, an email address, the physical address of the producing label.

- ISRC International Standard Recording Code for the track; see the ISRC intro page for more information on ISRC numbers.

 

Hint: Field names are not required to be unique (occur once) within a comment header. As an example, assume a track was recorded by three well-known artists; the following is permissible, and encouraged:

ARTIST=Dizzy Gillespie

ARTIST=Sonny Rollins

ARTIST=Sonny Stitt
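Since field names are case-insensitive and may repeat, a decoder usually wants a mapping from a normalized name to a list of values. The sketch below shows one way to build it. The function name is made up, and folding names to upper case and skipping entries without '=' are choices of this example, not requirements of the format.

```python
from collections import defaultdict


def group_comments(comments):
    """Map each field name (upper-cased) to the list of its values."""
    fields = defaultdict(list)
    for entry in comments:
        # Only the first '=' terminates the name; later ones belong to the value.
        name, separator, value = entry.partition("=")
        if not separator:
            continue              # no '=' found: skipped in this sketch
        fields[name.upper()].append(value)
    return dict(fields)


tags = group_comments([
    "ARTIST=Dizzy Gillespie",
    "artist=Sonny Rollins",
    "Artist=Sonny Stitt",
])
print(tags["ARTIST"])   # ['Dizzy Gillespie', 'Sonny Rollins', 'Sonny Stitt']
```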

4) Setup Header

The setup header includes extensive CODEC setup information as well as the complete VQ and Huffman codebooks needed for decode.

The setup header contains, in order, the lists of codebook configurations, time-domain transform configurations (placeholders in Vorbis I), floor configurations, residue configurations, channel mapping configurations and mode configurations. It finishes with a framing bit of '1', as shown in the figure below:

