WebM EBML header analysis

EBML element

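Analyzing the WebM format mostly comes down to understanding EBML elements. EBML is a hierarchical structure similar to XML: every element has an ID and a value, and in binary storage each element is laid out as ID, length (data size), value.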

Each element begins with its Element ID (also called an EBML ID), followed by the Data Size, and then the uninterpreted binary data itself:

Element ID, coded with a UTF-8-like system (bits, big-endian):

    1xxx xxxx - Class A IDs (2^7-1 possible values) (base 0x8X)
    01xx xxxx xxxx xxxx - Class B IDs (2^14-1 possible values) (base 0x4X 0xXX)
    001x xxxx xxxx xxxx xxxx xxxx - Class C IDs (2^21-1 possible values) (base 0x2X 0xXX 0xXX)
    0001 xxxx xxxx xxxx xxxx xxxx xxxx xxxx - Class D IDs (2^28-1 possible values) (base 0x1X 0xXX 0xXX 0xXX)
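A minimal C sketch of reading an Element ID from a byte buffer, assuming IDs are at most 4 bytes (Class A to D above). It keeps the marker bits in the returned value, which is the convention the libvpx constants further down follow (0x1A45DFA3 starts with 0001, so it is a 4-byte Class D ID). The helper names ebml_id_length and ebml_read_id are hypothetical, not taken from any library.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: length in bytes (1..4) of an Element ID, taken from
 * the position of the first 1 bit in its first byte. Returns 0 if the
 * first four bits are all zero (not a Class A-D ID). */
static int ebml_id_length(uint8_t first)
{
    uint8_t mask = 0x80;
    int len = 1;

    while (len <= 4 && !(first & mask)) {
        mask >>= 1;
        len++;
    }
    return len <= 4 ? len : 0;
}

/* Hypothetical helper: reads an Element ID from buf. The length-marker
 * bits are KEPT as part of the ID, so 1a 45 df a3 gives 0x1A45DFA3.
 * Returns the number of bytes consumed, or 0 on error/truncation. */
static size_t ebml_read_id(const uint8_t *buf, size_t avail, uint32_t *id)
{
    int len;
    uint32_t v = 0;

    if (avail == 0)
        return 0;
    len = ebml_id_length(buf[0]);
    if (len == 0 || (size_t)len > avail)
        return 0;
    for (int i = 0; i < len; i++)
        v = (v << 8) | buf[i];   /* big-endian, marker included */
    *id = v;
    return (size_t)len;
}
```

For example, 42 82 is read as a 2-byte Class B ID with the value 0x4282 (DocType).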

Data size, in octets, is also coded with a UTF-8-like system (bits, big-endian):

    1xxx xxxx - value 0 to 2^7-2
    01xx xxxx xxxx xxxx - value 0 to 2^14-2
    001x xxxx xxxx xxxx xxxx xxxx - value 0 to 2^21-2
    0001 xxxx xxxx xxxx xxxx xxxx xxxx xxxx - value 0 to 2^28-2
    0000 1xxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx - value 0 to 2^35-2
    0000 01xx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx - value 0 to 2^42-2
    0000 001x xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx - value 0 to 2^49-2
    0000 0001 xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx - value 0 to 2^56-2

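For the data size, removing the prefix (the leading zeros and the first 1 bit, e.g. 001) leaves the actual value in the x bits. The pattern with all x bits set to 1 is reserved (it means "unknown size"), which is why each range ends at 2^n-2. Element IDs, by contrast, are conventionally written with the prefix kept: 0x1A45DFA3 in the list below is the raw 4-byte ID, marker bits included.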
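Decoding a data size uses the same length rule as the ID but strips the marker bit. Here is a minimal C sketch, assuming the whole field is already in memory. ebml_read_size is a hypothetical name, and the reserved all-ones value is returned as an ordinary number rather than being treated as "unknown size".

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decodes an EBML data size (1..8 bytes) at buf.
 * Leading zero bits + 1 = field length; the first 1 bit (the marker) is
 * removed and the remaining bits are read big-endian. The reserved
 * all-ones value is NOT special-cased in this sketch.
 * Returns the field length in bytes, or 0 if invalid or truncated. */
static size_t ebml_read_size(const uint8_t *buf, size_t avail, uint64_t *size)
{
    uint8_t mask = 0x80;
    size_t len = 1;
    uint64_t v;

    if (avail == 0)
        return 0;
    while (len <= 8 && !(buf[0] & mask)) {
        mask >>= 1;
        len++;
    }
    if (len > 8 || len > avail)
        return 0;
    v = buf[0] & (uint8_t)(mask - 1);   /* strip the marker bit */
    for (size_t i = 1; i < len; i++)
        v = (v << 8) | buf[i];
    *size = v;
    return len;
}
```

For example, the single byte 84 decodes to 4, 40 02 decodes to 2, and 01 00 00 00 00 00 00 1f decodes to 31 using all 8 bytes.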

Data

Both the element ID and the element data size start with a length prefix such as 001. The element data follows the data size directly and has no such prefix.

EBML header IDs as defined in Android (external/libvpx/libmkv/EbmlIDs.h):

EBML = 0x1A45DFA3,          
EBMLVersion = 0x4286,       
EBMLReadVersion = 0x42F7,   
EBMLMaxIDLength = 0x42F2,   
EBMLMaxSizeLength = 0x42F3, 
DocType = 0x4282,           
DocTypeVersion = 0x4287,    
DocTypeReadVersion = 0x4285,

//segment            
Segment = 0x18538067,


So, to decide whether a file is a WebM file, two main conditions must hold:

  • the file starts with the EBML header ID 0x1A45DFA3
  • the DocType is "webm"
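Both conditions can be checked by actually parsing the header instead of searching for a string. The sketch below makes three assumptions: the header sits at offset 0, the whole header is already in a memory buffer, and the DocType element (ID 0x4282) must equal "webm" exactly. The names is_webm and read_vint are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reads a variable-length integer. keep_marker != 0
 * keeps the marker bit (Element IDs); 0 strips it (data sizes).
 * Returns the byte length, or 0 on error. */
static size_t read_vint(const uint8_t *p, size_t avail, int keep_marker,
                        uint64_t *out)
{
    uint8_t mask = 0x80;
    size_t len = 1;
    uint64_t v;

    if (avail == 0)
        return 0;
    while (len <= 8 && !(p[0] & mask)) {
        mask >>= 1;
        len++;
    }
    if (len > 8 || len > avail)
        return 0;
    v = keep_marker ? p[0] : (uint64_t)(p[0] & (mask - 1));
    for (size_t i = 1; i < len; i++)
        v = (v << 8) | p[i];
    *out = v;
    return len;
}

/* Returns 1 if buf starts with an EBML header (ID 0x1A45DFA3) whose
 * DocType child (ID 0x4282) is exactly "webm", else 0. */
static int is_webm(const uint8_t *buf, size_t len)
{
    uint64_t id, size;
    size_t pos, n, end;

    /* Condition 1: the data starts with the EBML header ID. */
    n = read_vint(buf, len, 1, &id);
    if (n == 0 || id != 0x1A45DFA3)
        return 0;
    pos = n;
    n = read_vint(buf + pos, len - pos, 0, &size);
    if (n == 0 || size > len - pos - n)
        return 0;
    pos += n;
    end = pos + (size_t)size;

    /* Condition 2: walk the header's children looking for DocType. */
    while (pos < end) {
        n = read_vint(buf + pos, end - pos, 1, &id);
        if (n == 0)
            return 0;
        pos += n;
        n = read_vint(buf + pos, end - pos, 0, &size);
        if (n == 0 || size > end - pos - n)
            return 0;
        pos += n;
        if (id == 0x4282)
            return size == 4 && memcmp(buf + pos, "webm", 4) == 0;
        pos += (size_t)size;   /* skip this child's data */
    }
    return 0;
}
```

Parsing the children, rather than scanning for the bytes "webm", avoids false matches inside unrelated fields, at the cost of a little more code.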

Analyzing a WebM file header

Below is the EBML header of a file, copied with the ghex hex editor (after opening a file in ghex, the hex dump can be saved as HTML via the Save As menu):

1a 45 df a3 01 00 00 00 00 00 00 1f 42 86 81 01 42 f7
81 01 42 f2 81 04 42 f3 81 08 42 82 84 77 65 62 6d 42
87 81 02 42 85 81 02 18 53 80 67 01 00 00 00 00 18 ab

The EBML header of this file can be read as follows:

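Element ID: 1a 45 df a3
Element data size: 01 00 00 00 00 00 00 1f   [the first byte is 0000 0001: seven leading zeros, so the size field is 8 bytes long; with the marker bit removed the value is 0x1f, i.e. 31 in decimal]
Element data: the next 31 bytes, from 42 86 81 01 through 42 85 81 02   [31 is the total length in bytes of all the elements inside the header: for the level-0 EBML header, the data is its sub-elements, so the data size is their total byte count. The bytes after it, 18 53 80 67, are the Segment ID.]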
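The arithmetic for this header can be checked in code: ID (4 bytes) + size field (8 bytes) + data (31 bytes) = 43, so the next top-level element should start at offset 43, which is where 18 53 80 67 (Segment) appears in the dump. ebml_header_end is a hypothetical helper written only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns the offset just past the EBML header
 * element at the start of buf (4-byte ID + size field + data), or 0 if
 * buf does not start with a complete EBML header. */
static size_t ebml_header_end(const uint8_t *buf, size_t len)
{
    uint8_t mask = 0x80;
    size_t size_len = 1;
    uint64_t data_size;

    if (len < 5 || buf[0] != 0x1A || buf[1] != 0x45 ||
        buf[2] != 0xDF || buf[3] != 0xA3)
        return 0;
    /* Count leading zeros of the first size byte: 0x01 -> 8-byte field. */
    while (size_len <= 8 && !(buf[4] & mask)) {
        mask >>= 1;
        size_len++;
    }
    if (size_len > 8 || 4 + size_len > len)
        return 0;
    /* Drop the marker bit, then append the remaining bytes big-endian:
     * 01 00 00 00 00 00 00 1f -> 0x1f = 31. */
    data_size = buf[4] & (mask - 1);
    for (size_t i = 1; i < size_len; i++)
        data_size = (data_size << 8) | buf[4 + i];
    if (data_size > len - 4 - size_len)
        return 0;                          /* header data truncated */
    return 4 + size_len + (size_t)data_size;   /* 4 + 8 + 31 = 43 here */
}
```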


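Taking 42 82 as an example, the DocType element:

Element ID: 42 82
Element data size: 84   [84 in binary is 1000 0100; removing the leading 1 leaves 000 0100, which is 4 in decimal, so the data that follows occupies 4 bytes]
Element data: 77 65 62 6d   [the corresponding ASCII characters are w e b m]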
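The same three-way split (ID, data size, data) can be written as a small C function. This is a sketch with hypothetical names (struct ebml_elem, vint, ebml_parse_element), assuming the element is complete in the buffer and IDs are at most 4 bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* One parsed EBML element: raw ID (marker kept), payload size, payload. */
struct ebml_elem {
    uint32_t id;
    uint64_t size;
    const uint8_t *data;
};

/* Reads a VINT of at most max_len bytes; keep_marker selects ID-style
 * (marker kept) or size-style (marker stripped) decoding. */
static size_t vint(const uint8_t *p, size_t avail, size_t max_len,
                   int keep_marker, uint64_t *out)
{
    uint8_t mask = 0x80;
    size_t len = 1;
    uint64_t v;

    if (avail == 0)
        return 0;
    while (len <= max_len && !(p[0] & mask)) {
        mask >>= 1;
        len++;
    }
    if (len > max_len || len > avail)
        return 0;
    v = keep_marker ? p[0] : (uint64_t)(p[0] & (mask - 1));
    for (size_t i = 1; i < len; i++)
        v = (v << 8) | p[i];
    *out = v;
    return len;
}

/* Splits the element at buf into ID, data size and data, e.g.
 * 42 82 | 84 | 77 65 62 6d  ->  ID 0x4282, 4 bytes, "webm".
 * Returns the total bytes the element occupies, or 0 on error. */
static size_t ebml_parse_element(const uint8_t *buf, size_t avail,
                                 struct ebml_elem *e)
{
    uint64_t id, size;
    size_t n_id, n_size;

    n_id = vint(buf, avail, 4, 1, &id);
    if (n_id == 0)
        return 0;
    n_size = vint(buf + n_id, avail - n_id, 8, 0, &size);
    if (n_size == 0 || size > avail - n_id - n_size)
        return 0;
    e->id = (uint32_t)id;
    e->size = size;
    e->data = buf + n_id + n_size;
    return n_id + n_size + (size_t)size;
}
```

Applied to 42 82 84 77 65 62 6d it yields ID 0x4282, size 4 and the bytes "webm". Applied to 42 87 81 02 it yields DocTypeVersion with the one-byte value 2.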


The following is the part of GStreamer's gsttypefindfunctions.c that parses the EBML header:

    /* EBML typefind helper */
    static gboolean
    ebml_check_header (GstTypeFind * tf, const gchar * doctype, int doctype_len)
    {
      /* 4 bytes for EBML ID, 1 byte for header length identifier */
      guint8 *data = gst_type_find_peek (tf, 0, 4 + 1);
      gint len_mask = 0x80, size = 1, n = 1, total;

      if (!data)
        return FALSE;

      /* ebml header? */
      if (data[0] != 0x1A || data[1] != 0x45 || data[2] != 0xDF || data[3] != 0xA3)
        return FALSE;

      /* length of header */
      total = data[4];

      /*
       * len_mask is binary 1000 0000. The while loop tests total & len_mask to
       * count the leading zero bits and stops at the first 1 bit; size is then
       * the number of bytes in the EBML header's data size field.
       */
      while (size <= 8 && !(total & len_mask)) {
        size++;
        len_mask >>= 1;
      }
      if (size > 8)   /* a data size field is at most 8 bytes long */
        return FALSE;

      /* drop the marker bit and append the remaining size bytes: total is now
       * the byte count of the EBML header (level 0) data */
      total &= (len_mask - 1);
      while (n < size)
        total = (total << 8) | data[4 + n++];

      /* get new data for full header, 4 bytes for EBML ID,
       * EBML length tag and the actual header */
      data = gst_type_find_peek (tf, 0, 4 + size + total);
      if (!data)
        return FALSE;

      /* only check doctype if asked to do so */
      if (doctype == NULL || doctype_len == 0)
        return TRUE;

      /* the header must contain the doctype. For now, we don't parse the
       * whole header but simply check for the availability of that array
       * of characters inside the header. Not fully fool-proof, but good
       * enough. */
      for (n = 4 + size; n <= 4 + size + total - doctype_len; n++)
        if (!memcmp (&data[n], doctype, doctype_len))
          return TRUE;

      return FALSE;
    }

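When calling ebml_check_header, pass "matroska" or "webm" as the doctype argument (together with its length); passing NULL only checks that an EBML header is present: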

    static void
    matroska_type_find (GstTypeFind * tf, gpointer unused)
    {
      if (ebml_check_header (tf, "matroska", 8))
        gst_type_find_suggest (tf, GST_TYPE_FIND_MAXIMUM, MATROSKA_CAPS);
      else if (ebml_check_header (tf, NULL, 0))
        gst_type_find_suggest (tf, GST_TYPE_FIND_LIKELY, MATROSKA_CAPS);
    }
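To experiment without GStreamer, the ebml_check_header logic can be adapted to a plain byte buffer. This is a hedged adaptation, not GStreamer code. gst_type_find_peek is replaced by a (data, avail) pair, the name ebml_check_header_buf is hypothetical, the size is accumulated in a uint64_t, and an explicit doctype_len > total check is added because the loop bound is unsigned here. Like the original, it only searches the header bytes for the doctype string instead of parsing the DocType element.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical buffer-based adaptation of ebml_check_header(). Returns 1
 * if data starts with a complete EBML header that contains doctype (or if
 * doctype is NULL/empty), else 0. */
static int ebml_check_header_buf(const uint8_t *data, size_t avail,
                                 const char *doctype, size_t doctype_len)
{
    uint8_t len_mask = 0x80;
    size_t size = 1, n;
    uint64_t total;

    /* 4 bytes for the EBML ID plus at least 1 byte of the size field */
    if (avail < 5)
        return 0;
    if (data[0] != 0x1A || data[1] != 0x45 || data[2] != 0xDF || data[3] != 0xA3)
        return 0;

    /* count leading zeros of the first size byte to get the field length */
    total = data[4];
    while (size <= 8 && !(total & len_mask)) {
        size++;
        len_mask >>= 1;
    }
    if (size > 8 || 4 + size > avail)
        return 0;

    /* strip the marker bit, then append the remaining size bytes */
    total &= (uint64_t)(len_mask - 1);
    for (n = 1; n < size; n++)
        total = (total << 8) | data[4 + n];

    /* the whole header must be available (the original's second peek) */
    if (total > avail - 4 - size)
        return 0;
    if (doctype == NULL || doctype_len == 0)
        return 1;
    if (doctype_len > total)
        return 0;

    /* heuristic: search the header bytes for the doctype string */
    for (n = 4 + size; n <= 4 + size + total - doctype_len; n++)
        if (memcmp(&data[n], doctype, doctype_len) == 0)
            return 1;
    return 0;
}
```

With the 43 header bytes from the dump above, ebml_check_header_buf(buf, 43, "webm", 4) returns 1, while the same call with "matroska", 8 returns 0.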


References:

Multimedia container formats explained: MKV, parts [1] [2] [3]
http://blog.csdn.net/tx3344/article/details/8162656
http://blog.csdn.net/tx3344/article/details/8176288
http://blog.csdn.net/tx3344/article/details/8203260

The EBML format of MKV
http://tigersoldier.is-programmer.com/2008/6/30/ebml-in-mkv.4052.html

MKV file format
http://blog.chinaunix.net/uid-12845622-id-311943.html

Parsing Matroska files: SimpleBlock
http://www.cnblogs.com/tangdoudou/archive/2012/05/14/2499063.html

Tool: MKVToolNix
http://www.cinker.com/2009/01/13/mkv-movie-split-merge-mkvtoolnix/

Official Matroska/EBML specification
http://www.matroska.org/technical/specs/index.html