H.264 extradata (partially) explained

H.264 extradata (partially) explained - for dummies

While this article will seem obvious and redundant to anyone who is fluent in H.264, I'm hoping it will be useful to people who stumble upon this issue.

I'm not going to go into any details about H.264 internals, parameters or anything like that. Instead, in this short article I'm going to treat H.264 as a big opaque black box which has to be fed an annoying piece of information known as "extradata".

What's so annoying about extradata?
  1. it comes in two different flavors
  2. you need rudimentary knowledge of the H.264 bitstream in order to retrieve it
  3. you need rudimentary knowledge of the H.264 bitstream in order to know which flavor you need

But first, we need to learn about the two different flavors of the H.264 bitstream.

Annex B format

In this format, each NAL is preceded by a start code, usually the four bytes 0x00 0x00 0x00 0x01 (a shorter three-byte form, 0x00 0x00 0x01, is also allowed).
Thus, in order to know where a NAL starts and where it stops, you need to read each byte of the bitstream looking for these start codes, which can be a pain if you need to convert between this format and the other format.
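
The scan described above can be sketched roughly like this. It is only an illustration: split_annexb is a made-up name, and the sketch assumes the whole buffer is in memory and that zero bytes directly in front of a start code are padding rather than NAL data.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper (not from any library): returns the (offset, size) of
// every NAL in an Annex B buffer. Both 00 00 01 and 00 00 00 01 are accepted.
std::vector<std::pair<std::size_t, std::size_t>>
split_annexb(const std::vector<std::uint8_t>& buf) {
    std::vector<std::pair<std::size_t, std::size_t>> nals;
    const std::size_t n = buf.size();
    bool have_start = false;
    std::size_t start = 0;

    // Closes the NAL that began at `start` and ends before `end`.
    auto close_nal = [&](std::size_t end) {
        // Zero bytes right before a start code (or at the end of the buffer)
        // are treated as padding / the first byte of a 4-byte start code.
        while (end > start && buf[end - 1] == 0) --end;
        nals.emplace_back(start, end - start);
    };

    std::size_t i = 0;
    while (i + 3 <= n) {
        if (buf[i] == 0 && buf[i + 1] == 0 && buf[i + 2] == 1) {
            if (have_start) close_nal(i);
            start = i + 3;      // NAL data begins right after the start code
            have_start = true;
            i += 3;
        } else {
            ++i;
        }
    }
    if (have_start) close_nal(n);
    return nals;
}
```

Each returned pair points at the NAL header byte, so buf[offset] is the first byte of the NAL itself, not of the start code.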

AVCC format

This "non Annex B" format is known as AVCC format. In this format, each NAL is preceded by a nal_size field holding the length of the NAL as a big-endian integer. The size of this field in bytes is 4 in many cases, but it cannot be assumed to be 4 (1 and 2 are also possible), and in fact this is part of the reason why a decoder needs any "extra data" in the first place.
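
Walking an AVCC buffer is much simpler than scanning for start codes, as long as you already know the width of the nal_size field. A minimal sketch, assuming the width is handed in by the caller; split_avcc is a made-up name, not an API from any library.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical helper: returns the (offset, size) of every NAL in a
// length-prefixed (AVCC) buffer. nal_size_len is the width of the
// nal_size field in bytes and has to come from the extradata.
std::vector<std::pair<std::size_t, std::size_t>>
split_avcc(const std::vector<std::uint8_t>& buf, std::size_t nal_size_len) {
    if (nal_size_len != 1 && nal_size_len != 2 && nal_size_len != 4)
        throw std::invalid_argument("nal_size field must be 1, 2 or 4 bytes");

    std::vector<std::pair<std::size_t, std::size_t>> nals;
    std::size_t pos = 0;
    while (pos < buf.size()) {
        if (buf.size() - pos < nal_size_len)
            throw std::runtime_error("truncated nal_size field");

        // The length is stored big-endian (most significant byte first).
        std::size_t len = 0;
        for (std::size_t k = 0; k < nal_size_len; ++k)
            len = (len << 8) | buf[pos + k];
        pos += nal_size_len;

        if (len > buf.size() - pos)
            throw std::runtime_error("truncated NAL");
        nals.emplace_back(pos, len);
        pos += len;     // jump straight to the next nal_size field
    }
    return nals;
}
```

Note that passing the wrong nal_size_len does not fail gracefully in general: the lengths are simply misread, which is exactly why the decoder has to be told the width up front.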

So, why does a decoder need extradata anyway?
  1. it needs to know what flavor of the bitstream to expect
  2. if AVCC format is used, it needs to know the size of the nal_size field, in bytes
  3. if the parameters for decoding are not repeated with every keyframe, but rather specified only once (such as in a file), it needs those parameters (the SPS & PPS in H.264 speak)

How do I get this extradata?

When reading from a file, the extradata is usually part of the file headers, and you (or the demuxer) need to extract it from there.

If the extradata is repeated with every keyframe, you can try to extract it from the bitstream itself. Most of the time it will be bundled in the same buffer or packet as the keyframe, preceding it.
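
To tell which NALs in such a packet are the parameter sets, it is enough to look at the low five bits of the first byte of each NAL (the nal_unit_type): 7 is an SPS, 8 is a PPS and 5 is an IDR (keyframe) slice. A tiny sketch; the constant and function names are made up for illustration.

```cpp
#include <cstdint>

// nal_unit_type values relevant to extradata (from the H.264 NAL header).
constexpr std::uint8_t kNalIdrSlice = 5;
constexpr std::uint8_t kNalSps = 7;
constexpr std::uint8_t kNalPps = 8;

// The type lives in the low 5 bits of the first byte of the NAL,
// i.e. the byte right after the start code or the nal_size field.
inline std::uint8_t nal_unit_type(std::uint8_t first_byte) {
    return static_cast<std::uint8_t>(first_byte & 0x1F);
}

// True for the NALs that belong in the extradata.
inline bool is_parameter_set(std::uint8_t first_byte) {
    const std::uint8_t t = nal_unit_type(first_byte);
    return t == kNalSps || t == kNalPps;
}
```

With typical encoders the first bytes you will see are 0x67 (SPS), 0x68 (PPS) and 0x65 (IDR slice), but the upper bits can differ, so masking with 0x1F is safer than comparing whole bytes.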

If the bitstream is in Annex B format, you're in luck! You don't really need the extradata, because the codec can figure it out from the bitstream itself. At most, you will need to tell the decoder to treat the bitstream as Annex B, which is often achieved by NOT supplying any extradata to begin with.

On the other hand, if the bitstream is in AVCC format, you desperately need this extradata. Without it the decoder doesn't know how long the nal_size field is, and thus cannot even parse the bitstream.
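
In AVCC extradata the width of the nal_size field is stored in the two low bits of the fifth byte, as "length minus one". Reading it back could look like the sketch below; avcc_nal_size_length is a hypothetical name, and returning 0 for anything that does not look like AVCC extradata is just a convention chosen for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns the width in bytes of the nal_size field
// declared by AVCC extradata, or 0 if the buffer does not look like AVCC.
std::size_t avcc_nal_size_length(const std::vector<std::uint8_t>& extradata) {
    // 6 fixed header bytes plus the PPS count byte is the bare minimum.
    if (extradata.size() < 7) return 0;
    // AVCC extradata starts with version byte 1.
    if (extradata[0] != 1) return 0;
    // Byte 4: 6 reserved bits, then (nal_size length - 1) in the low 2 bits.
    return static_cast<std::size_t>(extradata[4] & 0x03) + 1;
}
```

For the very common value 0xFF in that byte, the result is 4.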

Suppose I have the SPS and PPS information, how do I create the extradata?

Again, for Annex B format it is simple; you just use the following pseudo code:

write(0x00)
write(0x00)
write(0x00)
write(0x01)
for each byte b in SPS
  write(b)

for each PPS p in PPS_array
  write(0x00)
  write(0x00)
  write(0x00)
  write(0x01)
  for each byte b in p
    write(b)

On the other hand, AVCC format extradata is more complicated:

write(0x1);  // configuration version, always 1
// sps[0].data[0] is the NAL header byte; the next three bytes are copied as-is
write(sps[0].data[1]); // profile
write(sps[0].data[2]); // compatibility
write(sps[0].data[3]); // level
write(0xFC | 3); // reserved (6 bits), NALU length size - 1 (2 bits); 3 means a 4-byte nal_size
write(0xE0 | 1); // reserved (3 bits), num of SPS (5 bits)
write_word(sps[0].size); // 2 bytes for length of SPS (big-endian)
for(size_t i=0 ; i < sps[0].size ; ++i)
  write(sps[0].data[i]); // data of SPS

write(pps.size());  // num of PPS (1 byte)
for(size_t i=0 ; i < pps.size() ; ++i) {
  write_word(pps[i].size);  // 2 bytes for length of PPS (big-endian)
  for(size_t j=0 ; j < pps[i].size ; ++j)
    write(pps[i].data[j]);  // data of PPS
}


Notice how the first byte of the AVCC extradata is 1, which makes it obvious that it is not the start of an Annex B extradata (which must begin with 0x00).
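
That observation is enough for a simple flavor check. A rough sketch, with made-up names (ExtradataFlavor, detect_flavor); treating empty extradata as its own case follows the point made earlier that Annex B streams are often fed to the decoder with no extradata at all.

```cpp
#include <cstdint>
#include <vector>

enum class ExtradataFlavor { None, AnnexB, Avcc, Unknown };

// Hypothetical helper: guesses the flavor of an extradata buffer from
// its first few bytes.
ExtradataFlavor detect_flavor(const std::vector<std::uint8_t>& e) {
    if (e.empty()) return ExtradataFlavor::None;
    // AVCC extradata starts with version byte 1.
    if (e[0] == 1) return ExtradataFlavor::Avcc;
    // Annex B extradata starts with a start code: 00 00 01 or 00 00 00 01.
    if (e.size() >= 3 && e[0] == 0 && e[1] == 0 && e[2] == 1)
        return ExtradataFlavor::AnnexB;
    if (e.size() >= 4 && e[0] == 0 && e[1] == 0 && e[2] == 0 && e[3] == 1)
        return ExtradataFlavor::AnnexB;
    return ExtradataFlavor::Unknown;
}
```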

Notes about .mov files and Quicktime

Internally (at least as of version 7.0), QuickTime codecs work only with AVCC format and not with Annex B format. That means that if you are unlucky enough to have H.264 in Annex B format and need to decode it with QuickTime codecs (for instance on an iPhone), you would need to:
  1. convert the Annex B 0x00 0x00 0x00 0x01 start codes into 4-byte long AVCC nal_size fields.
    this requires a loop through the entire buffer, searching for these start codes
  2. extract the SPS and PPS NALs, and create an extradata buffer from them in the special format outlined above.

Additionally, since the .mov container is basically a QuickTime container, it is natural that H.264 is stored in .mov files in AVCC format, and thus .mov muxers need to know how to convert Annex B formatted H.264 buffers into AVCC formatted H.264 buffers, and also how to convert the extradata buffer into one usable with AVCC format.
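
The buffer conversion in step 1 could be sketched as follows. This is an illustration only: annexb_to_avcc is a made-up name, it always writes a 4-byte nal_size, and it copies every NAL it finds (including any SPS and PPS), so pulling the parameter sets out for the extradata remains a separate step.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: rewrites an Annex B buffer as AVCC with 4-byte,
// big-endian nal_size fields. Accepts 3- and 4-byte start codes.
std::vector<std::uint8_t> annexb_to_avcc(const std::vector<std::uint8_t>& in) {
    std::vector<std::uint8_t> out;
    const std::size_t n = in.size();

    // Index of the next 00 00 01 at or after `from`, or n if there is none.
    auto find_start = [&](std::size_t from) -> std::size_t {
        for (std::size_t i = from; i + 3 <= n; ++i)
            if (in[i] == 0 && in[i + 1] == 0 && in[i + 2] == 1) return i;
        return n;
    };

    std::size_t sc = find_start(0);
    while (sc < n) {
        const std::size_t begin = sc + 3;           // first byte of the NAL
        const std::size_t next = find_start(begin); // next start code, or n
        std::size_t end = next;
        // Zeros right before the next start code are padding or the first
        // byte of a 4-byte start code, not NAL data.
        while (end > begin && in[end - 1] == 0) --end;

        const std::size_t len = end - begin;
        if (len > 0) {
            out.push_back(static_cast<std::uint8_t>((len >> 24) & 0xFF));
            out.push_back(static_cast<std::uint8_t>((len >> 16) & 0xFF));
            out.push_back(static_cast<std::uint8_t>((len >> 8) & 0xFF));
            out.push_back(static_cast<std::uint8_t>(len & 0xFF));
            out.insert(out.end(), in.begin() + begin, in.begin() + end);
        }
        sc = next;
    }
    return out;
}
```

Because the output uses 4-byte lengths, the matching AVCC extradata has to declare a length size of 4, that is, the value 3 in the two low bits of its fifth byte.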



