Tutorial 05: Synching Video

I have worked through quite a few examples by now. This chapter is about video synchronization, and there is a lot of new material to learn, so I will start by translating and walking through the code.


CAVEAT

When this tutorial was first written, all of the syncing code was pulled from ffplay.c. Today it is a totally different program, because the ffmpeg libraries (and ffplay.c along with them) have changed strategy. While the code here still works, it is not great, and there is plenty in this tutorial that could be improved.


How Video Syncs

Up to now we have a movie player with the basic features in place, but it is still essentially useless: it plays the video and it plays the audio, yet it is not quite what we would call a movie. So what do we do?


PTS and DTS

Fortunately, both the audio and the video stream carry information about when they are supposed to be played: the audio stream has a sample rate, and the video stream has a frames-per-second value. However, if we simply synced the video by counting frames and multiplying by the frame rate, there is a good chance it would drift away from the audio. Instead, packets decoded from the stream may carry a DTS (decoding timestamp) and a PTS (presentation timestamp). To understand these two values, you need to know how video is stored. Some formats, such as MPEG, use what are called B-frames ("B" for bidirectional prediction). An I-frame holds a complete image. A P-frame depends on the preceding I- or P-frame and may encode only a change relative to it. A B-frame is similar to a P-frame, except that predicting it correctly requires information from the frames both before and after it. This is why a call to avcodec_decode_video2 may not hand back a finished frame.

Say we have a movie whose frames are displayed in the order I B B P. We need to know what is in P before we can display either B-frame, so the frames may well be stored in the order I P B B. This is why every frame carries both a decoding timestamp (DTS) and a presentation timestamp (PTS): the DTS tells us when the frame has to be decoded, and the PTS tells us when it has to be displayed. So the stream might look like this:

PTS: 1 4 2 3
DTS: 1 2 3 4
Stream: I P B B

Generally the PTS and DTS only differ when the stream contains B-frames. When we get a packet from av_read_frame(), it carries both a PTS and a DTS. But what we really want is the PTS of the freshly decoded raw frame, because that is what tells us when to display it.

Luckily, ffmpeg provides a "best effort" timestamp for exactly this, which you can fetch with av_frame_get_best_effort_timestamp().
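As a rough sketch of what that looks like (the complete program further down does compute this value, as ptsBst, but ends up falling back to the packet DTS instead), the best-effort timestamp is expressed in stream time_base units, so it still has to be converted to seconds:

// sketch only: right after a successful avcodec_decode_video2() call,
// using the same pFrame and is names as the full code below
int64_t best_ts = av_frame_get_best_effort_timestamp(pFrame);
double pts_seconds = 0;
if(best_ts != AV_NOPTS_VALUE) {
  pts_seconds = best_ts * av_q2d(is->video_st->time_base);
}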


Synching

So now we know when to display a particular video frame, but how do we actually do it? Here is the idea: after we show a frame, we figure out when the next frame should be shown. Then we set a timeout, and when it expires we post a refresh event that redraws the video. As you would expect, we check the PTS of the next frame against the system clock to decide how long that timeout should be. This approach works, but there are two problems to deal with.

1. The first problem is knowing what the next PTS will be. You might think we could just add the frame duration to the current PTS, and you would be mostly right. However, some kinds of video require frames to be repeated, which means the current frame has to be shown some extra number of times. Left unaccounted for, that would make us display the next frame too early, so we need to allow for it.


2. The second problem is that, as the program stands right now, the video and the audio chug along happily with no attempt to stay in sync. We would not have to worry if everything worked perfectly, but your computer is not perfect and neither are many video files. So we have three choices: sync the audio to the video, sync the video to the audio, or sync both to an external clock (such as the computer's clock). For now, we are going to sync the video to the audio.


Coding it: getting the frame PTS

So now we've got our PTS all set. Now we've got to take care of the two synchronization problems we talked about above. We're going to define a function called synchronize_video that will update the PTS so it stays in sync with everything. This function will also deal with cases where we don't get a PTS value for our frame. At the same time we need to keep track of when the next frame is expected so we can set our refresh rate properly. We can accomplish this by using an internal video_clock value which keeps track of how much time has passed according to the video. We add this value to our big VideoState struct.
typedef struct VideoState {
  double          video_clock; // pts of last decoded frame / predicted pts of next decoded frame
Here's the synchronize_video function, which is pretty self-explanatory:
double synchronize_video(VideoState *is, AVFrame *src_frame, double pts) {

  double frame_delay;

  if(pts != 0) {
    /* if we have pts, set video clock to it */
    is->video_clock = pts;
  } else {
    /* if we aren't given a pts, set it to the clock */
    pts = is->video_clock;
  }
  /* update the video clock */
  frame_delay = av_q2d(is->video_st->codec->time_base);
  /* if we are repeating a frame, adjust clock accordingly */
  frame_delay += src_frame->repeat_pict * (frame_delay * 0.5);
  is->video_clock += frame_delay;
  return pts;
}
You'll notice we account for repeated frames in this function, too.
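For example, if av_q2d(is->video_st->codec->time_base) works out to 0.04 seconds (25 fps) and the decoder reports repeat_pict = 1 for a frame, the clock advances by 0.04 + 1 * 0.02 = 0.06 seconds for that frame instead of the usual 0.04.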

Now let's get our proper PTS and queue up the frame using queue_picture, adding a new pts argument:
// Did we get a video frame?
    if(frameFinished) {
      pts = synchronize_video(is, pFrame, pts);
      if(queue_picture(is, pFrame, pts) < 0) {
	break;
      }
    }
The only thing that changes about queue_picture is that we save that pts value to the VideoPicture structure that we queue up. So we have to add a pts variable to the struct and add a line of code:
typedef struct VideoPicture {
  ...
  double pts;
}
int queue_picture(VideoState *is, AVFrame *pFrame, double pts) {
  ... stuff ...
  if(vp->bmp) {
    ... convert picture ...
    vp->pts = pts;
    ... alert queue ...
  }
So now we've got pictures lining up onto our picture queue with proper PTS values, so let's take a look at our video refreshing function. You may recall from last time that we just faked it and put a refresh of 80ms. Well, now we're going to find out how to actually figure it out.

Our strategy is going to be to predict the time of the next PTS by simply measuring the time between the previous PTS and this one. At the same time, we need to sync the video to the audio. We're going to make an audio clock: an internal value that keeps track of what position the audio we're playing is at. It's like the digital readout on any mp3 player. Since we're synching the video to the audio, the video thread uses this value to figure out if it's too far ahead or too far behind.

We'll get to the implementation later; for now let's assume we have a get_audio_clock function that will give us the time on the audio clock. Once we have that value, though, what do we do if the video and audio are out of sync? It would be silly to simply try to leap to the correct packet through seeking or something like that. Instead, we just adjust the value we've calculated for the next refresh: if the frame's PTS is too far behind the audio time, we refresh as quickly as possible; if the PTS is too far ahead of the audio time, we double our calculated delay. Now that we have our adjusted refresh time, or delay, we're going to compare it with our computer's clock by keeping a running frame_timer. This frame timer sums up all of our calculated delays while playing the movie. In other words, this frame_timer is what time it should be when we display the next frame. We simply add the new delay to the frame timer, compare it to the time on our computer's clock, and use that value to schedule the next refresh. This might be a bit confusing, so study the code carefully:
void video_refresh_timer(void *userdata) {

  VideoState *is = (VideoState *)userdata;
  VideoPicture *vp;
  double actual_delay, delay, sync_threshold, ref_clock, diff;
  
  if(is->video_st) {
    if(is->pictq_size == 0) {
      schedule_refresh(is, 1);
    } else {
      vp = &is->pictq[is->pictq_rindex];

      delay = vp->pts - is->frame_last_pts; /* the pts from last time */
      if(delay <= 0 || delay >= 1.0) {
	/* if incorrect delay, use previous one */
	delay = is->frame_last_delay;
      }
      /* save for next time */
      is->frame_last_delay = delay;
      is->frame_last_pts = vp->pts;

      /* update delay to sync to audio */
      ref_clock = get_audio_clock(is);
      diff = vp->pts - ref_clock;

      /* Skip or repeat the frame. Take delay into account
	 FFPlay still doesn't "know if this is the best guess." */
      sync_threshold = (delay > AV_SYNC_THRESHOLD) ? delay : AV_SYNC_THRESHOLD;
      if(fabs(diff) < AV_NOSYNC_THRESHOLD) {
	if(diff <= -sync_threshold) {
	  delay = 0;
	} else if(diff >= sync_threshold) {
	  delay = 2 * delay;
	}
      }
      is->frame_timer += delay;
      /* compute the REAL delay */
      actual_delay = is->frame_timer - (av_gettime() / 1000000.0);
      if(actual_delay < 0.010) {
	/* Really it should skip the picture instead */
	actual_delay = 0.010;
      }
      schedule_refresh(is, (int)(actual_delay * 1000 + 0.5));
      /* show the picture! */
      video_display(is);
      
      /* update queue for next picture! */
      if(++is->pictq_rindex == VIDEO_PICTURE_QUEUE_SIZE) {
	is->pictq_rindex = 0;
      }
      SDL_LockMutex(is->pictq_mutex);
      is->pictq_size--;
      SDL_CondSignal(is->pictq_cond);
      SDL_UnlockMutex(is->pictq_mutex);
    }
  } else {
    schedule_refresh(is, 100);
  }
}
There are a few checks we make: first, we make sure that the delay between this PTS and the previous PTS makes sense. If it doesn't, we just guess and use the last delay. Next, we make sure we have a synch threshold, because things are never going to be perfectly in synch. ffplay uses 0.01 for its value. We also make sure that the synch threshold is never smaller than the gap between PTS values. Finally, we make the minimum refresh value 10 milliseconds. (Really, we ought to skip the frame in that case instead, but we don't bother.)
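To put numbers on those checks: with delay = 0.040 s, the synch threshold is max(0.040, 0.01) = 0.040 s. A diff of -0.055 s (the frame's PTS lags the audio clock by 55 ms) drops the delay to 0, a diff of +0.048 s doubles it to 0.080 s, and a diff of about ±0.020 s leaves it unchanged.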

We added a bunch of variables to the big struct, so don't forget to check the code. Also, don't forget to initialize the frame timer and the initial previous-frame delay in stream_component_open:
is->frame_timer = (double)av_gettime() / 1000000.0;
is->frame_last_delay = 40e-3;


Synching: The Audio Clock

Now it's time for us to implement the audio clock. We can update the clock time in our audio_decode_frame function, which is where we decode the audio. Now, remember that we don't always process a new packet every time we call this function, so there are two places where we have to update the clock. The first place is where we get a new packet: we simply set the audio clock to the packet's PTS. Then, if a packet contains multiple frames, we keep the audio clock up to date by counting the samples we output and multiplying by the stream's samples-per-second rate.
So once we have the packet: 
    /* if update, update the audio clock w/pts */
    if(pkt->pts != AV_NOPTS_VALUE) {
      is->audio_clock = av_q2d(is->audio_st->time_base)*pkt->pts;
    }
And once we are processing the packet:
      /* Keep audio_clock up-to-date */
      pts = is->audio_clock;
      *pts_ptr = pts;
      n = 2 * is->audio_st->codec->channels;
      is->audio_clock += (double)data_size /
	(double)(n * is->audio_st->codec->sample_rate);
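As a quick sanity check of that increment: for 16-bit stereo audio at 44100 Hz, n = 2 * 2 = 4 bytes per sample frame, so a data_size of 4096 bytes advances audio_clock by 4096 / (4 * 44100) ≈ 0.023 seconds.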
A few fine details: the prototype of the function has changed to include pts_ptr, so make sure you change that. pts_ptr is a pointer we use to inform audio_callback of the PTS of the audio packet. This will be used next time for synchronizing the audio with the video.

Now we can finally implement our get_audio_clock function. It's not as simple as reading the is->audio_clock value, though. Notice that we set the audio PTS every time we process a packet, but if you look at the audio_callback function, it takes time to move all the data from an audio packet into our output buffer. That means the value in our audio clock could be too far ahead, so we have to check how much of the buffer is still waiting to be played and subtract that time. Here's the complete code:
double get_audio_clock(VideoState *is) {
  double pts;
  int hw_buf_size, bytes_per_sec, n;
  
  pts = is->audio_clock; /* maintained in the audio thread */
  hw_buf_size = is->audio_buf_size - is->audio_buf_index;
  bytes_per_sec = 0;
  n = is->audio_st->codec->channels * 2;
  if(is->audio_st) {
    bytes_per_sec = is->audio_st->codec->sample_rate * n;
  }
  if(bytes_per_sec) {
    pts -= (double)hw_buf_size / bytes_per_sec;
  }
  return pts;
}
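For example, if audio_clock is 5.000 s, 8192 bytes of the current buffer are still unplayed, and the stream is 16-bit stereo at 44100 Hz (bytes_per_sec = 176400), the function returns 5.000 - 8192/176400 ≈ 4.954 s, roughly the timestamp of the sample the sound card is playing at this instant.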


The complete code:

#include "stdafx.h"

#ifdef TUTORIAL_05
// tutorial05.c
// A pedagogical video player that really works!
//
// This tutorial was written by Stephen Dranger ([email protected]).
//
// Code based on FFplay, Copyright (c) 2003 Fabrice Bellard, 
// and a tutorial by Martin Bohme ([email protected])
// Tested on Gentoo, CVS version 5/01/07 compiled with GCC 4.1.1
//
// Use the Makefile to build all the samples.
//
// Run using
// tutorial05 myvideofile.mpg
//
// to play the video.


extern "C"
{
#include "libavutil/avstring.h"
#include "libavutil/mathematics.h"
#include "libavutil/pixdesc.h"
#include "libavutil/imgutils.h"
#include "libavutil/dict.h"
#include "libavutil/parseutils.h"
#include "libavutil/samplefmt.h"
#include "libavutil/avassert.h"
#include "libavutil/time.h"
#include "libavformat/avformat.h"
#include "libavdevice/avdevice.h"
#include "libswscale/swscale.h"
#include "libavutil/opt.h"
#include "libavcodec/avfft.h"
#include "libswresample/swresample.h"

#include "SDL1.2/SDL.h"
#include "SDL1.2/SDL_thread.h"
}

#pragma comment(lib, "avcodec.lib")
#pragma comment(lib, "avformat.lib")
#pragma comment(lib, "avutil.lib")
#pragma comment(lib, "avdevice.lib")
#pragma comment(lib, "avfilter.lib")
#pragma comment(lib, "postproc.lib")
#pragma comment(lib, "swresample.lib")
#pragma comment(lib, "swscale.lib")
#pragma comment(lib, "SDL.lib")

#ifdef __MINGW32__
#undef main /* Prevents SDL from overriding main() */
#endif

#include <stdio.h>
#include <math.h>

#define SDL_AUDIO_BUFFER_SIZE               1024
#define MAX_AUDIO_FRAME_SIZE                192000

#define MAX_AUDIOQ_SIZE                     (5 * 16 * 1024)
#define MAX_VIDEOQ_SIZE                     (5 * 256 * 1024)

#define AV_SYNC_THRESHOLD                   0.01
#define AV_NOSYNC_THRESHOLD                 10.0

#define FF_ALLOC_EVENT                      (SDL_USEREVENT)
#define FF_REFRESH_EVENT                    (SDL_USEREVENT + 1)
#define FF_QUIT_EVENT                       (SDL_USEREVENT + 2)

#define VIDEO_PICTURE_QUEUE_SIZE            1


// BD
int         g_iIndex_video_pkt = 0;
// ED

typedef struct PacketQueue {
    AVPacketList *first_pkt, *last_pkt;
    int nb_packets;
    int size;
    SDL_mutex *mutex;
    SDL_cond *cond;
} PacketQueue;


typedef struct VideoPicture {
    SDL_Overlay *bmp;
    int width, height; /* source height & width */
    int allocated;
    double pts;

    // BD
    AVPictureType type;
    int iIndex;
    // ED
} VideoPicture;

typedef struct VideoState {

    AVFormatContext *pFormatCtx;
    int             videoStream, audioStream;

    // audio
    double          audio_clock;
    AVStream        *audio_st;
    PacketQueue     audioq;
    AVFrame         audio_frame;
    uint8_t         audio_buf[(MAX_AUDIO_FRAME_SIZE * 3) / 2];
    unsigned int    audio_buf_size;
    unsigned int    audio_buf_index;
    AVPacket        audio_pkt;
    uint8_t         *audio_pkt_data;
    int             audio_pkt_size;
    int             audio_hw_buf_size;
    double          frame_timer;
    double          frame_last_pts;
    double          frame_last_delay;

    // video
    double          video_clock; ///<pts of last decoded frame / predicted pts of next decoded frame
    AVStream        *video_st;
    PacketQueue     videoq;

    VideoPicture    pictq[VIDEO_PICTURE_QUEUE_SIZE];
    int             pictq_size, pictq_rindex, pictq_windex;
    SDL_mutex       *pictq_mutex;
    SDL_cond        *pictq_cond;

    SDL_Thread      *parse_tid;
    SDL_Thread      *video_tid;

    char            filename[1024];
    int             quit;

    AVIOContext     *io_context;
    struct SwsContext *sws_ctx;
} VideoState;

SDL_Surface     *screen;

/* Since we only have one decoding thread, the Big Struct
can be global in case we need it. */
VideoState *global_video_state;

struct SwrContext *swr_ctx;
DECLARE_ALIGNED(16, uint8_t, audio_buf2)[MAX_AUDIO_FRAME_SIZE * 4];

static inline double rint(double x)
{
    return x >= 0 ? floor(x + 0.5) : ceil(x - 0.5);
}

void packet_queue_init(PacketQueue *q) {
    memset(q, 0, sizeof(PacketQueue));
    q->mutex = SDL_CreateMutex();
    q->cond = SDL_CreateCond();
}

int packet_queue_put(PacketQueue *q, AVPacket *pkt) {
    AVPacketList *pkt1;
    if( av_dup_packet(pkt) < 0 ) {
        return -1;
    }
    pkt1 = (AVPacketList *)av_malloc(sizeof(AVPacketList));
    if( !pkt1 ) {
        return -1;
    }

    pkt1->pkt = *pkt;
    pkt1->next = NULL;

    SDL_LockMutex(q->mutex);

    if( !q->last_pkt ) {
        q->first_pkt = pkt1;
    } else {
        q->last_pkt->next = pkt1;
    }

    q->last_pkt = pkt1;
    q->nb_packets ++;
    q->size += pkt1->pkt.size;
    SDL_CondSignal(q->cond);

    SDL_UnlockMutex(q->mutex);
    return 0;
}
static int packet_queue_get(PacketQueue *q, AVPacket *pkt, int block)
{
    AVPacketList *pkt1;
    int ret;

    SDL_LockMutex(q->mutex);

    for( ; ; ) {
        if( global_video_state->quit ) {
            ret = -1;
            break;
        }

        pkt1 = q->first_pkt;
        if( pkt1 ) {
            q->first_pkt = pkt1->next;
            if( !q->first_pkt ) {
                q->last_pkt = NULL;
            }

            q->nb_packets --;
            q->size -= pkt1->pkt.size;
            *pkt = pkt1->pkt;
            av_free(pkt1);
            ret = 1;
            break;
        } else if( !block ) {
            ret = 0;
            break;
        } else {
            SDL_CondWait(q->cond, q->mutex);
        }
    }

    SDL_UnlockMutex(q->mutex);
    return ret;
}

double get_audio_clock(VideoState *is)
{
    double pts;
    int hw_buf_size, bytes_per_sec, n;

    // the time at which the data currently in the audio buffer will have finished playing
    pts = is->audio_clock; /* maintained in the audio thread */
    // bytes in the current audio buffer that have not been played yet
    hw_buf_size = is->audio_buf_size - is->audio_buf_index;
    bytes_per_sec = 0;

    // work out how many bytes of audio data are consumed per second
    n = is->audio_st->codec->channels * 2;
    if( is->audio_st ) {
        bytes_per_sec = is->audio_st->codec->sample_rate * n;
    }

    // (double)hw_buf_size / bytes_per_sec is the time still needed to play out what is left in the buffer;
    // subtracting it from pts gives the timestamp of the audio actually being heard right now
    if( bytes_per_sec ) {
        pts -= (double)hw_buf_size / bytes_per_sec;
    }
    return pts;
}

int audio_decode_frame(VideoState *is, double *pts_ptr)
{
    int len1, data_size = 0, n;
    AVPacket *pkt = &is->audio_pkt;
    double pts;

    for( ; ; ) {
        while( is->audio_pkt_size > 0 ) {
            int got_frame;
            len1 = avcodec_decode_audio4(is->audio_st->codec, &is->audio_frame, &got_frame, pkt);
            if( len1 < 0 ) {
                /* if error, skip frame */
                is->audio_pkt_size = 0;
                break;
            }

            if( got_frame ) {
                AVCodecContext* aCodecCtx = is->audio_st->codec;

                uint64_t dec_channel_layout =
                    (aCodecCtx->channel_layout && aCodecCtx->channels == av_get_channel_layout_nb_channels(aCodecCtx->channel_layout)) ?
                    aCodecCtx->channel_layout : av_get_default_channel_layout(aCodecCtx->channels);

                AVSampleFormat tgtFmt = AV_SAMPLE_FMT_S16;
                if( aCodecCtx->sample_fmt != tgtFmt ) {
                    // resampling is required
                    if( swr_ctx == NULL ) {
                        swr_ctx = swr_alloc();
                        swr_ctx = swr_alloc_set_opts(swr_ctx,
                            dec_channel_layout, tgtFmt, aCodecCtx->sample_rate,
                            dec_channel_layout, aCodecCtx->sample_fmt, aCodecCtx->sample_rate, 0, NULL);

                        if( !swr_ctx || swr_init(swr_ctx) < 0 ) {
                            assert(false);
                        }
                    }

                    if( swr_ctx ) {
                        const uint8_t **in = (const uint8_t **)is->audio_frame.extended_data;
                        uint8_t *out[] = {audio_buf2};
                        int out_count = sizeof(audio_buf2) / aCodecCtx->channels / av_get_bytes_per_sample(aCodecCtx->sample_fmt);

                        int len2 = swr_convert(swr_ctx, out, out_count, in, is->audio_frame.nb_samples);
                        if( len2 < 0 ) {
                            LogPrintfA("swr_convert() failed\n");
                            break;
                        }
                        if( len2 == out_count ) {
                            LogPrintfA("warning: audio buffer is probably too small\n");
                            swr_init(swr_ctx);
                        }

                        data_size = len2 * aCodecCtx->channels * av_get_bytes_per_sample(tgtFmt);
                        memcpy(is->audio_buf, audio_buf2, data_size);
                    }
                } else {
                    // no resampling needed
                    data_size = av_samples_get_buffer_size(NULL,
                        aCodecCtx->channels,
                        is->audio_frame.nb_samples,
                        aCodecCtx->sample_fmt,
                        1);
                    assert(data_size <= is->audio_buf_size);
                    memcpy(is->audio_buf, is->audio_frame.data[0], data_size);
                }
            }
            is->audio_pkt_data += len1;
            is->audio_pkt_size -= len1;
            if( data_size <= 0 ) {
                /* No data yet, get more frames */
                continue;
            }

            pts = is->audio_clock;
            *pts_ptr = pts;
            // the 2 is for 16-bit samples (2 bytes per sample); adjust it for other sample formats
            // this computes how long it takes to play the data we are returning
            n = 2 * is->audio_st->codec->channels;
            is->audio_clock += (double)data_size /
                (double)(n * is->audio_st->codec->sample_rate);
            
            //LogPrintf(_T("is->audio_clock: %f, plus: %f\n"), is->audio_clock, (double)data_size / (double)(n * is->audio_st->codec->sample_rate) );
            
            /* We have data, return it and come back for more later */
            return data_size;
        }
        if( pkt->data ) {
            av_free_packet(pkt);
        }

        if( is->quit ) {
            return -1;
        }
        /* next packet */
        if( packet_queue_get(&is->audioq, pkt, 1) < 0 ) {
            return -1;
        }

        is->audio_pkt_data = pkt->data;
        is->audio_pkt_size = pkt->size;
        /* if update, update the audio clock w/pts */
        if( pkt->pts != AV_NOPTS_VALUE ) {
            is->audio_clock = av_q2d(is->audio_st->time_base)*pkt->pts;
        }
    }
}

void audio_callback(void *userdata, Uint8 *stream, int len)
{
    VideoState *is = (VideoState *)userdata;
    int len1, audio_size;
    double pts;

    while( len > 0 ) {
        if(is->audio_buf_index >= is->audio_buf_size) {
            /* We have already sent all our data; get more */
            audio_size = audio_decode_frame(is, &pts);
            if( audio_size < 0 ) {
                /* If error, output silence */
                is->audio_buf_size = 1024;
                memset(is->audio_buf, 0, is->audio_buf_size);
            } else {
                is->audio_buf_size = audio_size;
            }
            is->audio_buf_index = 0;
        }
        len1 = is->audio_buf_size - is->audio_buf_index;
        if( len1 > len ) {
            len1 = len;
        }
        memcpy(stream, (uint8_t *)is->audio_buf + is->audio_buf_index, len1);
        len -= len1;
        stream += len1;
        is->audio_buf_index += len1;
    }
}

static Uint32 sdl_refresh_timer_cb(Uint32 interval, void *opaque)
{
    SDL_Event event;
    event.type = FF_REFRESH_EVENT;
    event.user.data1 = opaque;
    SDL_PushEvent(&event);
    return 0; /* 0 means stop timer */
}

/* schedule a video refresh in 'delay' ms */
static void schedule_refresh(VideoState *is, int delay)
{
    SDL_AddTimer(delay, sdl_refresh_timer_cb, is);
}

void video_display(VideoState *is)
{
    SDL_Rect rect;
    VideoPicture *vp;
    //AVPicture pict;
    float aspect_ratio;
    int w, h, x, y;
    //int i;

    vp = &is->pictq[is->pictq_rindex];
    if( vp->bmp ) {
        if(is->video_st->codec->sample_aspect_ratio.num == 0) {
            aspect_ratio = 0;
        } else {
            aspect_ratio = av_q2d(is->video_st->codec->sample_aspect_ratio) *
                is->video_st->codec->width / is->video_st->codec->height;
        }
        if( aspect_ratio <= 0.0 ) {
            aspect_ratio = (float)is->video_st->codec->width /
                (float)is->video_st->codec->height;
        }
        h = screen->h;
        w = ((int)rint(h * aspect_ratio)) & -3;
        if( w > screen->w ) {
            w = screen->w;
            h = ((int)rint(w / aspect_ratio)) & -3;
        }
        x = (screen->w - w) / 2;
        y = (screen->h - h) / 2;

        rect.x = x;
        rect.y = y;
        rect.w = w;
        rect.h = h;
         
        // BD
        //LogPrintfA("---------------------------------------------------------- [%05d] refresh bmp, Packet:%d, type: %s, pts: %f\n",
        //    ::GetCurrentThreadId(), vp->iIndex, GetPictureTypeString(vp->type).c_str(), vp->pts);
        // ED

        SDL_DisplayYUVOverlay(vp->bmp, &rect);
    }
}

void video_refresh_timer(void *userdata)
{
    VideoState *is = (VideoState *)userdata;
    VideoPicture *vp;
    double actual_delay, delay, sync_threshold, ref_clock, diff;

    if( is->video_st ) {
        if( is->pictq_size == 0 ) {
            schedule_refresh(is, 1);
        } else {
            // goal: figure out when the next video frame should be displayed
            vp = &is->pictq[is->pictq_rindex];
            
            // frame_last_pts holds the previous frame's pts; current pts minus previous pts gives an estimated delay
            // that delay is how long the previous frame is supposed to stay on screen
            delay = vp->pts - is->frame_last_pts; /* the pts from last time */

            // BD
            static int iIndex = 0;
            //LogPrintfA("上一幀播放時長爲: %f\n", delay);
            // ED
            // the delay has to fall within a sane range; if it doesn't, reuse the previous delay
            if( delay <= 0 || delay >= 1.0 ) {
                /* if incorrect delay, use previous one */
                delay = is->frame_last_delay;
            }

            /* save for next time */
            is->frame_last_delay = delay;
            // remember the current frame's pts for next time
            is->frame_last_pts = vp->pts;
            
            /* update delay to sync to audio */
            // ref_clock: the audio clock, i.e. the current audio playback position
            ref_clock = get_audio_clock(is);
            diff = vp->pts - ref_clock;
            
            // BD
            //LogPrintfA("vp->pts: %f, ref_clock: %f, diff: %f; delay: %f\n", vp->pts, ref_clock, diff, delay);
            // ED
            
            /* Skip or repeat the frame. Take delay into account
            FFPlay still doesn't "know if this is the best guess." */
            // sync_threshold is the larger of delay and AV_SYNC_THRESHOLD
            // new
            sync_threshold = FFMAX(delay, AV_SYNC_THRESHOLD);
            // when the drift falls outside (-sync_threshold, sync_threshold) the delay is adjusted
            if( fabs(diff) < AV_NOSYNC_THRESHOLD ) {
                if( diff <= -sync_threshold ) { // diff is sufficiently negative: this frame is behind the master clock, so show the next frame as soon as possible (delay = 0)
                    delay = 0;
                } else if( diff >= sync_threshold ) { // diff is a large positive number: this frame is ahead of the master clock, so the next frame should be delayed
                    delay = 2 * delay;
                } else {
                    // diff is within tolerance; keep the previous delay as it is
                    // LogPrintfA("abcd\n");
                }
            } else {
                assert(false);
            }

            // BD
            double frame_timer_old = is->frame_timer;
            // ED

            // frame_timer accumulates the delays; once delay is added, frame_timer is the time at which the next frame should appear
            is->frame_timer += delay;
            /* compute the REAL delay */
            // subtract the current system time from frame_timer to get actual_delay
            actual_delay = is->frame_timer - (av_gettime() / 1000000.0);
            if( actual_delay < 0.010 ) {
                /* Really it should skip the picture instead */
                actual_delay = 0.010;
            }
            schedule_refresh(is, (int)(actual_delay * 1000 + 0.5));
            
            /* show the picture! */
            video_display(is);

            /* update queue for next picture! */
            if( ++ is->pictq_rindex == VIDEO_PICTURE_QUEUE_SIZE ) {
                is->pictq_rindex = 0;
            }
            SDL_LockMutex(is->pictq_mutex);
            is->pictq_size--;
            SDL_CondSignal(is->pictq_cond);
            SDL_UnlockMutex(is->pictq_mutex);
        }
    } else {
        schedule_refresh(is, 100);
    }
}

void alloc_picture(void *userdata) {
    VideoState *is = (VideoState *)userdata;
    VideoPicture *vp;

    vp = &is->pictq[is->pictq_windex];
    if( vp->bmp ) {
        // we already have one make another, bigger/smaller
        SDL_FreeYUVOverlay(vp->bmp);
    }

    // Allocate a place to put our YUV image on that screen
    vp->bmp = SDL_CreateYUVOverlay(is->video_st->codec->width,
        is->video_st->codec->height,
        SDL_YV12_OVERLAY,
        screen);
    vp->width = is->video_st->codec->width;
    vp->height = is->video_st->codec->height;

    SDL_LockMutex(is->pictq_mutex);
    vp->allocated = 1;
    SDL_CondSignal(is->pictq_cond);
    SDL_UnlockMutex(is->pictq_mutex);
}

int queue_picture(VideoState *is, AVFrame *pFrame, double pts, int iIndex)
{
    VideoPicture *vp;
    AVPicture pict;

    /* wait until we have space for a new pic */
    SDL_LockMutex(is->pictq_mutex);
    while( is->pictq_size >= VIDEO_PICTURE_QUEUE_SIZE && !is->quit ) {
        SDL_CondWait(is->pictq_cond, is->pictq_mutex);
    }
    SDL_UnlockMutex(is->pictq_mutex);

    if( is->quit ) {
        return -1;
    }

    // windex is set to 0 initially
    vp = &is->pictq[is->pictq_windex];

    /* allocate or resize the buffer! */
    if( !vp->bmp ||
        vp->width != is->video_st->codec->width ||
        vp->height != is->video_st->codec->height ) {
            SDL_Event event;

            vp->allocated = 0;
            /* we have to do it in the main thread */
            event.type = FF_ALLOC_EVENT;
            event.user.data1 = is;
            SDL_PushEvent(&event);

            /* wait until we have a picture allocated */
            SDL_LockMutex(is->pictq_mutex);
            while( !vp->allocated && !is->quit ) {
                SDL_CondWait(is->pictq_cond, is->pictq_mutex);
            }
            SDL_UnlockMutex(is->pictq_mutex);
            if( is->quit ) {
                return -1;
            }
    }

    /* We have a place to put our picture on the queue */
    /* If we are skipping a frame, do we set this to null
    but still return vp->allocated = 1? */

    if( vp->bmp ) {
        SDL_LockYUVOverlay(vp->bmp);

        /* point pict at the queue */

        pict.data[0] = vp->bmp->pixels[0];
        pict.data[1] = vp->bmp->pixels[2];
        pict.data[2] = vp->bmp->pixels[1];

        pict.linesize[0] = vp->bmp->pitches[0];
        pict.linesize[1] = vp->bmp->pitches[2];
        pict.linesize[2] = vp->bmp->pitches[1];

        // Convert the image into YUV format that SDL uses
        sws_scale
            (
            is->sws_ctx,
            (uint8_t const * const *)pFrame->data,
            pFrame->linesize,
            0,
            is->video_st->codec->height,
            pict.data,
            pict.linesize
            );

        SDL_UnlockYUVOverlay(vp->bmp);
        vp->pts = pts;

        // BD
        vp->type = pFrame->pict_type;
        vp->iIndex = iIndex;
        // ED

        /* now we inform our display thread that we have a pic ready */
        if( ++ is->pictq_windex == VIDEO_PICTURE_QUEUE_SIZE ) {
            is->pictq_windex = 0;
        }
        SDL_LockMutex(is->pictq_mutex);
        is->pictq_size++;
        SDL_UnlockMutex(is->pictq_mutex);
    }

    return 0;
}

/*
 * this just keeps video_clock up to date
 */
double synchronize_video(VideoState *is, AVFrame *src_frame, double pts)
{
    double frame_delay;

    if( pts != 0 ) {
        /* if we have pts, set video clock to it */
        is->video_clock = pts;
    } else {
        /* if we aren't given a pts, set it to the clock */
        pts = is->video_clock;
    }

    /* update the video clock */
    // with 25 fps video a frame lasts 0.04 s; the codec time_base here may be 1/50, i.e. 0.02 s
    frame_delay = av_q2d(is->video_st->codec->time_base);

    /* if we are repeating a frame, adjust clock accordingly */
    frame_delay += src_frame->repeat_pict * (frame_delay * 0.5);

    is->video_clock += frame_delay;
    
    return pts;
}

uint64_t global_video_pkt_pts = AV_NOPTS_VALUE;

/* These are called whenever we allocate a frame
* buffer. We use this to store the global_pts in
* a frame at the time it is allocated.
*/
int our_get_buffer(struct AVCodecContext *c, AVFrame *pic, int flags)
{
    int ret = avcodec_default_get_buffer(c, pic);
    uint64_t *pts = (uint64_t *)av_malloc(sizeof(uint64_t));
    *pts = global_video_pkt_pts;
    pic->opaque = pts;

    return ret;
}
void our_release_buffer(struct AVCodecContext *c, AVFrame *pic)
{
    if( pic ) {
        av_freep(&pic->opaque);
    }

    avcodec_default_release_buffer(c, pic);
}

int video_thread(void *arg)
{
    VideoState *is = (VideoState *)arg;
    AVPacket pkt1, *packet = &pkt1;
    int frameFinished;
    AVFrame *pFrame;
    double pts;

    pFrame = av_frame_alloc();

    for( ; ; ) {
        if( packet_queue_get(&is->videoq, packet, 1) < 0 ) {
            // means we quit getting packets
            break;
        }
        pts = 0;

        // Save global pts to be stored in pFrame in first call
        global_video_pkt_pts = packet->pts;

        // Decode video frame
        int iRet = avcodec_decode_video2(is->video_st->codec, pFrame, &frameFinished, packet);
        if( iRet < 0 ) {
            // error
            int a=2;
            int b=a;
        } else if( iRet == 0 ) {
            // no frame could be decompressed
            int a=2;
            int b=a;
        } else {
            // ok
        }

        // BD
        LogPrintfA("[%05d] Packet:%d, type: %s, dts: %I64d, pts: %I64d\n", ::GetCurrentThreadId(),
                ++ g_iIndex_video_pkt, GetPictureTypeString(pFrame->pict_type).c_str(),
                packet->dts, packet->pts);
        // ED

        if( packet->dts == AV_NOPTS_VALUE
            && pFrame->opaque
            && *(uint64_t*)pFrame->opaque != AV_NOPTS_VALUE ) {
                pts = *(uint64_t *)pFrame->opaque;
        } else if( packet->dts != AV_NOPTS_VALUE ) {
            pts = packet->dts;
        } else {
            pts = 0;
        }
        // use the pts to work out where this frame sits (in seconds) within the whole video
        pts *= av_q2d(is->video_st->time_base);
        
        // BD
        AVRational a1 = is->video_st->r_frame_rate;
        int64_t ptsBst = av_frame_get_best_effort_timestamp(pFrame);
        double ptsOld = pts;
        if( AV_PICTURE_TYPE_I == pFrame->pict_type ) {
            int a=2;
            int b=a;
        }
        // ED

        // Did we get a video frame?
        if( frameFinished ) {
            pts = synchronize_video(is, pFrame, pts);

            // BD
            if( ptsOld != pts ) {
                int a=2;
                int b=a;
            }
            //LogPrintfA("[%05d] Packet:%d, truely pts: %f\n", ::GetCurrentThreadId(), g_iIndex_video_pkt, pts);
            // ED
            
            if( queue_picture(is, pFrame, pts, g_iIndex_video_pkt) < 0 ) {
                break;
            }
        }
        av_free_packet(packet);
    }

    av_free(pFrame);
    return 0;
}

int stream_component_open(VideoState *is, int stream_index)
{
    AVFormatContext *pFormatCtx = is->pFormatCtx;
    AVCodecContext *codecCtx = NULL;
    AVCodec *codec = NULL;
    AVDictionary *optionsDict = NULL;
    SDL_AudioSpec wanted_spec, spec;

    if(stream_index < 0 || stream_index >= pFormatCtx->nb_streams) {
        return -1;
    }

    // Get a pointer to the codec context for the video stream
    codecCtx = pFormatCtx->streams[stream_index]->codec;

    if( codecCtx->codec_type == AVMEDIA_TYPE_AUDIO ) {
        // Set audio settings from codec info
        wanted_spec.freq = codecCtx->sample_rate;
        wanted_spec.format = AUDIO_S16SYS;
        wanted_spec.channels = codecCtx->channels;
        wanted_spec.silence = 0;
        wanted_spec.samples = SDL_AUDIO_BUFFER_SIZE;
        wanted_spec.callback = audio_callback;
        wanted_spec.userdata = is;

        if( SDL_OpenAudio(&wanted_spec, &spec) < 0 ) {
            fprintf(stderr, "SDL_OpenAudio: %s\n", SDL_GetError());
            return -1;
        }
        is->audio_hw_buf_size = spec.size;
    }
    codec = avcodec_find_decoder(codecCtx->codec_id);

    if( !codec || (avcodec_open2(codecCtx, codec, &optionsDict) < 0) ) {
        fprintf(stderr, "Unsupported codec!\n");
        return -1;
    }

    switch( codecCtx->codec_type ) {
    case AVMEDIA_TYPE_AUDIO:
        {
            is->audioStream = stream_index;
            is->audio_st = pFormatCtx->streams[stream_index];
            is->audio_buf_size = 0;
            is->audio_buf_index = 0;
            memset(&is->audio_pkt, 0, sizeof(is->audio_pkt));
            packet_queue_init(&is->audioq);
            SDL_PauseAudio(0);
        }
        break;
    case AVMEDIA_TYPE_VIDEO:
        {
            is->videoStream = stream_index;
            is->video_st = pFormatCtx->streams[stream_index];

            is->frame_timer = (double)av_gettime() / 1000000.0;
            is->frame_last_delay = 40e-3;

            // BD
            LogPrintfA("初始化: frame_timer: %f, frame_last_delay: %f\n", is->frame_timer, is->frame_last_delay);
            // ED

            packet_queue_init(&is->videoq);
            is->video_tid = SDL_CreateThread(video_thread, is);
            is->sws_ctx =
                sws_getContext
                (
                is->video_st->codec->width,
                is->video_st->codec->height,
                is->video_st->codec->pix_fmt,
                is->video_st->codec->width,
                is->video_st->codec->height,
                PIX_FMT_YUV420P,
                SWS_BILINEAR,
                NULL,
                NULL,
                NULL
                );
            codecCtx->get_buffer2 = our_get_buffer;
            codecCtx->release_buffer = our_release_buffer;
        }
        break;
    default:
        break;
    }

    return 0;
}

int decode_interrupt_cb(void *opaque) {
    return (global_video_state && global_video_state->quit);
}

int decode_thread(void *arg)
{
    VideoState *is = (VideoState *)arg;
    AVFormatContext *pFormatCtx = NULL;
    AVPacket pkt1, *packet = &pkt1;

    AVDictionary *io_dict = NULL;
    AVIOInterruptCB callback;

    int video_index = -1;
    int audio_index = -1;
    int i;

    is->videoStream = -1;
    is->audioStream = -1;

    global_video_state = is;
    // will interrupt blocking functions if we quit!
    callback.callback = decode_interrupt_cb;
    callback.opaque = is;
    if( avio_open2(&is->io_context, is->filename, 0, &callback, &io_dict) ) {
        fprintf(stderr, "Unable to open I/O for %s\n", is->filename);
        return -1;
    }

    // Open video file
    if( avformat_open_input(&pFormatCtx, is->filename, NULL, NULL) != 0 ) {
        return -1; // Couldn't open file
    }

    is->pFormatCtx = pFormatCtx;

    // Retrieve stream information
    if( avformat_find_stream_info(pFormatCtx, NULL) < 0 ) {
        return -1; // Couldn't find stream information
    }

    // Dump information about file onto standard error
    av_dump_format(pFormatCtx, 0, is->filename, 0);

    // Find the first video stream
    for( i = 0; i < pFormatCtx->nb_streams; i++ ) {
        if( pFormatCtx->streams[i]->codec->codec_type == AVMEDIA_TYPE_VIDEO &&
            video_index < 0 ) {
                video_index = i;
        }
        if( pFormatCtx->streams[i]->codec->codec_type == AVMEDIA_TYPE_AUDIO &&
            audio_index < 0 ) {
                audio_index = i;
        }
    }
    if( audio_index >= 0 ) {
        stream_component_open(is, audio_index);
    }
    if( video_index >= 0 ) {
        stream_component_open(is, video_index);
    }

    if( is->videoStream < 0 || is->audioStream < 0 ) {
        fprintf(stderr, "%s: could not open codecs\n", is->filename);
        goto fail;
    }

    // Begin -- set video size by oldmtn
    // Make a screen to put our video
    int width = pFormatCtx->streams[video_index]->codec->width;
    int height = pFormatCtx->streams[video_index]->codec->height;
    screen = SDL_SetVideoMode(width, height, 0, 0);
    if( !screen ) {
        fprintf(stderr, "SDL: could not set video mode - exiting\n");
        exit(1);
    }
    // End -- set video size by oldmtn

    // main decode loop

    for( ; ; ) {
        if( is->quit ) {
            break;
        }

        // seek stuff goes here
        if( is->audioq.size > MAX_AUDIOQ_SIZE ||
            is->videoq.size > MAX_VIDEOQ_SIZE ) {
                SDL_Delay(10);
                continue;
        }

        if( av_read_frame(is->pFormatCtx, packet) < 0 ) {
            if( is->pFormatCtx->pb->error == 0 ) {
                SDL_Delay(100); /* no error; wait for user input */
                continue;
            } else {
                break;
            }
        }

        // Is this a packet from the video stream?
        if( packet->stream_index == is->videoStream ) {
            packet_queue_put(&is->videoq, packet);
        } else if( packet->stream_index == is->audioStream ) {
            packet_queue_put(&is->audioq, packet);
        } else {
            av_free_packet(packet);
        }
    }

    /* all done - wait for it */
    while( !is->quit ) {
        SDL_Delay(100);
    }

fail:
    {
        SDL_Event event;
        event.type = FF_QUIT_EVENT;
        event.user.data1 = is;
        SDL_PushEvent(&event);
    }
    return 0;
}

int _tmain() {

    SDL_Event       event;

    VideoState      *is;

    is = (VideoState *)av_mallocz(sizeof(VideoState));

    //char szFile[] = "cuc_ieschool.flv";
    char szFile[] = "edu.flv";
    //char szFile[] = "song.flv";
    //char szFile[] = "drj.mkv";
    //char szFile[] = "city.mkv";

    // Register all formats and codecs
    av_register_all();

    if(SDL_Init(SDL_INIT_VIDEO | SDL_INIT_AUDIO | SDL_INIT_TIMER)) {
        fprintf(stderr, "Could not initialize SDL - %s\n", SDL_GetError());
        exit(1);
    }

    av_strlcpy(is->filename, szFile, 1024);

    is->pictq_mutex = SDL_CreateMutex();
    is->pictq_cond = SDL_CreateCond();

    schedule_refresh(is, 40);

    is->parse_tid = SDL_CreateThread(decode_thread, is);
    if(!is->parse_tid) {
        av_free(is);
        return -1;
    }

    for( ; ; ) {
        SDL_WaitEvent(&event);
        switch(event.type) {
        case FF_QUIT_EVENT:
        case SDL_QUIT:
            is->quit = 1;
            /*
            * If the video has finished playing, then both the picture and
            * audio queues are waiting for more data.  Make them stop
            * waiting and terminate normally.
            */
            SDL_CondSignal(is->audioq.cond);
            SDL_CondSignal(is->videoq.cond);
            SDL_Quit();
            exit(0);
            break;
        case FF_ALLOC_EVENT:
            alloc_picture(event.user.data1);
            break;
        case FF_REFRESH_EVENT:
            video_refresh_timer(event.user.data1);
            break;
        default:
            break;
        }
    }

    return 0;
}


#endif // TUTORIAL_05





/////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////

Additional notes: PTS and DTS

I have not been working with the FFMPEG tutorials for long. There are eight of them in total, and this fifth one has taken me the longest and been the hardest to understand. The movie file is split into audio and video, and every packet carries a PTS; the audio is kept in time automatically by the sound-card clock, and the audio PTS is what the video is synchronized against.

Both audio and video keep a running count of how much has been played, audio_clock and video_clock. ffmpeg-tutorial05 compares these two clocks to adjust how long the current video frame is delayed, and that is what keeps audio and video in sync.


From tracing through the code myself, not every AVPacket actually comes with a definite PTS.

When we get a packet with av_read_frame(), the PTS and DTS information is stored in the packet. But the PTS we really want is that of the raw frame we just decoded, so that we know when to display it. The frame we get back from avcodec_decode_video(), however, is an AVFrame that does not carry a useful PTS value by itself. What helps is that ffmpeg reorders packets so that the DTS of the packet being processed by avcodec_decode_video() always matches the PTS of the frame it returns. There is one more caveat, though: we won't always get even that information.

Not to worry, because there is another way to find the PTS of a frame: we can have the program reorder the packets itself. We save the PTS of the first packet of a frame, and that becomes the PTS of the whole frame. We can tell which packet is the first packet of a frame thanks to avcodec_decode_video(): whenever a packet starts a frame, avcodec_decode_video() calls a function to allocate a buffer for that frame, and ffmpeg lets us redefine that allocation function. So we write a new function that stashes the packet's timestamp at allocation time.

Of course, even then we might not get a correct timestamp; we will deal with that problem later.


Coding it: getting the frame's timestamp

Now let's go into the code and do all of this. We will need to add some members to the big struct, but we will do that as we need them. First, let's look at the video thread. Remember, this is where we pick up the packets that the decode thread put on the queue. What we need here is to get the timestamp of the frame that avcodec_decode_video gives us. The first way we talked about was getting the DTS of the last packet processed, which is easy:

double pts;

for(;;) {
  if(packet_queue_get(&is->videoq, packet, 1) < 0) {
    // means we quit getting packets
    break;
  }
  pts = 0;
  // Decode video frame
  len1 = avcodec_decode_video(is->video_st->codec,
                              pFrame, &frameFinished,
                              packet->data, packet->size);
  if(packet->dts != AV_NOPTS_VALUE) {
    pts = packet->dts;
  } else {
    pts = 0;
  }
  pts *= av_q2d(is->video_st->time_base); // this is 1/frame_rate, here 1/25

If we can't get a PTS we just set it to 0.

OK, that was easy. But as we said, if the packet's DTS doesn't help us, we need to use the PTS of the first packet of that frame. We do this by having ffmpeg use our own frame-allocation routines. Here are the prototypes of those functions:

int get_buffer(struct AVCodecContext *c, AVFrame *pic);
void release_buffer(struct AVCodecContext *c, AVFrame *pic);

The allocation function doesn't tell us anything about packets, so every time we get a packet we save its PTS into a global variable, where our get function can read it. We then store the value in the AVFrame's opaque field. So, to start with, here are our functions:

uint64_t global_video_pkt_pts = AV_NOPTS_VALUE;

// AV_NOPTS_VALUE here plays the role of "no value yet". our_get_buffer and our_release_buffer
// are user-defined functions assigned to the AVCodecContext's get_buffer and release_buffer,
// so when ffmpeg calls get_buffer/release_buffer, execution lands in our own code.

int our_get_buffer(struct AVCodecContext *c, AVFrame *pic) {
  int ret = avcodec_default_get_buffer(c, pic);
  uint64_t *pts = av_malloc(sizeof(uint64_t));
  *pts = global_video_pkt_pts;
  pic->opaque = pts;
  return ret;
}

void our_release_buffer(struct AVCodecContext *c, AVFrame *pic) {
  if(pic) av_freep(&pic->opaque);
  avcodec_default_release_buffer(c, pic);
}

The functions avcodec_default_get_buffer and avcodec_default_release_buffer are ffmpeg's default buffer allocation functions. av_freep is a memory-management helper that not only frees the memory but also sets the pointer to NULL.

Now, in our stream-open function (stream_component_open), we add these lines to tell ffmpeg what to do:

codecCtx->get_buffer = our_get_buffer;
codecCtx->release_buffer = our_release_buffer;

Now we have to add the code that saves the PTS to the global variable and then uses it when needed. Our code now looks like this:

for(;;) {
  if(packet_queue_get(&is->videoq, packet, 1) < 0) {
    // means we quit getting packets
    break;
  }
  pts = 0;

  // Save global pts to be stored in pFrame in first call
  global_video_pkt_pts = packet->pts;

  // Decode video frame
  len1 = avcodec_decode_video(is->video_st->codec, pFrame, &frameFinished,
                              packet->data, packet->size);

  if(packet->dts == AV_NOPTS_VALUE
     && pFrame->opaque && *(uint64_t*)pFrame->opaque != AV_NOPTS_VALUE) {
    pts = *(uint64_t *)pFrame->opaque;
  } else if(packet->dts != AV_NOPTS_VALUE) {
    pts = packet->dts;
  } else {
    pts = 0;
  }

  pts *= av_q2d(is->video_st->time_base);

A technical note: you may have noticed that we use int64 for the PTS. That is because the PTS is stored as an integer. It is a timestamp that measures time in units of the stream's time_base. For example, if a stream runs at 24 frames per second, a PTS of 42 means the frame should go where the 42nd frame would go if we had a frame every 1/24 of a second (which is not necessarily exactly true).

We can convert this value to seconds by dividing by the frame rate: since the stream's time_base is 1/framerate (for fixed-frame-rate content), multiplying the PTS by the time_base gives the time in seconds.
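A quick numeric check: with a time_base of 1/24, a frame whose PTS is 42 converts to 42 * (1/24) = 1.75 seconds into the stream.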


A few more notes on the frame_timer used in video_refresh_timer:

is->frame_timer is the moment at which the next frame should be refreshed (played). It is first set in stream_component_open, where is->frame_timer = (double)av_gettime() / 1000000.0 records the starting moment as the playback time of the first frame; after that, every frame's delay is added onto it (is->frame_timer += delay), so frame_timer always holds the time at which the current frame is due to appear. Note that av_gettime() returns microseconds, which is why it is divided by 1000000 to get seconds.

Each pass first compares the frame's playback time with the audio clock (diff = vp->pts - ref_clock) and then with the system clock: actual_delay = is->frame_timer - (av_gettime() / 1000000.0). frame_timer is the moment we want to show the frame and av_gettime() / 1000000.0 is the current moment, so their difference is the real amount of time we still have to wait before scheduling the refresh.
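Putting numbers on it: if frame_timer has accumulated to 12.500 s and av_gettime()/1000000.0 currently reads 12.483 s, then actual_delay is 0.017 s and schedule_refresh is called with (int)(0.017 * 1000 + 0.5) = 17 ms; had the result come out below 0.010 s, it would have been clamped to the 10 ms minimum.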






















