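In the previous post I got basic audio and video playback working, but the two were not synchronized at all: the only timing was a fixed delay, and the result was painful to watch. To do synchronization justice in this post, I will start from first principles and then work step by step toward a way to achieve it.
Basic concepts for audio/video synchronization
Before tackling playback synchronization, a few basics need explaining.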
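Audio sampling, encoding, and playback
- Sampling: Human hearing covers roughly 20 Hz to 20 kHz. By the Nyquist sampling theorem, reproducing a sound without distortion requires a sampling rate of at least twice the highest frequency present, so 40 kHz would in principle be enough. To leave some margin, the industry commonly uses 44.1 kHz, i.e. 44,100 samples per second; 48 kHz, a somewhat higher rate, is also widely used.
- Encoding: Sampling converts an analog signal into a digital one, and every digital sample has a finite range: a sample can be stored in 1, 2, or 4 bytes. The industry commonly uses 2 bytes (16 bits) per sample, a signed 16-bit integer ranging from -32768 to 32767, 65,536 values in total.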
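Sound also has channels, most commonly left and right. FFmpeg calls this the channel layout; common layouts include:
AV_CH_LAYOUT_STEREO: ordinary stereo, i.e. left and right
AV_CH_LAYOUT_2POINT1: stereo plus bass, i.e. left and right plus a subwoofer
AV_CH_LAYOUT_SURROUND: surround, i.e. left, right, and front center
AV_CH_LAYOUT_5POINT1: surround plus side left, side right, and a subwoofer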
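Take ordinary two-channel CD audio: 44,100 samples per second, 16 bits per sample, 2 channels. That produces
44100 x 16 x 2 = 1,411,200 bits (176,400 bytes) per second. This is the raw sound data, known as PCM (Pulse Code Modulation). When PCM data is stored, the channels are usually interleaved sample by sample (left, right, left, right, ...).
Describing PCM data requires the following parameters: sample format (i.e. bit depth), sample rate, and channel count.
PCM data can also be stored in little-endian or big-endian byte order; little-endian is the more common.
Storing raw PCM directly at CD quality produces about 10 MB per minute, which is clearly on the large side, so PCM data is normally encoded. The purpose of encoding is compression.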
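Here is a brief note on two common codecs, MP3 and AAC.
MP3: one encoded frame usually holds 1152 samples per channel, which decode to 1152 x 2 x 2 = 4608 bytes of 16-bit stereo PCM
AAC: one encoded frame usually holds 1024 samples per channel, which decode to 1024 x 2 x 2 = 4096 bytes of 16-bit stereo PCM
- Playback: In theory, playing the audio back at its sampling rate reproduces the sound perfectly. Because the data is encoded in frames, however, it also has to be decoded frame by frame. At 44.1 kHz, one MP3 frame plays for 1152 / 44100 = 26.122449 ms and one AAC frame for 1024 / 44100 = 23.2199546 ms. System timers can hardly keep time that precisely, so some error is inevitable. All we can do is hand data to the playback device as quickly as possible; if the gap between deliveries grows too long, the audio audibly stutters.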
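The arithmetic in this section is easy to check in code. The following is a minimal sketch, not part of the original player; the function names are made up for illustration, and it assumes interleaved PCM with a bit depth that is a multiple of 8.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes of interleaved PCM produced per second of audio. */
static int64_t pcm_bytes_per_second(int sample_rate, int bits_per_sample, int channels)
{
    assert(sample_rate > 0 && bits_per_sample % 8 == 0 && channels > 0);
    return (int64_t)sample_rate * (bits_per_sample / 8) * channels;
}

/* Bytes of PCM held by one decoded frame of nb_samples samples per channel. */
static int pcm_frame_bytes(int nb_samples, int bits_per_sample, int channels)
{
    assert(nb_samples > 0 && bits_per_sample % 8 == 0 && channels > 0);
    return nb_samples * (bits_per_sample / 8) * channels;
}

/* Playback time of one frame in milliseconds. */
static double frame_duration_ms(int nb_samples, int sample_rate)
{
    assert(sample_rate > 0);
    return 1000.0 * nb_samples / sample_rate;
}
```

For CD audio, pcm_bytes_per_second(44100, 16, 2) is 176,400 bytes, so one minute is 10,584,000 bytes, which is where the "about 10 MB per minute" figure comes from.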
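Video sampling, encoding, and playback
- Sampling: An image sensor captures one complete picture at a time, in RGB or YUV format, and a single uncompressed picture is typically on the order of megabytes. To appear as smooth motion, about 24 pictures have to be captured every second and played back within that second, otherwise the eye perceives stutter. A 90-minute film stored as uncompressed RGB or YUV would therefore need an enormous amount of storage.
- Encoding: The most common format is H264. H264 video involves concepts such as I-frames, P-frames, B-frames, and the GOP; for details see my blog post "H264幀格式解析" (H264 frame format analysis).
- Playback: The H264 stream is decoded and then displayed at the frame rate (fps), i.e. the number of frames sampled per second. At 24 fps, one frame is shown for 1/24 s = 41.67 ms; other common rates are 25 fps (40 ms per frame) and 30 fps (33.3 ms per frame).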
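To put a number on "enormous", here is a small sketch that estimates the size of uncompressed video. It is an illustration only: the function names are hypothetical, and it assumes YUV420P frames (12 bits per pixel) with even width and height.

```c
#include <assert.h>
#include <stdint.h>

/* Size of one YUV420P frame: a full-size Y plane plus quarter-size U and V planes. */
static int64_t yuv420p_frame_bytes(int width, int height)
{
    assert(width > 0 && height > 0);
    return (int64_t)width * height * 3 / 2;
}

/* Size of an uncompressed YUV420P clip of the given length. */
static int64_t raw_video_bytes(int width, int height, int fps, int seconds)
{
    assert(fps > 0 && seconds >= 0);
    return yuv420p_frame_bytes(width, height) * fps * seconds;
}

/* Time one frame stays on screen, in milliseconds. */
static double frame_interval_ms(int fps)
{
    assert(fps > 0);
    return 1000.0 / fps;
}
```

At 1024x768 and 24 fps, raw_video_bytes(1024, 768, 24, 90 * 60) comes to 152,882,380,800 bytes, roughly 153 GB for a 90-minute film.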
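Getting information about the media file
The basics above are the main factors in solving playback synchronization, so the first step is to read the audio and video parameters from the media file; only with that information can synchronization be done properly. How do we get it?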
AVFormatContext *pFormatCtx;
pFormatCtx = avformat_alloc_context();
avformat_open_input(&pFormatCtx, filepath, NULL, NULL);
avformat_find_stream_info(pFormatCtx, NULL);
av_dump_format(pFormatCtx, 0, filepath, 0);
// Below is the output of av_dump_format
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'bootloader.mp4':
Metadata:
major_brand : mp42
minor_version : 0
compatible_brands: mp42mp41
creation_time : 2017-12-29T09:16:47.000000Z
Duration: 00:14:10.67, start: 0.000000, bitrate: 1128 kb/s
Stream #0:0(eng): Video: h264 (Main) (avc1 / 0x31637661), yuv420p(tv, bt709), 1024x768, 808 kb/s, 8 fps, 8 tbr, 16 tbn, 16 tbc (default)
Metadata:
creation_time : 2017-12-29T09:16:47.000000Z
handler_name : Alias Data Handler
encoder : AVC Coding
Stream #0:1(eng): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 317 kb/s (default)
Metadata:
creation_time : 2017-12-29T09:16:47.000000Z
handler_name : Alias Data Handler
The points we mainly care about are:
- File duration: Duration: 00:14:10.67. This is stored in the duration member of the AVFormatContext structure; other information such as bit_rate and packet_size is available there too.
- Video stream: Video: h264 (Main) (avc1 / 0x31637661), yuv420p(tv, bt709), 1024x768, 808 kb/s, 8 fps, 8 tbr, 16 tbn, 16 tbc (default)
- Audio stream: Stream #0:1(eng): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 317 kb/s (default)
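So where does all this data come from?
Once an AVFormatContext structure has been obtained and filled in from the media file, the audio and video streams can be looked up in its AVStream **streams member, and all kinds of information about each stream can be read from there.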
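Getting audio information
Audio information comes mainly from struct AVCodecContext, an instance of which hangs off each AVStream. (This post uses the codec member of AVStream, as in the FFmpeg version I built against; newer FFmpeg releases expose stream parameters through the codecpar member instead.) Once the audio stream has been located in the AVStream **streams member of AVFormatContext, the audio information can be read as follows:
- Audio codec: pFormatCtx->streams[AudioIndex]->codec->codec_id, an enum value
- Sample rate: pFormatCtx->streams[AudioIndex]->codec->sample_rate
- Samples per encoded audio frame: pFormatCtx->streams[AudioIndex]->codec->frame_size
- Channel count: pFormatCtx->streams[AudioIndex]->codec->channels
- Sample format: pFormatCtx->streams[AudioIndex]->codec->sample_fmt, an enum value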
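Getting video information
Video information is obtained in much the same way. Once the video stream has been located in the AVStream **streams member of AVFormatContext, the video information can be read as follows:
- Video codec: pFormatCtx->streams[VideoIndex]->codec->codec_id, an enum value
- Resolution: pFormatCtx->streams[VideoIndex]->codec->width and pFormatCtx->streams[VideoIndex]->codec->height
- Frame rate: pFormatCtx->streams[VideoIndex]->codec->framerate, a variable of type AVRational. This structure represents a fraction: its num member is the numerator and its den member the denominator. It appears frequently in the rest of this post.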
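Since AVRational shows up everywhere from here on, a small stand-alone sketch of how such a fraction is used may help. The Rational type and the helper names below are hypothetical stand-ins so the example compiles without FFmpeg; in real code, FFmpeg's av_q2d() converts an AVRational to a double. The fallback argument is an assumption for streams whose frame rate is not set.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for FFmpeg's AVRational: the fraction num/den. */
typedef struct {
    int num;
    int den;
} Rational;

static Rational make_rational(int num, int den)
{
    Rational r = { num, den };
    return r;
}

/* Value of the fraction as a double; 0.0 when the rational is unset (den == 0). */
static double rational_to_double(Rational r)
{
    return r.den != 0 ? (double)r.num / r.den : 0.0;
}

/* Time per frame in microseconds, or fallback_us when the frame rate is unknown. */
static int64_t frame_interval_us(Rational frame_rate, int64_t fallback_us)
{
    if (frame_rate.num <= 0 || frame_rate.den <= 0)
        return fallback_us;
    return (int64_t)1000000 * frame_rate.den / frame_rate.num;
}
```

For the sample file's 8/1 frame rate this gives 125,000 microseconds per frame; an NTSC-style 30000/1001 rate gives 33,366 microseconds after integer truncation.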
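Analysis of synchronization
Once the audio and video information has been read from the media file, decoding and playback can begin. In theory, if audio and video are each played according to their own timing, they are synchronized by construction. Suppose a file has an AAC audio stream (2 channels, 16-bit, 44.1 kHz) and an H264 video stream at 25 fps. The ideal playback timeline is then: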
Timeline (ms) | 0 | 23.2 | 40 | 46.4 | 69.6 | 80 | 92.8 | 116 | 120 | … |
---|---|---|---|---|---|---|---|---|---|---|
Audio frame start | 0 | 23.2 | | 46.4 | 69.6 | | 92.8 | 116 | | … |
Video frame start | 0 | | 40 | | | 80 | | | 120 | … |
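The timeline is simply two periodic event streams merged by time. The sketch below reproduces it; the type and function names are invented for this illustration. Note that it uses the exact AAC frame duration (1024 / 44100 s), so later audio entries come out as 69.66 ms, 92.88 ms and so on, where the table uses multiples of a rounded 23.2 ms.

```c
#include <assert.h>

/* One entry on the ideal timeline: 'A' for an audio frame, 'V' for a video frame. */
typedef struct {
    double time_ms;
    char kind;
} TimelineEvent;

/* Merge two periodic streams into one time-ordered list up to end_ms (inclusive).
 * When both are due at the same instant, the audio event is listed first.
 * Returns the number of events written to out. */
static int build_timeline(double audio_step_ms, double video_step_ms, double end_ms,
                          TimelineEvent *out, int max_events)
{
    int n = 0, ai = 0, vi = 0;
    assert(audio_step_ms > 0.0 && video_step_ms > 0.0);
    while (n < max_events) {
        double ta = ai * audio_step_ms;   /* start of the next audio frame */
        double tv = vi * video_step_ms;   /* start of the next video frame */
        if (ta > end_ms && tv > end_ms)
            break;
        if (ta <= tv) {
            out[n].time_ms = ta;
            out[n].kind = 'A';
            ai++;
        } else {
            out[n].time_ms = tv;
            out[n].kind = 'V';
            vi++;
        }
        n++;
    }
    return n;
}
```

Calling build_timeline(1024 * 1000.0 / 44100, 1000.0 / 25, 120.0, events, 32) yields ten events: six audio frames and four video frames, ending with the video frame at 120 ms.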
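In theory, playing audio and video at the time points above would keep them in sync. In practice, both go through three steps (decoding, resampling, playback), each step takes a different amount of time, and precise timing is not achievable.
This gives rise to three synchronization approaches:
- Audio as the master clock: video synchronizes to audio
- Video as the master clock: audio synchronizes to video
- An external reference clock as the master: both audio and video synchronize to it
Personally I lean toward the theoretical approach, with audio and video each playing on its own without interfering with the other. In terms of perception, hearing is the more sensitive sense: even a brief pause in the audio is noticeable. Vision is different, because the eye has persistence of vision.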
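So, following the theoretical approach, audio playback gets no extra timing computation: data is fed to the hardware as fast as the hardware consumes it. And because the decoding timestamp (DTS) and presentation timestamp (PTS) of an audio frame are always identical, audio only needs to be decoded and played in order.
Video is different. H264 video contains I-frames, P-frames, and B-frames, and when B-frames are present the decoding order can differ from the display order. Video therefore has to be decoded in decoding order and then shown at the right moment (the time corresponding to its PTS), measured against audio playback time. Timing cannot be exact, but a frame shown slightly early or late is almost imperceptible, as long as the error stays within the persistence of vision and does not accumulate. This is in effect the audio-master approach: video synchronizes to audio.
As this analysis shows, synchronization is not a one-off step; it goes on continuously until playback ends.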
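To make the audio-master idea concrete, here is one possible shape for the per-frame decision. This is a sketch under stated assumptions, not what the program later in this post does: the names are hypothetical, both times are assumed to be already converted to seconds, and the tolerance (for example 0.040 s, one frame at 25 fps) is an arbitrary choice.

```c
#include <assert.h>

typedef enum {
    SYNC_WAIT,  /* frame is early: sleep for *wait_s seconds, then show it */
    SYNC_SHOW,  /* frame is within tolerance: show it now */
    SYNC_DROP   /* frame is too late: skip it to catch up */
} SyncAction;

/* Decide what to do with a decoded video frame given the current audio clock. */
static SyncAction video_sync_action(double video_pts_s, double audio_clock_s,
                                    double threshold_s, double *wait_s)
{
    double diff = video_pts_s - audio_clock_s;  /* > 0 means video is ahead of audio */
    assert(threshold_s >= 0.0 && wait_s != 0);
    *wait_s = 0.0;
    if (diff > threshold_s) {
        *wait_s = diff;
        return SYNC_WAIT;
    }
    if (diff < -threshold_s)
        return SYNC_DROP;
    return SYNC_SHOW;
}
```

Because the decision is recomputed for every frame against the running audio clock, a timing error on one frame does not carry over to the next, which is the "must not accumulate" requirement described above.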
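About DTS and PTS:
- DTS (Decoding Time Stamp): the time at which a packet should be decoded.
- PTS (Presentation Time Stamp): the time at which the data decoded from a packet should be displayed.
- Units: DTS and PTS are counted in units given by the time_base member, an AVRational, of the corresponding stream structure. The actual time is the timestamp multiplied by the unit time that time_base represents.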
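The conversion from timestamp ticks to real time can be sketched without FFmpeg as follows. The function names are hypothetical; in real code av_q2d() and av_rescale_q() do this job (and av_rescale_q rounds and guards against overflow, which this truncating sketch does not).

```c
#include <assert.h>
#include <stdint.h>

/* Convert a DTS or PTS counted in time_base units (tb_num / tb_den seconds) to seconds. */
static double timestamp_to_seconds(int64_t ts, int tb_num, int tb_den)
{
    assert(tb_den != 0);
    return (double)ts * tb_num / tb_den;
}

/* Re-express a timestamp from one time base in another, truncating toward zero.
 * Assumes the intermediate products fit in 64 bits. */
static int64_t rescale_timestamp(int64_t ts, int from_num, int from_den,
                                 int to_num, int to_den)
{
    assert(from_den != 0 && to_num != 0);
    return ts * from_num * to_den / ((int64_t)from_den * to_num);
}
```

With a video time base of 1/16, a PTS of 8 is 0.5 s; rescaled to an audio time base of 1/48000 it becomes 24000, so the two streams can be compared on one scale.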
So how do we get the DTS and PTS of audio and video?
Calling av_read_frame(pFormatCtx, Packet) reads one AVPacket, and that structure carries the DTS and PTS of the frame.
Audio DTS and PTS
Because audio is played in order, DTS and PTS are identical for audio.
// duration is an int64_t, so print it with PRId64 (from <inttypes.h>)
printf("stream audio time_base.num:%d, time_base.den:%d, avg_frame_rate.num:%d, avg_frame_rate.den:%d, duration:%" PRId64 "\n",
pFormatCtx->streams[AudioIndex]->time_base.num,
pFormatCtx->streams[AudioIndex]->time_base.den,
pFormatCtx->streams[AudioIndex]->avg_frame_rate.num,
pFormatCtx->streams[AudioIndex]->avg_frame_rate.den,
pFormatCtx->streams[AudioIndex]->duration);
// Output: stream audio time_base.num:1, time_base.den:48000, avg_frame_rate.num:0, avg_frame_rate.den:0, duration:40830000
av_read_frame(pFormatCtx, Packet);
avcodec_decode_audio4( pAudioCodecCtx, pAudioFrame,&GotAudioPicture, Packet);
// dts and pts are int64_t, so print them with PRId64 (from <inttypes.h>)
printf("Audio index:%5d\t dts:%" PRId64 "\t pts:%" PRId64 "\t packet size:%d, pFrame->nb_samples:%d\n",
audioCnt, Packet->dts, Packet->pts, Packet->size, pAudioFrame->nb_samples);
//Audio index: 0 dts:0 pts:0 packet size:847, pFrame->nb_samples:1024
//Audio index: 1 dts:1024 pts:1024 packet size:846, pFrame->nb_samples:1024
//Audio index: 2 dts:2048 pts:2048 packet size:846, pFrame->nb_samples:1024
//Audio index: 3 dts:3072 pts:3072 packet size:847, pFrame->nb_samples:1024
//Audio index: 4 dts:4096 pts:4096 packet size:846, pFrame->nb_samples:1024
//Audio index: 5 dts:5120 pts:5120 packet size:846, pFrame->nb_samples:1024
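- Time unit: time_base is an AVRational. The output shows that the unit is 1/48000 s, so DTS x (1/48000) is the decoding time and PTS x (1/48000) the presentation time, both in seconds.
- Audio has no notion of frame rate, so both fields of avg_frame_rate are 0.
- duration is the length of the audio stream: duration x (1/48000) = 40830000 x (1/48000) = 850.625 s, about 14 min 10.6 s, which agrees closely with the file duration printed earlier, Duration: 00:14:10.67.
- Reading audio frames with av_read_frame produces the per-packet output shown above.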
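Two things in the audio log can be verified mechanically: with a time base of 1/sample_rate, each PTS should equal the previous one plus the 1024 samples of an AAC frame, and the stream duration should convert to about 14 minutes 10 seconds. The helpers below are a hypothetical sketch for that check, not part of the player.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 when every pts equals the previous pts plus nb_samples, else 0. */
static int audio_pts_contiguous(const int64_t *pts, int count, int nb_samples)
{
    int i;
    assert(pts != 0 && count >= 0);
    for (i = 1; i < count; i++) {
        if (pts[i] - pts[i - 1] != nb_samples)
            return 0;
    }
    return 1;
}

/* Split a duration in seconds into whole minutes and the remaining seconds. */
static void split_minutes(double seconds, int *minutes, double *rest_s)
{
    assert(seconds >= 0.0 && minutes != 0 && rest_s != 0);
    *minutes = (int)(seconds / 60.0);
    *rest_s = seconds - *minutes * 60.0;
}
```

Feeding it the logged values 0, 1024, 2048, 3072, 4096, 5120 confirms the packets are contiguous, and 40830000 / 48000.0 splits into 14 minutes and 10.625 seconds.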
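Video DTS and PTS
B-frames use bidirectional prediction: a B-frame depends on frames both before and after it. Video that contains B-frames is therefore decoded in a different order from the one in which it is displayed, i.e. DTS and PTS differ. For video without B-frames, DTS and PTS are the same.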
// duration is an int64_t, so print it with PRId64 (from <inttypes.h>)
printf("stream video time_base.num:%d, time_base.den:%d, avg_frame_rate.num:%d, avg_frame_rate.den:%d, duration:%" PRId64 "\n",
pFormatCtx->streams[VideoIndex]->time_base.num,
pFormatCtx->streams[VideoIndex]->time_base.den,
pFormatCtx->streams[VideoIndex]->avg_frame_rate.num,
pFormatCtx->streams[VideoIndex]->avg_frame_rate.den,
pFormatCtx->streams[VideoIndex]->duration);
// Output: stream video time_base.num:1, time_base.den:16, avg_frame_rate.num:8, avg_frame_rate.den:1, duration:13610
av_read_frame(pFormatCtx, Packet);
printf("Video index:%5d\t dts:%" PRId64 "\t, pts:%" PRId64 "\t packet size:%d\n", // PRId64 is from <inttypes.h>
videoCnt, Packet->dts, Packet->pts, Packet->size);
//Video index: 0 dts:-2 , pts:0 packet size:91041
//Video index: 1 dts:0 , pts:8 packet size:191
//Video index: 2 dts:2 , pts:2 packet size:103
//Video index: 3 dts:4 , pts:4 packet size:103
//Video index: 4 dts:6 , pts:6 packet size:103
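- Time unit: time_base is an AVRational. The output shows that the unit is 1/16 s, so DTS x (1/16) is the decoding time and PTS x (1/16) the presentation time, both in seconds.
- Average frame rate: avg_frame_rate.num / avg_frame_rate.den = 8 / 1 = 8 fps, so playback shows 8 frames per second, one frame every 125 ms.
- duration is the length of the video stream: duration x (1/16) = 13610 x (1/16) = 850.625 s, about 14 min 10.6 s. This matches the audio stream exactly and agrees closely with the file duration printed earlier, Duration: 00:14:10.67.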
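The video log shows the reordering directly: the packet with dts 0 carries pts 8, so it is decoded second but displayed last of the five. The sketch below sorts the logged packets by PTS to recover the display order. It is only an illustration with hypothetical names; in a real player the decoder itself hands frames back in presentation order.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Timestamps of one packet, with its position in decoding (file) order. */
typedef struct {
    int index;
    int64_t dts;
    int64_t pts;
} PacketTimes;

/* qsort comparator: ascending by pts. */
static int compare_by_pts(const void *a, const void *b)
{
    const PacketTimes *pa = (const PacketTimes *)a;
    const PacketTimes *pb = (const PacketTimes *)b;
    return (pa->pts > pb->pts) - (pa->pts < pb->pts);
}

/* Reorder packets from decoding order into presentation order. */
static void sort_into_presentation_order(PacketTimes *packets, size_t count)
{
    assert(packets != 0);
    qsort(packets, count, sizeof packets[0], compare_by_pts);
}
```

For the five logged packets, the display order by original index is 0, 2, 3, 4, 1.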
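Implementing synchronization
All the information needed for synchronized playback is now in hand, so how is it actually implemented? Naturally this calls for multiple threads; doing everything in one thread is not practical.
- Main thread: reads the media file information, prepares the codec contexts, and in its main loop reads audio and video packets from the file and stores them in separate audio and video queues to await decoding
- Video thread: takes packets from the video queue in DTS order, decodes one video frame, converts it with sws_scale, and on each video refresh signal renders the decoded frame to the screen with SDL
- Audio thread: takes packets from the audio queue in DTS order, decodes and resamples them, and hands the resampled data to the SDL audio callback so that it plays smoothly and without delay
- Video refresh signal thread: uses the video stream information, mainly the frame rate, to work out how long each frame lasts, and sends a refresh signal to the video thread at that interval
- SDL event thread: watches for pause, quit, and custom events and handles the corresponding SDL GUI operations; simple pause, resume, and quit are implemented
As this description shows, I did not deliberately synchronize video to audio; each simply plays at its own pace, and the result seems acceptable. The code follows.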
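The queues between the main thread and the decoder threads are fixed-size rings in which every slot carries a state flag: the producer may only write a slot whose state is 0, and the consumer may only read a slot whose state is 1. The following single-threaded sketch shows just that mechanism, with an int payload standing in for a packet and invented names throughout. When the slots are shared between real threads, the flag would additionally need atomic access or a lock, which this sketch leaves out.

```c
#include <assert.h>

#define SLOT_COUNT 4

/* One queue slot: state 0 means empty, 1 means filled. */
typedef struct {
    int payload;
    int state;
} Slot;

typedef struct {
    unsigned int r_index;  /* next slot to read */
    unsigned int w_index;  /* next slot to write */
    Slot slots[SLOT_COUNT];
} SlotRing;

/* Store payload in the next write slot. Returns 0 if the ring is full. */
static int ring_push(SlotRing *q, int payload)
{
    Slot *s;
    assert(q != 0);
    s = &q->slots[q->w_index];
    if (s->state != 0)
        return 0;
    s->payload = payload;
    s->state = 1;
    q->w_index = (q->w_index + 1) % SLOT_COUNT;
    return 1;
}

/* Fetch the oldest payload. Returns 0 if the ring is empty. */
static int ring_pop(SlotRing *q, int *payload)
{
    Slot *s;
    assert(q != 0 && payload != 0);
    s = &q->slots[q->r_index];
    if (s->state == 0)
        return 0;
    *payload = s->payload;
    s->state = 0;
    q->r_index = (q->r_index + 1) % SLOT_COUNT;
    return 1;
}
```

A zero-initialized SlotRing starts out empty; the producer polls ring_push until it succeeds and the consumer polls ring_pop, which is the same busy-wait pattern as the usleep and SDL_Delay loops in the program.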
/*
* ffmpeg_sdl2_avpalyer.cpp
*
* Created on: 2019-04-04
* Author: luke
* Synchronized audio/video playback
*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define __STDC_CONSTANT_MACROS
#ifdef __cplusplus
extern "C"
{
#endif
#include <libavutil/time.h>
#include <libavutil/imgutils.h>
#include <libavutil/mathematics.h>
#include <libavcodec/avcodec.h>
#include <libavformat/avformat.h>
#include <libavdevice/avdevice.h>
#include <libswscale/swscale.h>
#include <libswresample/swresample.h>
#include <SDL2/SDL.h>
#include <errno.h>
#include <unistd.h>
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#ifdef __cplusplus
};
#endif
#define MAX_AUDIO_FRAME_SIZE 192000 // 1 second of 48khz 32bit audio
#define PACKET_ARRAY_SIZE (60)
typedef struct __PacketStruct
{
AVPacket Packet;
int64_t dts;
int64_t pts;
int state;
}PacketStruct;
typedef struct
{
unsigned int rIndex;
unsigned int wIndex;
PacketStruct PacketArray[PACKET_ARRAY_SIZE];
}PacketArrayStruct;
typedef struct __AudioCtrlStruct
{
AVFormatContext *pFormatCtx;
AVStream *pStream;
AVCodec *pCodec;
AVCodecContext *pCodecCtx;
SwrContext *pConvertCtx;
Uint8 *audio_chunk;
Sint32 audio_len;
Uint8 *audio_pos;
int AudioIndex;
int AudioCnt;
uint64_t AudioOutChannelLayout;
int out_nb_samples; //nb_samples: AAC-1024 MP3-1152
AVSampleFormat out_sample_fmt;
int out_sample_rate;
int out_channels;
int out_buffer_size;
unsigned char* pAudioOutBuffer;
sem_t frame_put;
sem_t frame_get;
PacketArrayStruct Audio;
}AudioCtrlStruct;
typedef struct __VideoCtrlStruct
{
AVFormatContext *pFormatCtx;
AVStream *pStream;
AVCodec *pCodec;
AVCodecContext *pCodecCtx;
SwsContext *pConvertCtx;
AVFrame *pVideoFrame, *pFrameYUV;
unsigned char *pVideoOutBuffer;
int VideoIndex;
int VideoCnt;
int RefreshTime;
int screen_w,screen_h;
SDL_Window *screen;
SDL_Renderer* sdlRenderer;
SDL_Texture* sdlTexture;
SDL_Rect sdlRect;
SDL_Thread *video_tid;
sem_t frame_put;
sem_t video_refresh;
PacketArrayStruct Video;
}VideoCtrlStruct;
//Refresh Event
#define SFM_REFRESH_VIDEO_EVENT (SDL_USEREVENT + 1)
#define SFM_REFRESH_AUDIO_EVENT (SDL_USEREVENT + 2)
#define SFM_BREAK_EVENT (SDL_USEREVENT + 3)
int thread_exit = 0;
int thread_pause = 0;
VideoCtrlStruct VideoCtrl;
AudioCtrlStruct AudioCtrl;
//video time_base.num:1, time_base.den:16, avg_frame_rate.num:8, avg_frame_rate.den:1
//audio time_base.num:1, time_base.den:48000, avg_frame_rate.num:0, avg_frame_rate.den:0
int IsPacketArrayFull(PacketArrayStruct* p)
{
int i = 0;
i = p->wIndex % PACKET_ARRAY_SIZE;
if(p->PacketArray[i].state != 0) return 1;
return 0;
}
int IsPacketArrayEmpty(PacketArrayStruct* p)
{
int i = 0;
i = p->rIndex % PACKET_ARRAY_SIZE;
if(p->PacketArray[i].state == 0) return 1;
return 0;
}
int SDL_event_thread(void *opaque)
{
SDL_Event event;
while(1)
{
SDL_WaitEvent(&event);
if(event.type == SDL_KEYDOWN)
{
//Pause
if(event.key.keysym.sym == SDLK_SPACE)
{
thread_pause = !thread_pause;
printf("video got pause event!\n");
}
}
else if(event.type == SDL_QUIT)
{
thread_exit = 1;
printf("------------------------------>video got SDL_QUIT event!\n");
break;
}
else if(event.type == SFM_BREAK_EVENT)
{
break;
}
}
printf("---------> SDL_event_thread end !!!! \n");
return 0;
}
int video_refresh_thread(void *opaque)
{
while (1)
{
if(thread_exit) break;
if(thread_pause)
{
SDL_Delay(40);
continue;
}
usleep(VideoCtrl.RefreshTime);
sem_post(&VideoCtrl.video_refresh);
}
printf("---------> video_refresh_thread end !!!! \n");
return 0;
}
static void *thread_audio(void *arg)
{
AVCodecContext *pAudioCodecCtx;
AVFrame *pAudioFrame;
unsigned char *pAudioOutBuffer;
AVPacket *Packet;
int i, ret, GotAudioPicture;
struct SwrContext *AudioConvertCtx;
AudioCtrlStruct* AudioCtrl = (AudioCtrlStruct*)arg;
pAudioCodecCtx = AudioCtrl->pCodecCtx;
pAudioOutBuffer = AudioCtrl->pAudioOutBuffer;
AudioConvertCtx = AudioCtrl->pConvertCtx;
printf("---------> thread_audio start !!!! \n");
pAudioFrame = av_frame_alloc();
while(1)
{
if(thread_exit) break;
if(thread_pause)
{
usleep(10000);
continue;
}
//sem_wait(&AudioCtrl->frame_put);
if(IsPacketArrayEmpty(&AudioCtrl->Audio))
{
SDL_Delay(1);
printf("---------> thread_audio empty !!!! \n");
continue;
}
i = AudioCtrl->Audio.rIndex;
Packet = &AudioCtrl->Audio.PacketArray[i].Packet;
if(Packet->stream_index == AudioCtrl->AudioIndex)
{
ret = avcodec_decode_audio4( pAudioCodecCtx, pAudioFrame, &GotAudioPicture, Packet);
if ( ret < 0 )
{
printf("Error in decoding audio frame.\n");
return 0;
}
if ( GotAudioPicture > 0 )
{
swr_convert(AudioConvertCtx,&pAudioOutBuffer, MAX_AUDIO_FRAME_SIZE,
(const uint8_t **)pAudioFrame->data , pAudioFrame->nb_samples);
//printf("Auduo index:%5d\t pts:%ld\t packet size:%d, pFrame->nb_samples:%d\n",
// AudioCtrl->AudioCnt, Packet->pts, Packet->size, pAudioFrame->nb_samples);
AudioCtrl->AudioCnt++;
}
while(AudioCtrl->audio_len > 0)//Wait until finish
SDL_Delay(1);
//Set audio buffer (PCM data)
AudioCtrl->audio_chunk = (Uint8 *) pAudioOutBuffer;
AudioCtrl->audio_pos = AudioCtrl->audio_chunk;
AudioCtrl->audio_len = AudioCtrl->out_buffer_size;
//sem_post(&AudioCtrl->frame_get);
av_packet_unref(Packet);
AudioCtrl->Audio.PacketArray[i].state = 0;
i++;
if(i >= PACKET_ARRAY_SIZE) i = 0;
AudioCtrl->Audio.rIndex = i;
}
}
printf("---------> thread_audio end !!!! \n");
return 0;
}
static void *thread_video(void *arg)
{
AVCodecContext *pVideoCodecCtx;
AVFrame *pVideoFrame,*pFrameYUV;
AVPacket *Packet;
int i, ret, GotPicture;
struct SwsContext *VideoConvertCtx;
VideoCtrlStruct* VideoCtrl = (VideoCtrlStruct*)arg;
pVideoCodecCtx = VideoCtrl->pCodecCtx;
VideoConvertCtx = VideoCtrl->pConvertCtx;
pVideoFrame = VideoCtrl->pVideoFrame;
pFrameYUV = VideoCtrl->pFrameYUV;
printf("---------> thread_video start !!!! \n");
while(1)
{
if(thread_exit) break;
//sem_wait(&VideoCtrl->frame_put);
if(IsPacketArrayEmpty(&VideoCtrl->Video))
{
SDL_Delay(1);
continue;
}
i = VideoCtrl->Video.rIndex;
Packet = &VideoCtrl->Video.PacketArray[i].Packet;
if(Packet->stream_index == VideoCtrl->VideoIndex)
{
ret = avcodec_decode_video2(pVideoCodecCtx, pVideoFrame, &GotPicture, Packet);
if(ret < 0)
{
printf("Video Decode Error.\n");
return 0;
}
//printf("Video index:%5d\t dts:%ld\t, pts:%ld\t packet size:%d, GotVideoPicture:%d\n",
// VideoCtrl->VideoCnt, Packet->dts, Packet->pts, Packet->size, GotPicture);
// printf("Video index:%5d\t pFrame->pkt_dts:%ld, pFrame->pkt_pts:%ld, pFrame->pts:%ld, pFrame->pict_type:%d, "
// "pFrame->best_effort_timestamp:%ld, pFrame->pkt_pos:%ld, pVideoFrame->pkt_duration:%ld\n",
// VideoCtrl->VideoCnt, pVideoFrame->pkt_dts, pVideoFrame->pkt_pts, pVideoFrame->pts,
// pVideoFrame->pict_type, pVideoFrame->best_effort_timestamp,
// pVideoFrame->pkt_pos, pVideoFrame->pkt_duration);
VideoCtrl->VideoCnt++;
if(GotPicture)
{
sws_scale(VideoConvertCtx, (const unsigned char* const*)pVideoFrame->data,
pVideoFrame->linesize, 0, pVideoCodecCtx->height, pFrameYUV->data, pFrameYUV->linesize);
sem_wait(&VideoCtrl->video_refresh);
//SDL---------------------------
SDL_UpdateTexture( VideoCtrl->sdlTexture, NULL, pFrameYUV->data[0], pFrameYUV->linesize[0] );
SDL_RenderClear( VideoCtrl->sdlRenderer );
//SDL_RenderCopy( sdlRenderer, sdlTexture, &sdlRect, &sdlRect );
SDL_RenderCopy( VideoCtrl->sdlRenderer, VideoCtrl->sdlTexture, NULL, NULL);
SDL_RenderPresent( VideoCtrl->sdlRenderer );
//SDL End-----------------------
}
av_packet_unref(Packet);
VideoCtrl->Video.PacketArray[i].state = 0;
i++;
if(i >= PACKET_ARRAY_SIZE) i = 0;
VideoCtrl->Video.rIndex = i;
}
}
printf("---------> thread_video end !!!! \n");
return 0;
}
/* The audio function callback takes the following parameters:
* stream: A pointer to the audio buffer to be filled
* len: The length (in bytes) of the audio buffer
*/
void fill_audio(void *udata,Uint8 *stream,int len)
{
AudioCtrlStruct* AudioCtrl = (AudioCtrlStruct*)udata;
//SDL 2.0
SDL_memset(stream, 0, len);
if(AudioCtrl->audio_len == 0) return;
len=(len > AudioCtrl->audio_len ? AudioCtrl->audio_len : len); /* Mix as much data as possible */
SDL_MixAudio(stream, AudioCtrl->audio_pos, len, SDL_MIX_MAXVOLUME);
AudioCtrl->audio_pos += len;
AudioCtrl->audio_len -= len;
}
int main(int argc, char* argv[])
{
AVFormatContext *pFormatCtx;
AVCodecContext *pVideoCodecCtx, *pAudioCodecCtx;
AVCodec *pVideoCodec, *pAudioCodec;
AVPacket *Packet;
unsigned char *pVideoOutBuffer, *pAudioOutBuffer;
int ret;
unsigned int i;
pthread_t audio_tid, video_tid;
uint64_t AudioOutChannelLayout;
int out_nb_samples; //nb_samples: AAC-1024 MP3-1152
AVSampleFormat out_sample_fmt;
int out_sample_rate;
int out_channels;
int out_buffer_size;
struct SwsContext *VideoConvertCtx;
struct SwrContext *AudioConvertCtx;
int VideoIndex, VideoCnt;
int AudioIndex, AudioCnt;
memset(&AudioCtrl, 0, sizeof(AudioCtrlStruct));
memset(&VideoCtrl, 0, sizeof(VideoCtrlStruct));
if(argc < 2)
{
printf("Usage: %s <media file>\n", argv[0]);
return -1;
}
char *filepath = argv[1];
sem_init(&VideoCtrl.video_refresh, 0, 0);
sem_init(&VideoCtrl.frame_put, 0, 0);
sem_init(&AudioCtrl.frame_put, 0, 0);
thread_exit = 0;
thread_pause = 0;
av_register_all();
avformat_network_init();
pFormatCtx = avformat_alloc_context();
if(avformat_open_input(&pFormatCtx, filepath, NULL, NULL) !=0 )
{
printf("Couldn't open input stream.\n");
return -1;
}
if(avformat_find_stream_info(pFormatCtx,NULL) < 0)
{
printf("Couldn't find stream information.\n");
return -1;
}
VideoIndex = -1;
AudioIndex = -1;
for(i = 0; i < pFormatCtx->nb_streams; i++)
{
if(pFormatCtx->streams[i]->codec->codec_type==AVMEDIA_TYPE_VIDEO)
{
VideoIndex = i;
//Print the video stream information
printf("video time_base.num:%d, time_base.den:%d, avg_frame_rate.num:%d, avg_frame_rate.den:%d\n",
pFormatCtx->streams[VideoIndex]->time_base.num,
pFormatCtx->streams[VideoIndex]->time_base.den,
pFormatCtx->streams[VideoIndex]->avg_frame_rate.num,
pFormatCtx->streams[VideoIndex]->avg_frame_rate.den);
}
if(pFormatCtx->streams[i]->codec->codec_type==AVMEDIA_TYPE_AUDIO)
{
AudioIndex = i;
//Print the audio stream information
printf("audio time_base.num:%d, time_base.den:%d, avg_frame_rate.num:%d, avg_frame_rate.den:%d\n",
pFormatCtx->streams[AudioIndex]->time_base.num,
pFormatCtx->streams[AudioIndex]->time_base.den,
pFormatCtx->streams[AudioIndex]->avg_frame_rate.num,
pFormatCtx->streams[AudioIndex]->avg_frame_rate.den);
}
}
if(VideoIndex != -1)
{ //Prepare the decoding context structures for video
pVideoCodecCtx = pFormatCtx->streams[VideoIndex]->codec;
pVideoCodec = avcodec_find_decoder(pVideoCodecCtx->codec_id);
if(pVideoCodec == NULL)
{
printf("Video Codec not found.\n");
return -1;
}
if(avcodec_open2(pVideoCodecCtx, pVideoCodec,NULL) < 0)
{
printf("Could not open video codec.\n");
return -1;
}
// prepare video
VideoCtrl.pVideoFrame = av_frame_alloc();
VideoCtrl.pFrameYUV = av_frame_alloc();
ret = av_image_get_buffer_size(AV_PIX_FMT_YUV420P, pVideoCodecCtx->width, pVideoCodecCtx->height, 1);
pVideoOutBuffer = (unsigned char *)av_malloc(ret);
av_image_fill_arrays(VideoCtrl.pFrameYUV->data, VideoCtrl.pFrameYUV->linesize, pVideoOutBuffer,
AV_PIX_FMT_YUV420P, pVideoCodecCtx->width, pVideoCodecCtx->height, 1);
VideoConvertCtx = sws_getContext(pVideoCodecCtx->width, pVideoCodecCtx->height, pVideoCodecCtx->pix_fmt,
pVideoCodecCtx->width, pVideoCodecCtx->height,
AV_PIX_FMT_YUV420P, SWS_BICUBIC, NULL, NULL, NULL);
VideoCtrl.pFormatCtx = pFormatCtx;
VideoCtrl.pStream = pFormatCtx->streams[VideoIndex];
VideoCtrl.pCodec = pVideoCodec;
VideoCtrl.pCodecCtx = pFormatCtx->streams[VideoIndex]->codec;
VideoCtrl.pConvertCtx = VideoConvertCtx;
VideoCtrl.pVideoOutBuffer = pVideoOutBuffer;
VideoCtrl.VideoIndex = VideoIndex;
if(pFormatCtx->streams[VideoIndex]->avg_frame_rate.num == 0 ||
pFormatCtx->streams[VideoIndex]->avg_frame_rate.den == 0)
{
VideoCtrl.RefreshTime = 40000;
}
else
{ //Compute the duration of one video frame in microseconds; refresh signals are sent at this interval
VideoCtrl.RefreshTime = 1000000 * pFormatCtx->streams[VideoIndex]->avg_frame_rate.den;
VideoCtrl.RefreshTime /= pFormatCtx->streams[VideoIndex]->avg_frame_rate.num;
}
printf("VideoCtrl.RefreshTime:%d\n", VideoCtrl.RefreshTime);
}
else
{
printf("Didn't find a video stream.\n");
}
if(AudioIndex != -1)
{ //Prepare the decoding context structures for audio
pAudioCodecCtx = pFormatCtx->streams[AudioIndex]->codec;
pAudioCodec = avcodec_find_decoder(pAudioCodecCtx->codec_id);
if(pAudioCodec == NULL)
{
printf("Audio Codec not found.\n");
return -1;
}
if(avcodec_open2(pAudioCodecCtx, pAudioCodec,NULL) < 0)
{
printf("Could not open audio codec.\n");
return -1;
}
// prepare Out Audio Param
AudioOutChannelLayout = AV_CH_LAYOUT_STEREO;
out_nb_samples = pAudioCodecCtx->frame_size; //nb_samples: AAC-1024 MP3-1152
out_sample_fmt = AV_SAMPLE_FMT_S16;
out_sample_rate = pAudioCodecCtx->sample_rate;
// out_sample_rate must be taken from pAudioCodecCtx->sample_rate; any other value leaves the audio under- or over-sampled and playback becomes noisy
out_channels = av_get_channel_layout_nb_channels(AudioOutChannelLayout);
out_buffer_size = av_samples_get_buffer_size(NULL,out_channels ,out_nb_samples,out_sample_fmt, 1);
//mp3:out_nb_samples:1152, out_channels:2, out_buffer_size:4608, pCodecCtx->channels:2
//aac:out_nb_samples:1024, out_channels:2, out_buffer_size:4096, pCodecCtx->channels:2
printf("out_nb_samples:%d, out_channels:%d, out_buffer_size:%d, pCodecCtx->channels:%d\n",
out_nb_samples, out_channels, out_buffer_size, pAudioCodecCtx->channels);
pAudioOutBuffer = (uint8_t *)av_malloc(MAX_AUDIO_FRAME_SIZE*2);
//FIX:Some Codec's Context Information is missing
int64_t in_channel_layout = av_get_default_channel_layout(pAudioCodecCtx->channels);
//Swr
AudioConvertCtx = swr_alloc();
AudioConvertCtx = swr_alloc_set_opts(AudioConvertCtx, AudioOutChannelLayout,
out_sample_fmt, out_sample_rate,
in_channel_layout, pAudioCodecCtx->sample_fmt ,
pAudioCodecCtx->sample_rate, 0, NULL);
swr_init(AudioConvertCtx);
AudioCtrl.pFormatCtx = pFormatCtx;
AudioCtrl.pStream = pFormatCtx->streams[AudioIndex];
AudioCtrl.pCodec = pAudioCodec;
AudioCtrl.pCodecCtx = pFormatCtx->streams[AudioIndex]->codec;
AudioCtrl.pConvertCtx = AudioConvertCtx;
AudioCtrl.AudioOutChannelLayout = AudioOutChannelLayout;
AudioCtrl.out_nb_samples = out_nb_samples;
AudioCtrl.out_sample_fmt = out_sample_fmt;
AudioCtrl.out_sample_rate = out_sample_rate;
AudioCtrl.out_channels = out_channels;
AudioCtrl.out_buffer_size = out_buffer_size;
AudioCtrl.pAudioOutBuffer = pAudioOutBuffer;
AudioCtrl.AudioIndex = AudioIndex;
}
else
{
printf("Didn't find an audio stream.\n");
}
//Output Info-----------------------------
printf("---------------- File Information ---------------\n");
av_dump_format(pFormatCtx, 0, filepath, 0);
printf("-------------- File Information end -------------\n");
if(SDL_Init(SDL_INIT_VIDEO | SDL_INIT_AUDIO | SDL_INIT_TIMER))
{
printf( "Could not initialize SDL - %s\n", SDL_GetError());
return -1;
}
if(VideoIndex != -1)
{
//SDL 2.0 Support for multiple windows
//SDL_VideoSpec
VideoCtrl.screen_w = pVideoCodecCtx->width;
VideoCtrl.screen_h = pVideoCodecCtx->height;
VideoCtrl.screen = SDL_CreateWindow("Simplest ffmpeg player's Window", SDL_WINDOWPOS_UNDEFINED,
SDL_WINDOWPOS_UNDEFINED, VideoCtrl.screen_w, VideoCtrl.screen_h, SDL_WINDOW_OPENGL);
if(!VideoCtrl.screen)
{
printf("SDL: could not create window - exiting:%s\n",SDL_GetError());
return -1;
}
VideoCtrl.sdlRenderer = SDL_CreateRenderer(VideoCtrl.screen, -1, 0);
//IYUV: Y + U + V (3 planes)
//YV12: Y + V + U (3 planes)
VideoCtrl.sdlTexture = SDL_CreateTexture(VideoCtrl.sdlRenderer, SDL_PIXELFORMAT_IYUV, SDL_TEXTUREACCESS_STREAMING,
pVideoCodecCtx->width, pVideoCodecCtx->height);
VideoCtrl.sdlRect.x = 0;
VideoCtrl.sdlRect.y = 0;
VideoCtrl.sdlRect.w = VideoCtrl.screen_w;
VideoCtrl.sdlRect.h = VideoCtrl.screen_h;
VideoCtrl.video_tid = SDL_CreateThread(video_refresh_thread, NULL, NULL);
ret = pthread_create(&video_tid, NULL, thread_video, &VideoCtrl);
if (ret)
{
printf("create thr_rvs video thread failed, error = %d \n", ret);
return -1;
}
}
if(AudioIndex != -1)
{
//SDL_AudioSpec
SDL_AudioSpec AudioSpec;
AudioSpec.freq = out_sample_rate;
AudioSpec.format = AUDIO_S16SYS;
AudioSpec.channels = out_channels;
AudioSpec.silence = 0;
AudioSpec.samples = out_nb_samples;
AudioSpec.callback = fill_audio;
AudioSpec.userdata = (void*)&AudioCtrl;
if (SDL_OpenAudio(&AudioSpec, NULL) < 0)
{
printf("can't open audio.\n");
return -1;
}
ret = pthread_create(&audio_tid, NULL, thread_audio, &AudioCtrl);
if (ret)
{
printf("create audio thread failed, error = %d \n", ret);
return -1;
}
SDL_PauseAudio(0);
}
SDL_Thread *event_tid;
event_tid = SDL_CreateThread(SDL_event_thread, NULL, NULL);
VideoCnt = 0;
AudioCnt = 0;
Packet = (AVPacket *)av_malloc(sizeof(AVPacket));
av_init_packet(Packet);
while(1)
{
if(thread_exit) break;
if(av_read_frame(pFormatCtx, Packet) < 0)
{ //End of file reached: exit automatically and send a break event to the SDL event thread
thread_exit = 1;
SDL_Event event;
event.type = SFM_BREAK_EVENT;
SDL_PushEvent(&event);
printf("---------> av_read_frame < 0, thread_exit = 1 !!!\n");
break;
}
if(Packet->stream_index == VideoIndex)
{
if(VideoCtrl.Video.wIndex >= PACKET_ARRAY_SIZE)
{
VideoCtrl.Video.wIndex = 0;
}
while(IsPacketArrayFull(&VideoCtrl.Video))
{
usleep(5000);
//printf("---------> VideoCtrl.Video.PacketArray FULL !!!\n");
}
i = VideoCtrl.Video.wIndex;
VideoCtrl.Video.PacketArray[i].Packet = *Packet;
VideoCtrl.Video.PacketArray[i].dts = Packet->dts;
VideoCtrl.Video.PacketArray[i].pts = Packet->pts;
VideoCtrl.Video.PacketArray[i].state = 1;
VideoCtrl.Video.wIndex++;
//printf("VideoCtrl.frame_put, VideoCnt:%d\n", VideoCnt++);
//sem_post(&VideoCtrl.frame_put);
}
if(Packet->stream_index == AudioIndex)
{
if(AudioCtrl.Audio.wIndex >= PACKET_ARRAY_SIZE)
{
AudioCtrl.Audio.wIndex = 0;
}
while(IsPacketArrayFull(&AudioCtrl.Audio))
{
usleep(5000);
//printf("---------> AudioCtrl.Audio.PacketArray FULL !!!\n");
}
i = AudioCtrl.Audio.wIndex;
AudioCtrl.Audio.PacketArray[i].Packet = *Packet;
AudioCtrl.Audio.PacketArray[i].dts = Packet->dts;
AudioCtrl.Audio.PacketArray[i].pts = Packet->pts;
AudioCtrl.Audio.PacketArray[i].state = 1;
AudioCtrl.Audio.wIndex++;
//printf("AudioCtrl.frame_put, AudioCnt:%d\n", AudioCnt++);
//sem_post(&AudioCtrl.frame_put);
}
}
SDL_WaitThread(event_tid, NULL);
if(VideoIndex != -1)
{
SDL_WaitThread(VideoCtrl.video_tid, NULL);
//The refresh thread has exited; wake the video thread in case it is still waiting for a refresh signal
sem_post(&VideoCtrl.video_refresh);
pthread_join(video_tid, NULL);
}
if(AudioIndex != -1)
{
pthread_join(audio_tid, NULL);
SDL_CloseAudio();//Close SDL
}
SDL_Quit();
//Only release resources that were actually created
if(AudioIndex != -1)
{
swr_free(&AudioConvertCtx);
av_free(pAudioOutBuffer);
avcodec_close(pAudioCodecCtx);
}
if(VideoIndex != -1)
{
sws_freeContext(VideoConvertCtx);
av_free(pVideoOutBuffer);
avcodec_close(pVideoCodecCtx);
}
avformat_close_input(&pFormatCtx);
printf("--------------------------->main exit 8 !!\n");
return 0;
}