Study Notes on the Chinese Edition of the ffmpeg Tutorial (Part 1)

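I downloaded several PDF copies of the Chinese edition of the ffmpeg tutorial and found many errors while working through them. Some variables used in the sample code are never defined, and because ffmpeg keeps evolving while these materials have gone unmaintained for years, a number of functions have since been renamed or replaced by others. Having run into so many problems, I decided to take notes and summarize what I found: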

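FFMPEG is an excellent library for building video applications or special-purpose tools. It does almost all of the heavy lifting for you, such as decoding, encoding, muxing and demuxing, which makes multimedia applications much easier to write. It is simple, written in C, fast, and can decode nearly every format you are likely to meet, as well as encode many of them.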

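The only problem is that documentation is basically nonexistent. There is a single tutorial that covers the basics, plus the doxygen-generated documentation. That is why, when I decided to study FFMPEG to figure out how audio and video applications work, I also decided to document the process and publish it as a beginner's tutorial.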

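The FFMPEG project ships a sample program called ffplay, a simple player written in C that uses ffmpeg to implement complete video playback. This tutorial starts from an updated version of the tutorial originally written by Martin Bohme (I borrowed some of it) and, based on Fabrice Bellard's ffplay, works up from there to a usable video player. Each tutorial introduces one or two new ideas and explains how to implement them. Each one comes with a C source file that you can download, compile and follow along with. The source files show how a real program works and how all the pieces fit together, and also show the technical details that are unimportant to the tutorial itself. By the end of the tutorial, we will have a working video player in fewer than 1000 lines of code.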

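While writing the player, we will use SDL to output audio and video. SDL is an excellent cross-platform multimedia library that is used in MPEG playback software, emulators and many video games. You will need to download and install the SDL development libraries on your system in order to compile the programs in this tutorial.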

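This tutorial is meant for people with a decent programming background. At the very least you should know C and be familiar with concepts such as queues and mutexes. You should know some basic multimedia concepts such as waveforms, but you do not need to know much, because I explain many of these concepts along the way.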

Update: I fixed some code errors in Tutorials 7 and 8 and added the -lavutil flag.

Tutorial 1: Making Screencaps

Source code: tutorial01.c

Overview

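Movie files have a few basic components. First, the file itself is called a container, and the type of container determines where the information in the file goes. AVI and Quicktime are examples of containers. Next, you have a number of streams; typically, there is an audio stream and a video stream. (A "stream" is just a fancy word for a succession of data elements made available over time.) The data elements in a stream are called frames. Each stream is encoded by a different kind of codec. The codec defines how the actual data is COded and DECoded, hence the name CODEC. DivX and MP3 are examples of codecs. Packets are then read from the stream. A packet is a piece of data that can be decoded into raw frames that we can finally manipulate in our application. For our purposes, each packet contains complete frames, or multiple complete frames in the case of audio.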

Basically, dealing with video and audio streams is very easy:

10 OPEN video_stream FROM video.avi

20 READ packet FROM video_stream INTO frame

30 IF frame NOT COMPLETE GOTO 20

40 DO SOMETHING WITH frame

50 GOTO 20
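The numbered steps above can be sketched in plain C without any ffmpeg calls. This is only an illustration of the control flow: ToyPacket, its fields and count_video_frames are hypothetical names invented for this sketch, and "doing something" with a frame is reduced to counting it.

```c
#include <stddef.h>

/* Hypothetical stand-in for a packet; this is NOT an ffmpeg type. */
typedef struct {
    int stream_index;    /* which stream the packet belongs to */
    int completes_frame; /* 1 if a full frame is available after this packet */
} ToyPacket;

/* Walks the packets (step 20), skips packets from other streams, keeps
 * reading while the frame is incomplete (step 30), and "processes" each
 * complete frame by counting it (step 40). */
int count_video_frames(const ToyPacket *packets, size_t n, int video_stream)
{
    int frames = 0;
    for (size_t i = 0; i < n; ++i) {
        if (packets[i].stream_index != video_stream)
            continue;   /* not a packet from our stream */
        if (!packets[i].completes_frame)
            continue;   /* frame not complete yet: back to step 20 */
        ++frames;       /* step 40: do something with the frame */
    }
    return frames;
}
```

In the real program the "completes a frame" flag comes from the decoder rather than from the packet itself; the sketch only mirrors the shape of the loop.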

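Although some programs do very complex things in the "do something" step, handling multimedia with ffmpeg is about as simple as this program. In this tutorial, we will open a file, read from the video stream inside it, and our "do something" will be writing the frame to a PPM file.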

Opening the File

First, let's see how we open a file. With ffmpeg, you have to initialize the library first. (Note that on some systems you may need <ffmpeg/avcodec.h> and <ffmpeg/avformat.h> instead.)

#include "libavcodec/avcodec.h"
#include "libavformat/avformat.h"
#include "libswscale/swscale.h"
#include <stdio.h>
#include <stdlib.h> //for exit()
void SaveFrame(AVFrame *pFrame,int width,int height,int iFrame);
int main(int argc,char * argv[])
{
    av_register_all();

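This registers all available file formats and codecs with the library, so they are used automatically when a file with the corresponding format or codec is opened. Note that you only need to call av_register_all() once, so we do it here in main(). If you like, you can register only particular formats and codecs, but there is usually no reason to do so. Now we can actually open the file: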

     AVFormatContext *pFormatCtx;
     pFormatCtx=avformat_alloc_context();
     //Open video file
     #ifdef _FFMPEG_0_6_
          if(av_open_input_file(&pFormatCtx,argv[1],NULL,0,NULL))
     #else
          if (avformat_open_input(&pFormatCtx,argv[1],NULL,NULL)!=0)
     #endif
               return -1;//Couldn't open file

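We get the filename from the first command-line argument. av_open_input_file reads the file header and stores the information about the file format in the AVFormatContext structure we pass in. Its last three arguments specify the file format, buffer size and format options; if they are set to NULL or 0, libavformat auto-detects them. (av_open_input_file has since been replaced by avformat_open_input.)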

About avformat_open_input:

/**
* Open an input stream and read the header. The codecs are not opened.
* The stream must be closed with avformat_close_input().
*
* @param ps Pointer to user-supplied AVFormatContext (allocated by avformat_alloc_context).
*           May be a pointer to NULL, in which case an AVFormatContext is allocated by this
*           function and written into ps.
*           Note that a user-supplied AVFormatContext will be freed on failure.
* @param filename Name of the stream to open.
* @param fmt If non-NULL, this parameter forces a specific input format.
*            Otherwise the format is autodetected.
* @param options  A dictionary filled with AVFormatContext and demuxer-private options.
*                 On return this parameter will be destroyed and replaced with a dict containing
*                 options that were not found. May be NULL.
*
* @return 0 on success, a negative AVERROR on failure.
*
* @note If you want to use custom IO, preallocate the format context and set its pb field.
*/
int avformat_open_input(AVFormatContext **ps, const char *filename, AVInputFormat *fmt, AVDictionary **options);

This function only looks at the header, so next we need to check out the stream information in the file:

     //Retrieve stream information
     #ifdef _FFMPEG_0_6_
          if (av_find_stream_info(pFormatCtx)<0)
     #else
          if(avformat_find_stream_info(pFormatCtx,NULL)<0)
     #endif
               return -1;//Couldn't find stream information

av_find_stream_info (since replaced by avformat_find_stream_info) populates pFormatCtx->streams with the proper information. We introduce a handy debugging function to show us what is inside:

 //Dump information about file onto standard error
  av_dump_format(pFormatCtx,0,argv[1],0);

Now pFormatCtx->streams is just an array of pointers of size pFormatCtx->nb_streams, so let's walk through it until we find a video stream.

 int i;
	 AVCodecContext *pCodecCtx;
	 //Find the first video stream
	 int videoStream=-1;
	 printf("pFormatCtx->nb_streams=%d\n",pFormatCtx->nb_streams);
	 for (i = 0; i < pFormatCtx->nb_streams; ++i)
	 {
	      if (pFormatCtx->streams[i]->codec->codec_type==AVMEDIA_TYPE_VIDEO)
	      {
	           videoStream=i;
	           break;
	      }
	 }
	 if (videoStream==-1)
	      return -1;//Didn't find a video stream
	 //Get a pointer to the codec context for the video stream
	 pCodecCtx=pFormatCtx->streams[videoStream]->codec;
	 printf("videoStream=%d\n", videoStream);

In the source, pFormatCtx->streams[i]->codec->codec_type has the following type:

enum AVMediaType {
	AVMEDIA_TYPE_UNKNOWN = -1,  ///< Usually treated as AVMEDIA_TYPE_DATA
	AVMEDIA_TYPE_VIDEO,
	AVMEDIA_TYPE_AUDIO,
	AVMEDIA_TYPE_DATA,          ///< Opaque data information usually continuous
	AVMEDIA_TYPE_SUBTITLE,
	AVMEDIA_TYPE_ATTACHMENT,    ///< Opaque data information usually sparse
	AVMEDIA_TYPE_NB
};
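The stream search is a linear scan over an array of pointers, comparing each entry's media type against the one we want. A minimal sketch, assuming a miniature stream table: ToyMediaType, ToyStream and find_first_stream are hypothetical names for illustration and are not part of ffmpeg.

```c
#include <stddef.h>

/* Hypothetical miniature of the stream table; these are NOT ffmpeg types. */
enum ToyMediaType { TOY_TYPE_UNKNOWN = -1, TOY_TYPE_VIDEO, TOY_TYPE_AUDIO };

typedef struct {
    enum ToyMediaType type;
} ToyStream;

/* Returns the index of the first stream of the wanted type, or -1 if no
 * such stream exists, which matches the videoStream == -1 check above. */
int find_first_stream(ToyStream *const *streams, size_t nb_streams,
                      enum ToyMediaType wanted)
{
    for (size_t i = 0; i < nb_streams; ++i) {
        if (streams[i]->type == wanted)
            return (int)i;
    }
    return -1;
}
```

Returning -1 for "not found" keeps index 0 available as a valid result, which is why videoStream is initialized to -1 in the tutorial code.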

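The stream's information about the codec is in what we call the "codec context". It contains all the information about the codec the stream is using, and we now have a pointer to it. But we still have to find the actual codec and open it: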

     AVCodec *pCodec;
     //Find the decoder for the video stream
     pCodec=avcodec_find_decoder(pCodecCtx->codec_id);
     if(pCodec==NULL)
     {
          fprintf(stderr,"Unsupported codec!\n");
          return -1; // Codec not found
     }
     //Open codec
     #ifdef _FFMPEG_0_6_
          if(avcodec_open(pCodecCtx,pCodec)<0)
     #else
          if(avcodec_open2(pCodecCtx,pCodec,NULL)<0)
     #endif
          return -1;//Could not open codec

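Some of you may remember two other pieces of code from the old tutorial: adding CODEC_FLAG_TRUNCATED to pCodecCtx->flags and adding a hack to correct grossly incorrect frame rates. Neither fix is in ffplay.c any more, so I have to assume they are no longer necessary. There is one more difference to point out now that this code is removed: pCodecCtx->time_base now holds the frame rate information. time_base is a struct (AVRational) with a numerator and a denominator. Frame rates are represented as fractions because many codecs have non-integer frame rates (NTSC, for example, uses 29.97 fps).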
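To make the fraction idea concrete, here is a small sketch of how a numerator/denominator pair can express 29.97 fps exactly as 30000/1001. ToyRational, toy_q2d and toy_frame_rate are hypothetical names that only mimic the shape of AVRational, and the sketch assumes time_base holds the duration of one frame in seconds, so the frame rate is its reciprocal.

```c
/* Hypothetical mirror of a numerator/denominator pair; NOT the ffmpeg type. */
typedef struct {
    int num; /* numerator */
    int den; /* denominator */
} ToyRational;

/* Converts a rational to a double. */
double toy_q2d(ToyRational q)
{
    return (double)q.num / (double)q.den;
}

/* Assumption: time_base is the duration of one frame in seconds,
 * so the frame rate is its reciprocal. */
double toy_frame_rate(ToyRational time_base)
{
    ToyRational rate = { time_base.den, time_base.num };
    return toy_q2d(rate);
}
```

With a time base of 1001/30000 this gives roughly 29.97 frames per second, while the fraction itself stays exact and avoids accumulating rounding error when timestamps are computed.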

Storing the Data

Now we need a place to actually store the frame:

     AVFrame *pFrame;
     //Allocate video frame
     pFrame=avcodec_alloc_frame();

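Since we plan to output PPM files, which store 24-bit RGB, we have to convert the frame from its native format to RGB. FFMPEG does these conversions for us. In most projects (ours included) we want to convert the initial frame to a specific format. Let's allocate a frame for the converted image now.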

     //Allocate an AVFrame structure
     AVFrame *pFrameRGB;
     pFrameRGB=avcodec_alloc_frame();
     if (pFrameRGB==NULL)
          return -1;

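Even though we have allocated the frame, we still need a place to put the raw data when we convert it. We use avpicture_get_size to get the size we need, and allocate the space manually: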

     uint8_t *buffer;
     int numBytes;
     //Determine required buffer size and allocate buffer
     numBytes=avpicture_get_size(PIX_FMT_RGB24,pCodecCtx->width,pCodecCtx->height);
     buffer=(uint8_t *)av_malloc(numBytes*sizeof(uint8_t));
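The buffer above comes from av_malloc, whose purpose is to hand back aligned memory. As a rough illustration of what an aligned allocator does, the sketch below over-allocates with malloc, rounds the address up to the requested boundary, and remembers the original pointer just before the aligned block. toy_aligned_malloc and toy_aligned_free are hypothetical helpers written for this note; this is one possible technique and not a description of how av_malloc is implemented.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: returns memory whose address is a multiple of
 * `alignment`, or NULL on failure. Size overflow is not checked here. */
void *toy_aligned_malloc(size_t size, size_t alignment)
{
    /* Require a power of two large enough to hold the bookkeeping pointer. */
    if (alignment < sizeof(void *) || (alignment & (alignment - 1)) != 0)
        return NULL;

    /* Over-allocate: room for padding plus the saved original pointer. */
    void *raw = malloc(size + alignment + sizeof(void *));
    if (raw == NULL)
        return NULL;

    /* Round up to the next multiple of `alignment`. */
    uintptr_t start = (uintptr_t)raw + sizeof(void *);
    uintptr_t aligned = (start + alignment - 1) & ~(uintptr_t)(alignment - 1);

    /* Store the pointer malloc returned just before the aligned block. */
    ((void **)aligned)[-1] = raw;
    return (void *)aligned;
}

/* Frees a block obtained from toy_aligned_malloc. */
void toy_aligned_free(void *ptr)
{
    if (ptr != NULL)
        free(((void **)ptr)[-1]);
}
```

Memory obtained this way must be released with the matching free function, which is the same reason buffers from av_malloc are released with av_free rather than free.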

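av_malloc is ffmpeg's malloc, a simple wrapper around malloc that makes sure the returned memory address is suitably aligned. It does not protect you from memory leaks, double frees or other malloc problems. Now we use avpicture_fill to associate the frame with our newly allocated buffer. A note on AVPicture: the AVPicture struct is a subset of the AVFrame struct, and the beginning of AVFrame is identical to AVPicture.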

//Assign appropriate parts of buffer to image planes in pFrameRGB
//Note that pFrameRGB is an AVFrame,but AVFrame is a superset of AVPicture
avpicture_fill((AVPicture *)pFrameRGB,buffer,PIX_FMT_RGB24,pCodecCtx->width,pCodecCtx->height);

What avpicture_fill() does:

/**
* Setup the picture fields based on the specified image parameters
* and the provided image data buffer.
*
* The picture fields are filled in by using the image data buffer
* pointed to by ptr.
*
* If ptr is NULL, the function will fill only the picture linesize
* array and return the required size for the image buffer.
*
* To allocate an image buffer and fill the picture data in one call,
* use avpicture_alloc().
*
* @param picture       the picture to be filled in
* @param ptr           buffer where the image data is stored, or NULL
* @param pix_fmt       the pixel format of the image
* @param width         the width of the image in pixels
* @param height        the height of the image in pixels
* @return the size in bytes required for src, a negative error code
* in case of failure
*
* @see av_image_fill_arrays()
*/
int avpicture_fill(AVPicture *picture, const uint8_t *ptr, enum AVPixelFormat pix_fmt, int width, int height);
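For packed 24-bit RGB there is only one plane, so "filling" a picture amounts to pointing data[0] at the buffer and setting linesize[0] to the number of bytes per row. The sketch below shows that idea under two assumptions: rows carry no padding, and only RGB24 is handled. ToyPicture and toy_fill_rgb24 are hypothetical names, not ffmpeg API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical picture layout: plane pointers plus bytes per row.
 * This only mimics the data/linesize idea; it is NOT an ffmpeg type. */
typedef struct {
    uint8_t *data[4];
    int linesize[4];
} ToyPicture;

/* Points the picture at `buffer` for a packed RGB24 image and returns the
 * number of bytes the image needs, or -1 for invalid dimensions.
 * Assumption: no row padding, so a row is exactly width * 3 bytes.
 * If `buffer` is NULL, only the line size is set and the size is returned. */
int toy_fill_rgb24(ToyPicture *pic, uint8_t *buffer, int width, int height)
{
    if (width <= 0 || height <= 0)
        return -1;

    for (int i = 0; i < 4; ++i) {
        pic->data[i] = NULL;
        pic->linesize[i] = 0;
    }
    pic->data[0] = buffer;        /* RGB24 is packed: a single plane */
    pic->linesize[0] = width * 3; /* 3 bytes per pixel */
    return width * height * 3;
}
```

No pixel data is copied: the picture merely refers to the buffer, which is why the buffer has to stay allocated for as long as the frame is in use.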

Finally, we are ready to read from the stream.

Reading the Data

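What we are going to do is read through the entire video stream by reading in packets, decoding them into frames, and, once a frame is complete, converting and saving it.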

     int frameFinished;
     AVPacket packet;
     i=0;
     av_init_packet(&packet);
     while(av_read_frame(pFormatCtx,&packet)>=0)
     {
          //printf("packet.stream_index=%d, packet.size=%d\n", packet.stream_index,packet.size);
          //Is this a packet from the video stream?
          if (packet.stream_index==videoStream)
          {
               //Decode video frame
               //avcodec_decode_video(pCodecCtx,pFrame,&frameFinished,packet.data,packet.size);
               avcodec_decode_video2(pCodecCtx,pFrame,&frameFinished,&packet);
               printf("frameFinished=%d\n",frameFinished );
               //Did we get a video frame?
               if (frameFinished)
               {
                    //Convert the image from its native format to RGB
                    //img_convert((AVPicture*)pFrameRGB,PIX_FMT_RGB24,(AVPicture*)pFrame,pCodecCtx->pix_fmt,pCodecCtx->width,pCodecCtx->height);
                    //Create the conversion context once and reuse it for every frame
                    static struct SwsContext *img_convert_ctx;
                    if(img_convert_ctx==NULL)
                         img_convert_ctx=sws_getContext(pCodecCtx->width,pCodecCtx->height,pCodecCtx->pix_fmt,pCodecCtx->width,pCodecCtx->height,
                                                        PIX_FMT_RGB24,SWS_BICUBIC,NULL,NULL,NULL);
                    if(img_convert_ctx==NULL)
                    {
                         fprintf(stderr,"Can not initialize the conversion context!\n");
                         exit(1);
                    }
                    //sws_scale expects an array of plane pointers, hence the pointer-to-pointer cast
                    sws_scale(img_convert_ctx,(const uint8_t *const *)pFrame->data,pFrame->linesize,0,pCodecCtx->height,
                              pFrameRGB->data,pFrameRGB->linesize);
                    //Save the frame to disk
                    if (++i<=5)
                         SaveFrame(pFrameRGB,pCodecCtx->width,pCodecCtx->height,i);
               }
          }
          //Free the packet that was allocated by av_read_frame
          av_free_packet(&packet);
     }

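The loop is fairly simple: av_read_frame() reads in a packet and stores it in the AVPacket struct. Note that we only allocated the packet structure itself; ffmpeg allocates the internal data for us, which packet.data points to. That data is freed later by av_free_packet(). avcodec_decode_video() converts the packet to a frame (the code above calls its successor, avcodec_decode_video2()). However, we might not have all the information we need for a frame after decoding one packet, so the decoder sets frameFinished once a complete frame is available. We then use img_convert() to convert the frame from its native format (pCodecCtx->pix_fmt) to RGB (img_convert() has been replaced by sws_scale()). Remember that you can cast an AVFrame pointer to an AVPicture pointer. Finally, we pass the frame along with its height and width to our SaveFrame function.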

SwsContext: the context handle needed for video resolution scaling and color-space conversion.

About sws_scale():

/**
* Scale the image slice in srcSlice and put the resulting scaled
* slice in the image in dst. A slice is a sequence of consecutive
* rows in an image.
*
* Slices have to be provided in sequential order, either in
* top-bottom or bottom-top order. If slices are provided in
* non-sequential order the behavior of the function is undefined.
*
* @param c         the scaling context previously created with
*                  sws_getContext()
* @param srcSlice  the array containing the pointers to the planes of
*                  the source slice
* @param srcStride the array containing the strides for each plane of
*                  the source image
* @param srcSliceY the position in the source image of the slice to
*                  process, that is the number (counted starting from
*                  zero) in the image of the first row of the slice
* @param srcSliceH the height of the source slice, that is the number
*                  of rows in the slice
* @param dst       the array containing the pointers to the planes of
*                  the destination image
* @param dstStride the array containing the strides for each plane of
*                  the destination image
* @return          the height of the output slice
*/
int sws_scale(struct SwsContext *c, const uint8_t *const srcSlice[], const int srcStride[], int srcSliceY, int srcSliceH, uint8_t *const dst[], const int dstStride[]);

A Note on Packets

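Technically a packet can contain partial frames or other bits of data, but ffmpeg's parser ensures that the packets we get contain either a complete frame or multiple complete frames.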

Now all we need to do is write the SaveFrame function, which writes the RGB information to a file in PPM format. We will generate a very simple PPM file; trust me, it works.

void SaveFrame(AVFrame *pFrame,int width,int height,int iFrame)
{
     FILE *pFile;
     char szFilename[32];
     int y;
     //Open file
     sprintf(szFilename,"frame%d.ppm",iFrame);
     pFile=fopen(szFilename,"wb");
     if(pFile==NULL)
          return;
     //Write header
     fprintf(pFile, "P6\n%d %d\n255\n",width,height );
     //Write pixel data
     for (y=0;y<height; ++y)
          fwrite(pFrame->data[0]+y*pFrame->linesize[0],1,width*3,pFile);
     //Close file
     fclose(pFile);
}

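We do a bit of standard file opening and then write the RGB data, one line at a time. A PPM file is simply a file that has the RGB information laid out in one long string. If you know HTML colors, it is like laying out the color of each pixel end to end, so #ff0000#ff0000.... would be a red screen. (It is stored in binary and without separators, but you get the idea.) The header states the width and height of the image and the maximum RGB value.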
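The reason for writing one row at a time is that a row in memory may be longer than width*3 bytes (linesize can include padding), while the PPM file must contain exactly width*3 bytes per row. The following standalone sketch shows the same technique without any ffmpeg types; write_ppm_rgb24 and file_size are hypothetical helpers, and stride plays the role of linesize[0].

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: writes a packed RGB24 image as a binary PPM (P6).
 * `stride` is the number of bytes between the starts of consecutive rows
 * in memory and may be larger than width * 3. Returns 0 on success. */
int write_ppm_rgb24(const char *path, const uint8_t *pixels, int stride,
                    int width, int height)
{
    FILE *fp = fopen(path, "wb");
    if (fp == NULL)
        return -1;

    /* Header: magic number, dimensions, maximum component value. */
    fprintf(fp, "P6\n%d %d\n255\n", width, height);

    /* Write only the visible part of each row and skip any padding. */
    for (int y = 0; y < height; ++y) {
        size_t row_bytes = (size_t)width * 3;
        if (fwrite(pixels + (size_t)y * stride, 1, row_bytes, fp) != row_bytes) {
            fclose(fp);
            return -1;
        }
    }
    return fclose(fp) == 0 ? 0 : -1;
}

/* Hypothetical helper: returns the size of a file in bytes, or -1. */
long file_size(const char *path)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;
    long size = -1;
    if (fseek(fp, 0, SEEK_END) == 0)
        size = ftell(fp);
    fclose(fp);
    return size;
}
```

For a 4x2 image the header "P6\n4 2\n255\n" takes 11 bytes and the pixels take 24, so the file is 35 bytes regardless of how much padding each row had in memory.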

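About PPM files:

PPM is a simple image format that contains only the format identifier, the image width and height, the bit depth, and the image data.
The image data can be stored as ASCII or as binary. Only one of the simpler PPM variants is described here: a 24-bit color image stored in binary.
File header + RGB data:
P6\n
width height\n
255\n
rgbrgb...
Here P6 identifies this PPM variant; \n is a newline; width and height are the image dimensions, separated by a space; 255 is the maximum value of each color component; the RGB data is laid out top to bottom, left to right.

The header consists of three lines of text, which can be read with fgets:
    1) The first line is "P6", the file type
    2) The second line is the image width and height
    3) The third line is the maximum pixel value
    The image data block follows, stored row by row. Each pixel takes 3 bytes, in red, green, blue order, and each channel is a 1-byte integer. The top-left corner is the origin.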
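Going the other way, the three header fields can be parsed back out of a file that has been loaded into memory. This is a minimal sketch under stated assumptions: parse_ppm_header is a hypothetical helper, it uses sscanf on an in-memory, NUL-terminated buffer rather than fgets on a file, it accepts only the P6 variant, and it does not handle the "#" comment lines that PPM headers are allowed to contain.

```c
#include <stdio.h>

/* Hypothetical helper: parses "P6 <width> <height> <maxval>" followed by a
 * single whitespace byte. Returns the offset of the first pixel byte, or -1
 * if the header is not a valid P6 header. Comment lines are not supported. */
int parse_ppm_header(const char *buf, int *width, int *height, int *maxval)
{
    int consumed = 0;

    /* %n records how many characters were read up to the end of maxval. */
    if (sscanf(buf, "P6 %d %d %d%n", width, height, maxval, &consumed) != 3)
        return -1;
    if (*width <= 0 || *height <= 0 || *maxval <= 0 || *maxval > 65535)
        return -1;

    /* Exactly one whitespace character separates header and pixel data. */
    char sep = buf[consumed];
    if (sep != '\n' && sep != ' ' && sep != '\t' && sep != '\r')
        return -1;

    return consumed + 1;
}
```

The returned offset matters because the pixel bytes are raw binary: skipping more than the single separator byte could swallow a pixel value that happens to look like whitespace.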

Now, back in our main() function. Once we are done reading from the video stream, we just have to clean everything up:

     //Free the RGB image
     av_free(buffer);
     av_free(pFrameRGB);
     //Free the YUV frame
     av_free(pFrame);
     //Close the codec
     avcodec_close(pCodecCtx);
     //Close the video file
     #ifdef _FFMPEG_0_6_
          av_close_input_file(pFormatCtx);
     #else
          avformat_close_input(&pFormatCtx);
     #endif
     //Both close calls above also free the format context, so no extra free is needed
     return 0;

You will notice we use av_free for the memory we allocated with avcodec_alloc_frame and av_malloc.
The command I used to compile on my Linux system:

gcc ./tutorial01.c -o ./tutorial01 -lavutil -lavformat -lavcodec -lswscale -lz -lm -I /home/Jiakun/ffmpeg_build/include -L /home/Jiakun/ffmpeg_build/lib/
Source code: see GitHub here.
