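We can now play video frames on Android, but the sound is still missing. This post adds the audio playback module for the video.

To let audio and video be decoded and played separately, we need to refactor the earlier code and decouple reading the media stream from decoding it:

MediaReader reads AVPacket objects from the file stream and hands them to VideoStreamDecoder and AudioStreamDecoder, which decode the video and the audio respectively. We add a thread-safety mechanism to MediaReader so that video and audio can each be decoded on their own worker thread.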
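
The post does not show the inside of MediaReader, so the following is only a minimal sketch of how a reader shared by two decoder threads could be made thread-safe. Packet and PacketDemuxer are hypothetical stand-ins (not the demo's classes, and not FFmpeg types), and the source callback plays the role of reading the next packet from the file. The assumption is that a packet read on behalf of one stream but belonging to the other is parked in a per-stream queue.

```cpp
#include <deque>
#include <functional>
#include <map>
#include <mutex>
#include <utility>

// Hypothetical stand-in for AVPacket: just a stream index and a payload tag.
struct Packet {
    int streamIndex;
    int payload;
};

// Sketch of a packet reader that several decoder threads can share.
class PacketDemuxer {
public:
    // source fills *out with the next packet of the file and returns true,
    // or returns false once the end of the file is reached.
    explicit PacketDemuxer(std::function<bool(Packet *)> source)
        : mSource(std::move(source)) {}

    // Fetches the next packet that belongs to streamIndex.
    // Returns false when the file is exhausted and nothing is pending.
    bool NextPacket(int streamIndex, Packet *out) {
        std::lock_guard<std::mutex> lock(mMutex);
        std::deque<Packet> &pending = mPending[streamIndex];
        while (pending.empty()) {
            Packet packet;
            if (!mSource(&packet)) {
                return false;
            }
            // Park the packet in the queue of the stream it belongs to.
            mPending[packet.streamIndex].push_back(packet);
        }
        *out = pending.front();
        pending.pop_front();
        return true;
    }

private:
    std::mutex mMutex;
    std::function<bool(Packet *)> mSource;
    std::map<int, std::deque<Packet>> mPending;
};
```

The lock is held while reading from the source, which keeps the sketch simple at the cost of blocking the other thread during file I/O.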

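Planar and packed audio

The data field of a decoded AVFrame holds the video pixel data or the raw PCM audio data. The linesize field holds the aligned length of a picture line for video, or the size of a plane for audio: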

   /**
    * For video, size in bytes of each picture line.
    * For audio, size in bytes of each plane.
    *
    * For audio, only linesize[0] may be set. For planar audio, each channel
    * plane must be the same size.
    *
    * For video the linesizes should be multiples of the CPUs alignment
    * preference, this is 16 or 32 for modern desktop CPUs.
    * Some code requires such alignment other code can be slower without
    * correct alignment, for yet other it makes no difference.
    *
    * @note The linesize may be larger than the size of usable data -- there
    * may be extra padding present for performance reasons.
    */
    int linesize[AV_NUM_DATA_POINTERS];

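The video side was covered in an earlier post. For audio, the comment says that only linesize[0] may be set, and that when there are several planes they all have the same size.

To understand what this plane size means, we first need to understand the two storage layouts for audio data: planar and packed. Take ordinary two-channel (stereo) audio as an example.

In the planar layout the left and right channels are stored separately: the left channel in data[0] and the right channel in data[1]. Both buffers have a size of linesize[0].

In the packed layout the samples are interleaved as LRLRLR... in data[0], and the size of that buffer is linesize[0].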
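
To make the difference concrete, here is a small self-contained sketch that turns planar 16-bit samples into the packed LRLRLR... layout. InterleavePlanes is a hypothetical helper written for this illustration; the demo itself leaves this job to libswresample, as shown later in the post.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Interleaves per-channel planes (planar layout) into one packed buffer.
// planes[c] must point at nbSamples samples of channel c.
std::vector<int16_t> InterleavePlanes(const std::vector<const int16_t *> &planes,
                                      std::size_t nbSamples) {
    const std::size_t channels = planes.size();
    std::vector<int16_t> packed(nbSamples * channels);
    for (std::size_t i = 0; i < nbSamples; ++i) {
        for (std::size_t c = 0; c < channels; ++c) {
            // Sample i of every channel sits side by side in the packed buffer.
            packed[i * channels + c] = planes[c][i];
        }
    }
    return packed;
}
```

For stereo input, planes[0] plays the role of data[0] (left) and planes[1] the role of data[1] (right); the result corresponds to what a packed frame keeps in data[0].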

The AVSampleFormat enum lists the audio formats. Formats with a P suffix are planar:

AV_SAMPLE_FMT_U8P,         ///< unsigned 8 bits, planar
AV_SAMPLE_FMT_S16P,        ///< signed 16 bits, planar
AV_SAMPLE_FMT_S32P,        ///< signed 32 bits, planar
AV_SAMPLE_FMT_FLTP,        ///< float, planar
AV_SAMPLE_FMT_DBLP,        ///< double, planar

Formats without the P suffix are packed:

AV_SAMPLE_FMT_U8,          ///< unsigned 8 bits
AV_SAMPLE_FMT_S16,         ///< signed 16 bits
AV_SAMPLE_FMT_S32,         ///< signed 32 bits
AV_SAMPLE_FMT_FLT,         ///< float
AV_SAMPLE_FMT_DBL,         ///< double

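Actual length of the audio data

There is a pitfall here, which the comment also spells out: the size given by linesize may be larger than the actual audio or video data, because extra padding may be present.

@note The linesize may be larger than the size of usable data -- there may be extra padding present for performance reasons.

So the actual length of the audio data has to be computed from the audio parameters: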

int channelCount = audioStreamDecoder.GetChannelCount();
int bytePerSample = audioStreamDecoder.GetBytePerSample();
int size = frame->nb_samples * channelCount * bytePerSample;
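
The formula above gives the total size for packed data such as AV_SAMPLE_FMT_S16, which is what the demo plays. With a planar format each plane holds only one channel, so the per-plane size drops the channel factor. The helper below is a hypothetical, self-contained sketch of that distinction; in real code the last two arguments would typically come from libavutil's av_get_bytes_per_sample and av_sample_fmt_is_planar.

```cpp
#include <cstddef>

// Bytes of real sample data in one plane of a decoded audio frame.
// Planar: every plane holds a single channel.
// Packed: data[0] holds all channels interleaved.
std::size_t UsableBytesPerPlane(int nbSamples, int channelCount,
                                int bytesPerSample, bool planar) {
    const int channelsInPlane = planar ? 1 : channelCount;
    return static_cast<std::size_t>(nbSamples) * channelsInPlane * bytesPerSample;
}
```

For example, 1024 samples of packed stereo S16 occupy 1024 * 2 * 2 = 4096 bytes in data[0], while 1024 samples of planar stereo float occupy 1024 * 4 = 4096 bytes in each of data[0] and data[1].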

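Audio format conversion

Video could already be played with OpenGL in the earlier demo, and audio can be handed to OpenSL ES. I covered the details in an earlier post, 《OpenSL ES 學習筆記》 (OpenSL ES study notes), so I won't go through them again here and will simply reuse that code.

However, OpenSL ES only supports a handful of packed audio formats: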

#define SL_PCMSAMPLEFORMAT_FIXED_8  ((SLuint16) 0x0008)
#define SL_PCMSAMPLEFORMAT_FIXED_16 ((SLuint16) 0x0010)
#define SL_PCMSAMPLEFORMAT_FIXED_20     ((SLuint16) 0x0014)
#define SL_PCMSAMPLEFORMAT_FIXED_24 ((SLuint16) 0x0018)
#define SL_PCMSAMPLEFORMAT_FIXED_28     ((SLuint16) 0x001C)
#define SL_PCMSAMPLEFORMAT_FIXED_32 ((SLuint16) 0x0020)

So here we specify AV_SAMPLE_FMT_S16 as the target format of AudioStreamDecoder. If the original audio format is different, the audio is converted:

audioStreamDecoder.Init(reader, audioIndex, AVSampleFormat::AV_SAMPLE_FMT_S16);


bool AudioStreamDecoder::Init(MediaReader *reader, int streamIndex, AVSampleFormat sampleFormat) {
    ...

    bool result = StreamDecoder::Init(reader, streamIndex);

    if (sampleFormat == AVSampleFormat::AV_SAMPLE_FMT_NONE) {
        mSampleFormat = mCodecContext->sample_fmt;
    } else {
        mSampleFormat = sampleFormat;
    }

    if (mSampleFormat != mCodecContext->sample_fmt) {
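        mSwrContext = swr_alloc_set_opts(
                NULL,
                mCodecContext->channel_layout,   // output channel layout
                mSampleFormat,                   // output sample format
                mCodecContext->sample_rate,      // output sample rate
                mCodecContext->channel_layout,   // input channel layout
                mCodecContext->sample_fmt,       // input sample format
                mCodecContext->sample_rate,      // input sample rate
                0,
                NULL);
        // swr_alloc_set_opts returns NULL on failure and
        // swr_init returns a negative error code on failure.
        if (NULL == mSwrContext || swr_init(mSwrContext) < 0) {
            swr_free(&mSwrContext);
            return false;
        }

        // swr_alloc_set_opts above has already been given these parameters,
        // but the AVFrame that receives the output gets no data unless they are set on it too.
        // The reason is that swr_convert_frame later calls av_frame_get_buffer to allocate the data buffer,
        // and av_frame_get_buffer needs these AVFrame fields to compute the buffer size.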
        mSwrFrame = av_frame_alloc();
        mSwrFrame->channel_layout = mCodecContext->channel_layout;
        mSwrFrame->sample_rate = mCodecContext->sample_rate;
        mSwrFrame->format = mSampleFormat;
    }
    return result;
}

AVFrame *AudioStreamDecoder::NextFrame() {
    AVFrame *frame = StreamDecoder::NextFrame();
    if (NULL == frame) {
        return NULL;
    }
    if (NULL == mSwrContext) {
        return frame;
    }

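    // swr_convert_frame returns 0 on success and a negative error code on failure.
    if (swr_convert_frame(mSwrContext, mSwrFrame, frame) < 0) {
        return NULL;
    }
    return mSwrFrame;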
}

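Here we use swr_convert_frame to do the conversion:

int swr_convert_frame(SwrContext *swr,     // conversion context
                      AVFrame *output,     // AVFrame that receives the converted output
                      const AVFrame *input // original input AVFrame
);

This function requires both the input and the output AVFrame to have channel_layout, sample_rate, and format set. It then calls av_frame_get_buffer to allocate the data buffer for output: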

/**
 * ...
 *
 * Input and output AVFrames must have channel_layout, sample_rate and format set.
 *
 * If the output AVFrame does not have the data pointers allocated the nb_samples
 * field will be set using av_frame_get_buffer()
 * is called to allocate the frame.
 * ...
 */
int swr_convert_frame(SwrContext *swr,
                      AVFrame *output, const AVFrame *input);

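SwrContext is the conversion context. It is created with swr_alloc_set_opts and swr_init, and it needs the channel_layout, sample_rate, and format of the audio before and after conversion. (FFmpeg 5.1 and later deprecate swr_alloc_set_opts in favor of swr_alloc_set_opts2, which takes an AVChannelLayout; the signatures below are those of the older API used by this demo.)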

struct SwrContext *swr_alloc_set_opts(struct SwrContext *s,
                                      int64_t out_ch_layout, enum AVSampleFormat out_sample_fmt, int out_sample_rate,
                                      int64_t  in_ch_layout, enum AVSampleFormat  in_sample_fmt, int  in_sample_rate,
                                      int log_offset, void *log_ctx);

int swr_init(struct SwrContext *s);

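Video format conversion

The earlier demo simply reported an error when the video format was not AV_PIX_FMT_YUV420P. Following the audio conversion example, we now use sws_scale to convert the format whenever the original video format is not AV_PIX_FMT_YUV420P: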

bool VideoStreamDecoder::Init(MediaReader *reader, int streamIndex, AVPixelFormat pixelFormat) {
    ...
    bool result = StreamDecoder::Init(reader, streamIndex);
    if (AVPixelFormat::AV_PIX_FMT_NONE == pixelFormat) {
        mPixelFormat = mCodecContext->pix_fmt;
    } else {
        mPixelFormat = pixelFormat;
    }

    if (mPixelFormat != mCodecContext->pix_fmt) {
        int width = mCodecContext->width;
        int height = mCodecContext->height;

        mSwrFrame = av_frame_alloc();

        // Option 1: allocate the data buffers with av_frame_get_buffer; av_frame_free releases them automatically
        mSwrFrame->width = width;
        mSwrFrame->height = height;
        mSwrFrame->format = mPixelFormat;
        av_frame_get_buffer(mSwrFrame, 0);

        // Option 2: point the frame at our own buffer with av_image_fill_arrays; we have to call av_malloc and av_free ourselves to allocate and release it
//        unsigned char* buffer = (unsigned char *)av_malloc(
//                av_image_get_buffer_size(mPixelFormat, width, height, 16)
//        );
//        av_image_fill_arrays(mSwrFrame->data, mSwrFrame->linesize, buffer, mPixelFormat, width, height, 16);

        mSwsContext = sws_getContext(
                mCodecContext->width, mCodecContext->height, mCodecContext->pix_fmt,
                width, height, mPixelFormat, SWS_BICUBIC,
                NULL, NULL, NULL
        );
    }
    return result;
}


AVFrame *VideoStreamDecoder::NextFrame() {
    AVFrame *frame = StreamDecoder::NextFrame();
    if (NULL == frame) {
        return NULL;
    }
    if (NULL == mSwsContext) {
        return frame;
    }

    sws_scale(mSwsContext, frame->data,
              frame->linesize, 0, mCodecContext->height,
              mSwrFrame->data, mSwrFrame->linesize);
    return mSwrFrame;
}

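Despite its name, sws_scale does more than scaling: it also converts the pixel format. The conversion parameters are supplied by SwsContext:

struct SwsContext *sws_getContext(
    int srcW,                     // width of the source image
    int srcH,                     // height of the source image
    enum AVPixelFormat srcFormat, // format of the source image
    int dstW,                     // width of the destination image
    int dstH,                     // height of the destination image
    enum AVPixelFormat dstFormat, // format of the destination image
    int flags,                    // scaling algorithm flags such as SWS_BICUBIC; can be ignored for now
    SwsFilter *srcFilter,         // can be ignored for now
    SwsFilter *dstFilter,         // can be ignored for now
    const double *param           // can be ignored for now
);

sws_scale supports converting by region. We can convert the whole image in one call, as the demo does, or cut the image into several slices and convert them separately, which makes it convenient to use multiple threads to speed up the conversion: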

int sws_scale(
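    struct SwsContext *c,             // conversion context
    const uint8_t *const srcSlice[],  // pixel data of the source slice, corresponds to the data field of the source AVFrame
    const int srcStride[],            // line sizes of the source slice, corresponds to the linesize field of the source AVFrame
    int srcSliceY,                    // starting Y coordinate of the source slice, used to work out where it lands in the destination image
    int srcSliceH,                    // number of rows in the source slice, used to work out where it lands in the destination image
    uint8_t *const dst[],             // storage for the converted image data, corresponds to the data field of the destination AVFrame
    const int dstStride[]             // line sizes of the converted image, corresponds to the linesize field of the destination AVFrame
);

srcSlice and srcStride describe the image data of one region of the source image. srcSliceY and srcSliceH tell the converter which rows that region covers, so that it can compute the offset at which to store the result in dst, using dstStride.

For example, the code below splits a complete image into a top half and a bottom half and converts them separately: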

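int halfHeight = mCodecContext->height / 2;

// Chroma planes of vertically subsampled formats (e.g. 4:2:0) have fewer rows
// than the luma plane, so their row offset has to be scaled down accordingly.
// av_pix_fmt_desc_get is declared in libavutil/pixdesc.h. This assumes that
// halfHeight is a multiple of the vertical chroma subsampling factor.
const AVPixFmtDescriptor *desc = av_pix_fmt_desc_get(mCodecContext->pix_fmt);
int chromaHalfHeight = halfHeight >> desc->log2_chroma_h;

// Convert the top half of the image
uint8_t *dataTop[AV_NUM_DATA_POINTERS] = {
        frame->data[0],
        frame->data[1],
        frame->data[2]
};
sws_scale(mSwsContext, dataTop,
            frame->linesize, 0,
            halfHeight,
            mSwrFrame->data, mSwrFrame->linesize);

// Convert the bottom half of the image
uint8_t *dataBottom[AV_NUM_DATA_POINTERS] = {
        frame->data[0] + (frame->linesize[0] * halfHeight),
        frame->data[1] + (frame->linesize[1] * chromaHalfHeight),
        frame->data[2] + (frame->linesize[2] * chromaHalfHeight),
};
sws_scale(mSwsContext, dataBottom,
            frame->linesize, halfHeight,
            mCodecContext->height - halfHeight,
            mSwrFrame->data, mSwrFrame->linesize);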

AVFrame memory management

We created a new AVFrame to receive the converted image:

mSwrFrame = av_frame_alloc();

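// Option 1: allocate the data buffers with av_frame_get_buffer; av_frame_free releases them automatically
mSwrFrame->width = width;
mSwrFrame->height = height;
mSwrFrame->format = mPixelFormat;
av_frame_get_buffer(mSwrFrame, 0);

// Option 2: point the frame at our own buffer with av_image_fill_arrays; we have to call av_malloc and av_free ourselves to allocate and release the buffer
// int bufferSize = av_image_get_buffer_size(mPixelFormat, width, height, 16);
// unsigned char* buffer = (unsigned char *)av_malloc(bufferSize);
// av_image_fill_arrays(mSwrFrame->data, mSwrFrame->linesize, buffer, mPixelFormat, width, height, 16);

An AVFrame created by av_frame_alloc is just an empty shell. We have to give it the memory that actually holds the pixel data, along with the matching line sizes, and as shown above there are two ways to do that:

1. Allocate the storage with av_frame_get_buffer. The memory behind the data members is actually provided by buf[0]->data: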

LOGD("mSwrFrame --> buf : 0x%X~0x%X, data[0]: 0x%X, data[1]: 0x%X, data[2]: 0x%X",
    mSwrFrame->buf[0]->data,
    mSwrFrame->buf[0]->data + mSwrFrame->buf[0]->size,
    mSwrFrame->data[0],
    mSwrFrame->data[1],
    mSwrFrame->data[2]
);
// mSwrFrame --> buf : 0x2E6E8AC0~0x2E753F40, data[0]: 0x2E6E8AC0, data[1]: 0x2E7302E0, data[2]: 0x2E742100

2. Point the frame at external storage with av_image_fill_arrays. The data members then point into the external buffer we supplied, and the buf member is NULL:

LOGD("mSwrFrame --> buffer : 0x%X~0x%X, buf : 0x%X, data[0]: 0x%X, data[1]: 0x%X, data[2]: 0x%X",
    buffer,
    buffer + bufferSize,
    mSwrFrame->buf[0],
    mSwrFrame->data[0],
    mSwrFrame->data[1],
    mSwrFrame->data[2]
);
// FFmpegDemo: mSwrFrame --> buffer : 0x2DAE4DC0~0x2DB4D5C0, buf : 0x0, data[0]: 0x2DAE4DC0, data[1]: 0x2DB2A780, data[2]: 0x2DB3BEA0

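av_frame_free releases the buffers referenced by the AVFrame's buf, but for the data members it merely sets the pointers to 0. So storage allocated with av_frame_get_buffer is released automatically, whereas external storage supplied through av_image_fill_arrays has to be released by calling av_free ourselves.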

align

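Attentive readers may have noticed that av_image_get_buffer_size and av_image_fill_arrays are both passed an align of 16. This corresponds to the byte alignment of linesize discussed earlier: padding is added so that linesize becomes a multiple of 16 or 32: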

@param align         the value used in src for linesize alignment

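Passing 0 here makes the fill fail.

Passing 1 adds no padding, so the linesize no longer matches the linesize of the frames that are actually decoded, and the picture comes out corrupted.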
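
As a rough illustration of what such an alignment value does to a row length, the sketch below rounds a row size up to the next multiple of align. AlignUp is a hypothetical helper for this post; FFmpeg's own computation may differ in detail.

```cpp
#include <cstddef>

// Rounds bytesPerRow up to the next multiple of align.
// align must be a power of two (1, 2, 4, 8, 16, 32, ...).
std::size_t AlignUp(std::size_t bytesPerRow, std::size_t align) {
    return (bytesPerRow + align - 1) & ~(align - 1);
}
```

With an align of 16, a 1080-byte row is padded to 1088 bytes, while an align of 1 leaves it at 1080 bytes. That gap is the kind of mismatch described above.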

av_frame_get_buffer is friendlier: its documentation recommends passing 0 and letting it decide which alignment to use:

 * @param align Required buffer size alignment. If equal to 0, alignment will be
 *              chosen automatically for the current CPU. It is highly
 *              recommended to pass 0 here unless you know what you are doing.

Full code

The complete demo code is on GitHub; interested readers can download it and take a look.
