FFmpeg Development Journey (4) --- Decoding All Kinds of Subtitles

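【Foreword】

In the previous post, "Understanding Filter Graphs and Using the Subtitle Filter", I covered how to read and display external subtitles.

External subtitles are only one kind, though. There are also embedded subtitles (muxed into the container as a stream) and hardcoded subtitles (burned into the picture), so the other two kinds have to be considered as well.

Hardcoded subtitles need no decoding at all, because they are drawn directly onto the video frames.

So this post only needs to cover how to decode embedded subtitles. The main topics are:

1. Decoding embedded subtitles in ass and similar formats.

2. Decoding embedded subtitles in sub+idx format.

3. Synchronizing video and subtitles.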

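【Main Text】

First, embedded subtitles:

As we know, an embedded subtitle is a subtitle file (srt, ass, and so on) muxed into the video container, where it becomes a subtitle stream.

So once we have confirmed that the video contains a subtitle stream (ass and the like), it can be decoded in the same way as an external subtitle.

There are a few small differences, of course. Let's look at the code first: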

    AVFormatContext *formatContext = nullptr;
    AVCodecContext *videoCodecContext = nullptr, *subCodecContext = nullptr;
    AVStream *videoStream = nullptr, *subStream = nullptr;
    int videoIndex = -1, subIndex = -1;

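    //Open the input file and allocate the format context
    if (avformat_open_input(&formatContext, m_filename.toStdString().c_str(), nullptr, nullptr) < 0) {
        qDebug() << "Open Input Failed!";
        return;
    }
    if (avformat_find_stream_info(formatContext, nullptr) < 0) {
        qDebug() << "Find Stream Info Failed!";
        avformat_close_input(&formatContext);
        return;
    }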

    //Find the indexes of the video stream and the subtitle stream
    for (size_t i = 0; i < formatContext->nb_streams; ++i) {
        if (formatContext->streams[i]->codecpar->codec_type == AVMEDIA_TYPE_VIDEO) {
            videoIndex = int(i);
            videoStream = formatContext->streams[i];
        } else if (formatContext->streams[i]->codecpar->codec_type == AVMEDIA_TYPE_SUBTITLE) {
            subIndex = int(i);
            subStream = formatContext->streams[i];
        }
    }

    //Print stream information to stderr
    av_dump_format(formatContext, 0, "format", 0);
    fflush(stderr);

    if (!open_codec_context(videoCodecContext, videoStream)) {
        qDebug() << "Open Video Context Failed!";
        return;
    }

    if (!open_codec_context(subCodecContext, subStream)) {
        //Opening the subtitle stream failed, or there is none; this is harmless, so carry on
        qDebug() << "Open Subtitle Context Failed!";
    }

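This code simply finds the video and subtitle streams and opens their codec contexts. If anything is unclear, see the first post in this series, on video decoding.

Let's continue: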

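    //Guard against an unknown frame rate (den == 0); integer division truncates, e.g. 24000/1001 becomes 23
    m_fps = videoStream->avg_frame_rate.den > 0
            ? videoStream->avg_frame_rate.num / videoStream->avg_frame_rate.den : 0;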
    m_width = videoCodecContext->width;
    m_height = videoCodecContext->height;

    //Set up the filter arguments
    AVRational time_base = videoStream->time_base;
    QString args = QString::asprintf("video_size=%dx%d:pix_fmt=%d:time_base=%d/%d:pixel_aspect=%d/%d",
                                     m_width, m_height, videoCodecContext->pix_fmt, time_base.num, time_base.den,
                                     videoCodecContext->sample_aspect_ratio.num, videoCodecContext->sample_aspect_ratio.den);
    qDebug() << "Video Args: " << args;

    AVFilterContext *buffersrcContext = nullptr;
    AVFilterContext *buffersinkContext = nullptr;
    bool subtitleOpened = false;

    //If there is a subtitle stream
    if (subCodecContext) {
        //For an embedded subtitle stream, the video file name itself is used
        QString subtitleFilename = m_filename;
        subtitleFilename.replace('/', "\\\\");
        subtitleFilename.insert(subtitleFilename.indexOf(":\\"), char('\\'));
        QString filterDesc = QString("subtitles=filename='%1':original_size=%2x%3")
                .arg(subtitleFilename).arg(m_width).arg(m_height);
        qDebug() << "Filter Description:" << filterDesc.toStdString().c_str();
        subtitleOpened = init_subtitle_filter(buffersrcContext, buffersinkContext, args, filterDesc);
        if (!subtitleOpened) {
            qDebug() << "Failed to open subtitle!";
        }
    } else {
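        //No subtitle stream: look for a subtitle file in the same directory
        //Uses the subtitles filter; only ass has been tested so far, but srt, ssa, ass and lrc all work, just change the extension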
        int suffixLength = QFileInfo(m_filename).suffix().length();
        QString subtitleFilename = m_filename.mid(0, m_filename.length() - suffixLength - 1) + ".ass";
        if (QFile::exists(subtitleFilename)) {
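            //Initialize the subtitle filter
            //An absolute path must be converted to the form D\:\\xxx\\test.ass; remember, the prefix must look like [D\:\\]
            //toNativeSeparators() does not help, because it only converts / to \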
            subtitleFilename.replace('/', "\\\\");
            subtitleFilename.insert(subtitleFilename.indexOf(":\\"), char('\\'));
            QString filterDesc = QString("subtitles=filename='%1':original_size=%2x%3")
                    .arg(subtitleFilename).arg(m_width).arg(m_height);
            qDebug() << "Filter Description:" << filterDesc.toStdString().c_str();
            subtitleOpened = init_subtitle_filter(buffersrcContext, buffersinkContext, args, filterDesc);
            if (!subtitleOpened) {
                qDebug() << "Failed to open subtitle!";
            }
        }
    }

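1. If a subtitle stream exists ( if (subCodecContext) ), a subtitle filter is initialized. Its filter description has the form subtitles=filename='<path>':original_size=<width>x<height>.

Note that for an external subtitle, filename is the name of the subtitle file, whereas for an embedded subtitle, filename is the name of the video file. In both cases the path must be written in the [ D\:\\ ] form.

2. If no subtitle stream exists, we look for an external subtitle file in the same directory.

However, this only covers embedded subtitles in ass and similar formats. The subtitles filter only handles text-based subtitles, so for embedded subtitles in sub+idx format the filter fails to open and we have to decode and draw the subtitles ourselves.

As we know, sub+idx is a bitmap subtitle format: the sub file contains a series of subtitle bitmaps, and the idx file is its index.

We do not need to know how it works internally, though, because ffmpeg decodes it for us, as follows: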
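The [ D\:\\ ] path form mentioned above can be produced by a small standalone helper. The following is only a minimal sketch using the standard library instead of QString; the name escape_filter_path is hypothetical and not part of the project. It assumes the path contains no single quotes, and unlike the original code it escapes every colon and also rewrites backslashes that are already present.

```cpp
#include <string>

// Hypothetical helper (not from the original project): rewrites a path such as
// D:/videos/test.ass into D\:\\videos\\test.ass, the form used in the filter
// description above.
// Assumption: the path contains no single quote, which would need extra escaping.
std::string escape_filter_path(const std::string &path)
{
    std::string out;
    out.reserve(path.size() * 2);
    for (char c : path) {
        if (c == '/' || c == '\\')
            out += "\\\\";   // every separator becomes two backslashes
        else if (c == ':')
            out += "\\:";    // the drive colon becomes backslash + colon
        else
            out += c;
    }
    return out;
}
```

The result can then be substituted into the filter description in place of %1.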

    //The current subtitle; value-initialized so that pts/duration start at 0 and the image is null
    SubtitleFrame subFrame{};

    //Read the next packet
    while (m_runnable && av_read_frame(formatContext, packet) >= 0) {
        if (packet->stream_index == videoIndex) {
            //Send the packet to the decoder
            int ret = avcodec_send_packet(videoCodecContext, packet);

            while (ret >= 0) {
                //Receive a decoded frame from the decoder
                ret = avcodec_receive_frame(videoCodecContext, frame);

                if (ret == AVERROR(EAGAIN) || ret == AVERROR_EOF) break;
                else if (ret < 0) goto Run_End;

                //Only touch the frame once we know it was received successfully
                frame->pts = frame->best_effort_timestamp;

                //If the subtitle filter was opened, output the image produced by the filter
                if (subtitleOpened) {
                    if (av_buffersrc_add_frame_flags(buffersrcContext, frame, AV_BUFFERSRC_FLAG_KEEP_REF) < 0)
                        break;

                    while (true) {
                        //A separate variable, so that draining the sink does not end the outer receive loop
                        int sinkRet = av_buffersink_get_frame(buffersinkContext, filter_frame);

                        if (sinkRet == AVERROR(EAGAIN) || sinkRet == AVERROR_EOF) break;
                        else if (sinkRet < 0) goto Run_End;

                        QImage videoImage = convert_image(filter_frame);
                        m_frameQueue.enqueue(videoImage);

                        av_frame_unref(filter_frame);
                    }
                } else {
                    //The subtitle filter is not open, or there is no subtitle
                    QImage videoImage = convert_image(frame);
                    //If a subtitle should be shown for this frame, overlay it
                    if (!subFrame.image.isNull()
                            && frame->pts >= subFrame.pts && frame->pts <= (subFrame.pts + subFrame.duration)) {
                        videoImage = overlay_subtitle(videoImage, subFrame.image);
                    }
                    m_frameQueue.enqueue(videoImage);
                }
                av_frame_unref(frame);
            }
        } else if (packet->stream_index == subIndex) {
            AVSubtitle subtitle;
            int got_frame = 0;
            int ret = avcodec_decode_subtitle2(subCodecContext, &subtitle, &got_frame, packet);

            if (ret == AVERROR(EAGAIN) || ret == AVERROR_EOF) break;
            else if (ret < 0) goto Run_End;

            if (got_frame > 0) {
                //Bitmap subtitle, i.e. sub + idx
                //In practice, this is the only case that needs handling here
                if (subtitle.format == 0) {
                    for (size_t i = 0; i < subtitle.num_rects; i++) {
                        AVSubtitleRect *sub_rect = subtitle.rects[i];

                        int dst_linesize[4];
                        uint8_t *dst_data[4];
                        //Note: the destination is RGBA, because the alpha channel is needed
                        av_image_alloc(dst_data, dst_linesize, sub_rect->w, sub_rect->h, AV_PIX_FMT_RGBA, 1);
                        SwsContext *swsContext = sws_getContext(sub_rect->w, sub_rect->h, AV_PIX_FMT_PAL8,
                                                                sub_rect->w, sub_rect->h, AV_PIX_FMT_RGBA,
                                                                SWS_BILINEAR, nullptr, nullptr, nullptr);
                        sws_scale(swsContext, sub_rect->data, sub_rect->linesize, 0, sub_rect->h, dst_data, dst_linesize);
                        sws_freeContext(swsContext);
                        //RGBA here as well; copy() detaches the QImage from dst_data before it is freed
                        QImage image = QImage(dst_data[0], sub_rect->w, sub_rect->h, dst_linesize[0],
                                              QImage::Format_RGBA8888).copy();
                        av_freep(&dst_data[0]);

                        //subFrame stores the current subtitle, rescaled to the video stream's time base
                        //so that it can be compared with frame->pts directly
                        //start_display_time and end_display_time are in milliseconds, relative to the packet pts
                        subFrame.pts = av_rescale_q(packet->pts, subStream->time_base, videoStream->time_base);
                        subFrame.duration = av_rescale_q(int64_t(subtitle.end_display_time) - int64_t(subtitle.start_display_time),
                                                         AVRational{1, 1000}, videoStream->time_base);
                        subFrame.image = image;
                    }
                } else {
                    //Text subtitle: srt, ssa, ass, lrc
                    //The text could be printed directly; in practice it has already been added through the filter
                    qreal pts = packet->pts * av_q2d(subStream->time_base);
                    qreal duration = packet->duration * av_q2d(subStream->time_base);
                    const char *text = reinterpret_cast<const char *>(packet->data);
                    qDebug() << "[PTS: " << pts << "]" << endl
                             << "[Duration: " << duration << "]" << endl
                             << "[Text: " << text << "]" << endl;
                }
                //Release the rects allocated by the decoder
                avsubtitle_free(&subtitle);
            }
        }
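
To make the PAL8 to RGBA step in the code above less of a black box, here is a minimal sketch of the same conversion done by hand, without sws_scale. The function name pal8_to_rgba is hypothetical. It assumes the layout FFmpeg documents for PAL8: one palette index per pixel in the first plane, and a palette of 32-bit 0xAARRGGBB entries in the second plane.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the sws_scale() call: expands palette indices into
// tightly packed RGBA bytes.
// Assumption: each palette entry is a native 32-bit value laid out as 0xAARRGGBB.
std::vector<uint8_t> pal8_to_rgba(const uint8_t *indices, int linesize,
                                  const uint32_t *palette, int width, int height)
{
    std::vector<uint8_t> rgba(std::size_t(width) * height * 4);
    for (int y = 0; y < height; ++y) {
        // linesize may be larger than width because of row padding
        const uint8_t *row = indices + std::size_t(y) * linesize;
        for (int x = 0; x < width; ++x) {
            const uint32_t argb = palette[row[x]];
            uint8_t *px = &rgba[(std::size_t(y) * width + x) * 4];
            px[0] = uint8_t((argb >> 16) & 0xFF); // R
            px[1] = uint8_t((argb >> 8) & 0xFF);  // G
            px[2] = uint8_t(argb & 0xFF);         // B
            px[3] = uint8_t((argb >> 24) & 0xFF); // A
        }
    }
    return rgba;
}
```

With an AVSubtitleRect, the first two arguments would correspond to data[0] and linesize[0], and the palette to data[1]. The output is the byte order that QImage::Format_RGBA8888 expects.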

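Let's look at the else if (packet->stream_index == subIndex) branch first:

1. avcodec_decode_subtitle2() is used to obtain one decoded subtitle.

2. subtitle.format holds the subtitle type; 0 means a bitmap subtitle.

3. subtitle.rects holds the subtitle bitmaps, so all we need to do is convert them to the image format we want and then overlay them on the video image.

4. One small thing to watch out for: video and subtitles are not decoded at the same time, and a subtitle stays on screen for a while, so many video frames may share the same subtitle. We therefore have to synchronize video and subtitles. A SubtitleFrame is used for this, defined as follows: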

    struct SubtitleFrame {
        QImage image;
        int64_t pts = 0;
        int64_t duration = 0;
    };

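My synchronization rule is: videoFrame.pts >= subFrame.pts && videoFrame.pts <= subFrame.pts + subFrame.duration. In other words, the subtitle is shown whenever the video frame's presentation timestamp lies within [subtitle start, subtitle end]. The comparison assumes that all three values are expressed in the same time base.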
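
The rule only holds when the timestamps share a unit, while a video stream and a subtitle stream may use different time bases. The following is a minimal sketch of the same check with everything converted to milliseconds first. Rational is a stand-in for AVRational, and to_ms and subtitle_visible are hypothetical names, not functions from the project or from FFmpeg.

```cpp
#include <cstdint>

// Stand-in for AVRational, so that the sketch compiles without FFmpeg headers.
struct Rational {
    int64_t num;
    int64_t den;
};

// Converts a timestamp expressed in time base tb to milliseconds (truncating).
int64_t to_ms(int64_t ts, Rational tb)
{
    return ts * 1000 * tb.num / tb.den;
}

// Returns true if the video frame falls inside the subtitle's display interval.
// startMs/endMs play the role of start_display_time/end_display_time, which are
// given in milliseconds relative to the subtitle packet's pts.
bool subtitle_visible(int64_t videoPts, Rational videoTb,
                      int64_t subPts, Rational subTb,
                      int64_t startMs, int64_t endMs)
{
    const int64_t frameMs = to_ms(videoPts, videoTb);
    const int64_t baseMs = to_ms(subPts, subTb);
    return frameMs >= baseMs + startMs && frameMs <= baseMs + endMs;
}
```

For example, with a 1/90000 video time base and a 1/1000 subtitle time base, a frame at pts 180000 (2000 ms) is inside a subtitle that starts at pts 1500 and is displayed for 1000 ms.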

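Now back to if (subtitleOpened).

1. If the subtitle was opened successfully (an external or embedded subtitle in ass or a similar format), we simply use the subtitle filter to render the subtitle onto the video frame.

2. If the subtitle could not be opened (it is in sub+idx format, or there is no subtitle), we overlay subFrame onto the video frame. Note that subFrame was already obtained in the else if (packet->stream_index == subIndex) branch; if there is no subtitle, it is simply empty.

convert_image() and overlay_subtitle() are both simple, so just read the source code for them.

That concludes the discussion of embedded subtitles.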
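
For readers who want an idea of what overlaying involves without opening the source, here is a minimal sketch of straight-alpha blending on raw RGBA buffers. It is not the project's overlay_subtitle(), which works on QImage and may well be implemented differently (in Qt, QPainter::drawImage() performs this kind of composition). RgbaImage and overlay_rgba are hypothetical names, and the video frame is assumed to be opaque.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical image type: width * height * 4 bytes, RGBA order, straight alpha.
struct RgbaImage {
    int width;
    int height;
    std::vector<uint8_t> pixels;
};

// Blends src onto dst with src's top-left corner at (left, top).
// Parts of src that fall outside dst are skipped.
void overlay_rgba(RgbaImage &dst, const RgbaImage &src, int left, int top)
{
    for (int y = 0; y < src.height; ++y) {
        const int dy = top + y;
        if (dy < 0 || dy >= dst.height) continue;
        for (int x = 0; x < src.width; ++x) {
            const int dx = left + x;
            if (dx < 0 || dx >= dst.width) continue;
            const uint8_t *s = &src.pixels[(std::size_t(y) * src.width + x) * 4];
            uint8_t *d = &dst.pixels[(std::size_t(dy) * dst.width + dx) * 4];
            const int a = s[3];
            // result = src * alpha + dst * (1 - alpha), rounded to nearest
            for (int c = 0; c < 3; ++c)
                d[c] = uint8_t((s[c] * a + d[c] * (255 - a) + 127) / 255);
            // dst alpha is left untouched: the video frame is treated as opaque
        }
    }
}
```

The left and top arguments are where a subtitle rect's x and y position would go if the bitmap should be placed where the subtitle stream asks for it.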

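【Conclusion】

This post contains a fair amount of code, and it is best read together with the previous post.

ffmpeg also provides many handy commands, for example for muxing in the subtitles I used for testing, so I will cover those in a dedicated post later.

Finally, here are the project links (stars are much appreciated..⭐_⭐):

GitHub: https://github.com/mengps/FFmpeg-Learn

CSDN: https://download.csdn.net/download/u011283226/11833900 (contains an ass file with an mp4, an mkv with an embedded ass subtitle, and an mkv with an embedded sub+idx subtitle, for testing).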
