WebRTC Video JitterBuffer in Detail

1 WebRTC version

m74.

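2 Overview

The old video JitterBuffer was implemented in the VCMJitterBuffer class, which is no longer used. In the new design the JitterBuffer's functionality is spread across several modules, mainly:

PacketBuffer: responsible for frame completeness. It ensures that the packets making up a frame have consecutive sequence numbers, with one packet marking the start of the frame and one marking the end;
RtpFrameReferenceFinder: sets the reference frames of each frame, and also takes care of the continuity of frames within a GOP;
FrameBuffer: responsible for frame continuity and decodability. Here a frame is continuous when all of its reference frames have been received, and decodable when all of its reference frames have been decoded;
VCMJitterEstimator: estimates the jitter (googJitterBufferMs), which is used to compute the target delay (googTargetDelayMs), which in turn is used for audio/video synchronization;
VCMTiming: computes the current delay (googCurrentDelayMs), which is used to compute the render time.

This article walks through the main workflow of these modules alongside the code.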

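3 JitterBuffer structure and basic flow

When RtpVideoStreamReceiver receives an RTP packet, it hands the packet to PacketBuffer, which buffers and orders packets. Once PacketBuffer has collected a complete frame, it hands the frame back to RtpVideoStreamReceiver, which passes it to RtpFrameReferenceFinder. RtpFrameReferenceFinder keeps track of the most recent GOPs; once a complete frame has been placed in a GOP, its reference frames are filled in and it is handed back to RtpVideoStreamReceiver, which passes it on to FrameBuffer. FrameBuffer treats a frame as continuous once all of its reference frames have been received, and as decodable once all of its reference frames have been decoded, at which point the frame can be handed to the decoder.

These modules can be seen as three layers that order RTP packets, order frames within a GOP, and order GOPs, respectively:

Packet ordering: PacketBuffer;
Frame ordering: RtpFrameReferenceFinder;
GOP ordering: FrameBuffer.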

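4 Frame completeness - PacketBuffer

4.1 Packet buffers

The PacketBuffer class has two kinds of packet buffer:

std::vector<VCMPacket> data_buffer_: the data buffer, which holds the raw packet data used to assemble the raw data of a whole frame;
std::vector<ContinuityInfo> sequence_buffer_: the sequence buffer, which holds per-packet continuity information such as the sequence number, used to order packets into complete frames.

Continuity information: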

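struct ContinuityInfo {
    // Sequence number of the packet.
    uint16_t seq_num = 0;

    // Whether this is the first packet of a frame.
    bool frame_begin = false;

    // Whether this is the last packet of a frame.
    bool frame_end = false;

    // Whether this slot is in use.
    bool used = false;

    // Whether all packets before this one have been inserted into the packet
    // buffer, i.e. whether all packets before this one are continuous.
    bool continuous = false;

    // Whether this packet has already been used to create a frame.
    bool frame_created = false;
  };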

4.2 Start and end of a frame

There is a comment at packet_buffer.cc:348:

        // In the case of H264 we don't have a frame_begin bit (yes,
        // |frame_begin| might be set to true but that is a lie). So instead
        // we traverese backwards as long as we have a previous packet and
        // the timestamp of that packet is the same as this one. This may cause
        // the PacketBuffer to hand out incomplete frames.
        // See: https://bugs.chromium.org/p/webrtc/issues/detail?id=7106

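The comment says that H264 RTP packets carry no trustworthy start-of-frame flag, and it links to issue 7106. According to that issue, the original 2017 description was that in FU-A packetization the start bit S, which should have been set, was not set correctly. In April 2018 the issue was revised to say that first_mb_in_slice could be used in place of the FU-A S bit.

In practice, however, even the latest master at the time of writing (13788025c81712df7e5535931a0b1d7931da6c2d) still uses the FU-A S bit to identify the first packet of a frame in FU-A mode, and in none of the versions I tested (57, 64, 74) did the S bit fail to be set. The problem may have been fixed in versions released after 2017.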

bool RtpDepacketizerH264::ParseFuaNalu(
    RtpDepacketizer::ParsedPayload* parsed_payload,
    const uint8_t* payload_data) {
  ……
  bool first_fragment = (payload_data[1] & kSBit) > 0;

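The first-packet flag is stressed here because it plays an important role in deciding whether a frame is complete. The last packet of a frame, by contrast, is identified simply by the marker bit in the RTP header. A frame is considered complete only when both its first and its last packet have been received and all packets in between are consecutive.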

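4.3 Inserting an RTP packet - PacketBuffer::InsertPacket

Both packet buffers, the data buffer and the sequence buffer, are arrays with an initial length of size_ (512). When a buffer fills up its capacity is doubled, up to a maximum length of max_size_ (2048).

Inserting a packet means filling in both buffers. Along the way the function checks for packet loss: if packets are missing it waits; otherwise it checks whether complete frames are available, and if one or more complete frames have been assembled it notifies RtpVideoStreamReceiver through the OnAssembledFrame callback.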

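bool PacketBuffer::InsertPacket(VCMPacket* packet) {
  std::vector<std::unique_ptr<RtpFrameObject>> found_frames;
  {
    rtc::CritScope lock(&crit_);
    // Sequence number of the current packet.
    uint16_t seq_num = packet->seqNum;
    // Index of the current packet in the packet buffers (both the data
    // buffer and the sequence buffer).
    size_t index = seq_num % size_;

    // If this is the first packet.
    if (!first_packet_received_) {
      // Remember the sequence number of the first packet.
      first_seq_num_ = seq_num;
      // Record that the first packet has been received.
      first_packet_received_ = true;
    } else if (AheadOf(first_seq_num_, seq_num)) {
      // The current packet is older than the recorded first packet,
      // first_seq_num_. If the buffer has already been cleared up to
      // first_seq_num_, at least one frame has been decoded successfully:
      // RtpVideoStreamReceiver::FrameDecoded calls
      // PacketBuffer::ClearTo(seq_num), which clears everything buffered
      // before first_seq_num_. A packet even older than first_seq_num_ that
      // arrives now is not worth keeping.
      if (is_cleared_to_first_seq_num_) {
        delete[] packet->dataPtr;
        packet->dataPtr = nullptr;
        return false;
      }

      // If nothing has been cleared yet, the packet should be kept as the new
      // first packet, for example after reordering.
      first_seq_num_ = seq_num;
    }

    // If this slot is already occupied.
    if (sequence_buffer_[index].used) {
      // Same sequence number: a duplicate packet. Free the payload and
      // discard it.
      if (data_buffer_[index].seqNum == packet->seqNum) {
        delete[] packet->dataPtr;
        packet->dataPtr = nullptr;
        return true;
      }

      // The slot is occupied by a packet with a different sequence number, so
      // the buffer is full and has to be expanded.
      while (ExpandBufferSize() && sequence_buffer_[seq_num % size_].used) {
      }
      // Recompute the index of the incoming packet.
      index = seq_num % size_;
      // If the slot is still occupied, the buffer is still full and nothing
      // more can be done: fatal error.
      if (sequence_buffer_[index].used) {
        delete[] packet->dataPtr;
        packet->dataPtr = nullptr;
        return false;
      }
    }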
    // No error: fill in the slot at index with the current packet's data.
    // First-packet flag.
    sequence_buffer_[index].frame_begin = packet->is_first_packet_in_frame();
    // Last-packet flag.
    sequence_buffer_[index].frame_end = packet->is_last_packet_in_frame();
    // Sequence number.
    sequence_buffer_[index].seq_num = packet->seqNum;
    // Whether the preceding packets are continuous; starts as false and is
    // set in FindFrames.
    sequence_buffer_[index].continuous = false;
    // Whether the packet has been used to create a frame; set in FindFrames.
    sequence_buffer_[index].frame_created = false;
    // The slot is now occupied.
    sequence_buffer_[index].used = true;
    // Store the packet in the data buffer.
    data_buffer_[index] = *packet;
    // Ownership of the payload pointer has been transferred.
    packet->dataPtr = nullptr;

    // Update the loss information: check whether receiving this packet
    // reveals holes caused by packet loss, i.e. discontinuities.
    UpdateMissingPackets(packet->seqNum);

    // Update the timestamps.
    int64_t now_ms = clock_->TimeInMilliseconds();
    last_received_packet_ms_ = now_ms;
    if (packet->frameType == kVideoFrameKey)
      last_received_keyframe_packet_ms_ = now_ms;
    // Scan the sequence buffer for complete frames that can be assembled and
    // return them.
    found_frames = FindFrames(seq_num);
  }

  // Report any complete frames to RtpVideoStreamReceiver through the
  // OnAssembledFrame callback.
  for (std::unique_ptr<RtpFrameObject>& frame : found_frames)
    assembled_frame_callback_->OnAssembledFrame(std::move(frame));

  return true;
}
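
InsertPacket relies on two small ideas: mapping a sequence number to a slot with a modulo, and comparing 16-bit sequence numbers in a way that survives wraparound. The sketch below illustrates both under stated assumptions. SlotIndex and IsAheadOf are hypothetical helpers written for this article, not WebRTC's own AheadOf implementation, and the ambiguous half-range distance (exactly 32768) is simply treated as "not ahead".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: maps a sequence number to a slot in a buffer of
// `size` entries, the same way InsertPacket computes `index`.
std::size_t SlotIndex(uint16_t seq_num, std::size_t size) {
  assert(size > 0);
  return seq_num % size;
}

// Hypothetical helper: returns true if sequence number `a` is newer than
// `b`, treating the 16-bit sequence number space as circular.
bool IsAheadOf(uint16_t a, uint16_t b) {
  // Forward distance from b to a, modulo 2^16.
  const uint16_t forward = static_cast<uint16_t>(a - b);
  // `a` is ahead when it can be reached from `b` in fewer than 2^15 steps.
  return forward != 0 && forward < 0x8000;
}
```

With 512 slots, sequence numbers 1 and 513 both map to slot 1, which is the collision that makes InsertPacket expand the buffer; after doubling to 1024 slots they no longer collide.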

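4.4 Handling RTP padding packets - PacketBuffer::PaddingReceived

When the encoder's output bitrate is too low, the sender may insert empty padding packets to maintain the send bitrate. Padding packets do not enter the sequence buffer or the data buffer, but they do trigger loss detection and complete-frame detection.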

void PacketBuffer::PaddingReceived(uint16_t seq_num) {
  std::vector<std::unique_ptr<RtpFrameObject>> found_frames;
  {
    rtc::CritScope lock(&crit_);
    // Update the loss information: check whether receiving this packet
    // reveals holes caused by packet loss, i.e. discontinuities.
    UpdateMissingPackets(seq_num);
    // Scan the sequence buffer for complete frames that can be assembled and
    // return them.
    found_frames = FindFrames(static_cast<uint16_t>(seq_num + 1));
  }

  // Report any complete frames to RtpVideoStreamReceiver through the
  // OnAssembledFrame callback.
  for (std::unique_ptr<RtpFrameObject>& frame : found_frames)
    assembled_frame_callback_->OnAssembledFrame(std::move(frame));
}

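4.5 Loss detection - PacketBuffer::UpdateMissingPackets

PacketBuffer maintains a missing-packet buffer, missing_packets_, used mainly in PacketBuffer::FindFrames to decide whether a complete P-frame is preceded by frames that are not yet complete. If it is (the incomplete frame may be an I-frame or a P-frame), the complete P-frame is not passed on to RtpFrameReferenceFinder right away. Instead its state is cleared for now, and the check is repeated once all preceding frames are complete. Frame ordering therefore effectively takes place here as well, and it introduces a degree of inter-frame dependency.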

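void PacketBuffer::UpdateMissingPackets(uint16_t seq_num) {
  // If the newest inserted sequence number has never been set, set it now.
  if (!newest_inserted_seq_num_)
    newest_inserted_seq_num_ = seq_num;

  const int kMaxPaddingAge = 1000;
  // The current sequence number is newer than the newest one seen so far:
  // no reordering.
  if (AheadOf(seq_num, *newest_inserted_seq_num_)) {
    // missing_packets_ keeps at most 1000 packets. This is the sequence
    // number 1000 packets before the current one, roughly the oldest packet
    // the missing-packet buffer should still hold.
    uint16_t old_seq_num = seq_num - kMaxPaddingAge;
    // Position of the first packet >= old_seq_num.
    auto erase_to = missing_packets_.lower_bound(old_seq_num);
    // Erase everything older than 1000 packets from the missing-packet
    // buffer (if there is anything).
    missing_packets_.erase(missing_packets_.begin(), erase_to);

    // If even the oldest sequence number is newer than the newest inserted
    // sequence number, move the newest inserted sequence number forward.
    if (AheadOf(old_seq_num, *newest_inserted_seq_num_))
      *newest_inserted_seq_num_ = old_seq_num;

    // Since seq_num > newest_inserted_seq_num_, start recording the holes in
    // the open interval (newest_inserted_seq_num_, seq_num).
    ++*newest_inserted_seq_num_;
    // Starting at newest_inserted_seq_num_, every sequence number smaller
    // than seq_num goes into the missing-packet buffer, until
    // newest_inserted_seq_num_ == seq_num, i.e. the newest sequence number
    // has become the current seq_num.
    while (AheadOf(seq_num, *newest_inserted_seq_num_)) {
      missing_packets_.insert(*newest_inserted_seq_num_);
      ++*newest_inserted_seq_num_;
    }
  } else {
    // The received sequence number is older than the newest one seen so far:
    // remove it from the missing-packet buffer (it should have been added
    // there earlier).
    missing_packets_.erase(seq_num);
  }
}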
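
The hole bookkeeping above can be reduced to a few lines. The following is a minimal sketch under simplifying assumptions: the class name MissingTracker is made up for illustration, sequence numbers are assumed not to wrap around, and the 1000-packet age limit is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Hypothetical, simplified tracker for sequence-number holes.
// Assumption: sequence numbers never wrap, so plain ordering is enough.
class MissingTracker {
 public:
  void OnPacket(uint16_t seq_num) {
    if (!has_newest_) {
      // First packet ever: nothing can be missing yet.
      newest_ = seq_num;
      has_newest_ = true;
      return;
    }
    if (seq_num > newest_) {
      // Every number strictly between the previous newest packet and this
      // one is a hole until it arrives.
      for (uint16_t s = static_cast<uint16_t>(newest_ + 1); s < seq_num; ++s)
        missing_.insert(s);
      newest_ = seq_num;
    } else {
      // A late (reordered or retransmitted) packet fills its hole.
      missing_.erase(seq_num);
    }
    // Invariant: every missing number is older than the newest packet.
    assert(missing_.empty() || *missing_.rbegin() < newest_);
  }

  const std::set<uint16_t>& missing() const { return missing_; }

 private:
  bool has_newest_ = false;
  uint16_t newest_ = 0;
  std::set<uint16_t> missing_;
};
```

Feeding packets 1 and 5 records 2, 3 and 4 as missing; a late packet 3 then removes only its own entry.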

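4.6 Continuous-packet detection - PacketBuffer::PotentialNewFrame

PacketBuffer::PotentialNewFrame(uint16_t seq_num) checks whether all packets before seq_num are continuous. Complete-frame detection only runs when the packets are continuous, hence the name "potential new frame".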

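bool PacketBuffer::PotentialNewFrame(uint16_t seq_num) const {
  // Buffer index derived from the sequence number.
  size_t index = seq_num % size_;
  // Index of the previous packet.
  int prev_index = index > 0 ? index - 1 : size_ - 1;
  // The slot of the current packet is unused, so the packet has not been
  // processed yet: not continuous.
  if (!sequence_buffer_[index].used)
    return false;
  // The slot holds a different sequence number: not continuous.
  if (sequence_buffer_[index].seq_num != seq_num)
    return false;
  // The current packet has already been used to create a frame: not
  // continuous.
  if (sequence_buffer_[index].frame_created)
    return false;
  // frame_begin is true, so this is the first packet of a frame: continuous.
  if (sequence_buffer_[index].frame_begin)
    return true;
  // The slot of the previous packet is unused, so the previous packet has not
  // been processed yet: not continuous.
  if (!sequence_buffer_[prev_index].used)
    return false;
  // The previous packet has already been used to create a frame: not
  // continuous.
  if (sequence_buffer_[prev_index].frame_created)
    return false;
  // The sequence numbers of the previous and current packet are not
  // consecutive: not continuous.
  if (sequence_buffer_[prev_index].seq_num !=
      static_cast<uint16_t>(sequence_buffer_[index].seq_num - 1)) {
    return false;
  }
  // The timestamps of the previous and current packet differ: not continuous.
  if (data_buffer_[prev_index].timestamp != data_buffer_[index].timestamp)
    return false;
  // With all of the above ruled out, the current packet is continuous if the
  // previous packet is.
  if (sequence_buffer_[prev_index].continuous)
    return true;
  // The previous packet is not continuous, or something else is wrong: not
  // continuous.
  return false;
}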

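As the code shows, the first packet of a frame is reported as continuous if and only if frame_begin == true, while every later packet is continuous only if the previous packet is. Because continuity is extended packet by packet, once a sequence number is judged continuous, all packets before it back to the start of the frame are continuous too.

In FU-A mode frame_begin is set from the S bit of the FU-A header, which is why the correctness of this flag matters, as noted above. If the S bit were not set correctly, FU-A mode (fragmentation of large frames) would break; fortunately this does not seem to happen.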
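
To make the packet-by-packet extension of continuity concrete, here is a small sketch. It is a simplified model, not the WebRTC code: Slot, IsContinuous and Insert are names invented for this example, the frame_created checks are omitted, and the buffer never grows.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, reduced version of a sequence-buffer slot.
struct Slot {
  uint16_t seq_num = 0;
  uint32_t timestamp = 0;
  bool used = false;
  bool frame_begin = false;
  bool continuous = false;
};

// A packet is continuous if it starts a frame, or if the previous packet is
// present, adjacent, in the same frame (same timestamp) and continuous.
bool IsContinuous(const std::vector<Slot>& slots, uint16_t seq_num) {
  assert(!slots.empty());
  const std::size_t index = seq_num % slots.size();
  const std::size_t prev = index > 0 ? index - 1 : slots.size() - 1;
  const Slot& cur = slots[index];
  if (!cur.used || cur.seq_num != seq_num) return false;
  if (cur.frame_begin) return true;
  const Slot& p = slots[prev];
  if (!p.used) return false;
  if (p.seq_num != static_cast<uint16_t>(seq_num - 1)) return false;
  if (p.timestamp != cur.timestamp) return false;
  return p.continuous;
}

// Stores a packet, then marks it and any directly following packets that
// have just become continuous.
void Insert(std::vector<Slot>& slots, uint16_t seq_num, uint32_t timestamp,
            bool frame_begin) {
  Slot& s = slots[seq_num % slots.size()];
  s.seq_num = seq_num;
  s.timestamp = timestamp;
  s.used = true;
  s.frame_begin = frame_begin;
  s.continuous = false;
  for (uint16_t n = seq_num;
       IsContinuous(slots, n) && !slots[n % slots.size()].continuous; ++n) {
    slots[n % slots.size()].continuous = true;
  }
}
```

If packet 11 arrives before packet 10 (the frame start), it stays non-continuous until 10 is inserted, at which point both are marked in one pass.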

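4.7 Frame completeness detection - PacketBuffer::FindFrames

PacketBuffer::FindFrames walks the continuous packets in the sequence buffer and looks for frame boundaries, but it treats VPX and H264 differently:

For VPX, the function trusts frame_begin, so a complete VPX frame depends entirely on finding the frame_begin and frame_end packets;
For H264, the function does not trust frame_begin and does not rely on it to find the start of a frame, but frame_end is still trusted. Specifically, the start of an H264 frame is found by walking backwards from the last packet (marked by frame_end) until a packet with a different timestamp is reached, at which point a complete H264 frame is considered found.

In addition, H264 P-frames get special handling: even if a P-frame is complete, it is not passed on immediately while there are still missing-packet holes before it. It waits until every hole is filled, because a P-frame needs its reference frame to decode correctly.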

std::vector<std::unique_ptr<RtpFrameObject>> PacketBuffer::FindFrames(
    uint16_t seq_num) {
  std::vector<std::unique_ptr<RtpFrameObject>> found_frames;
  // Basic algorithm: walk the continuous packets until the last packet of a
  // frame (marked frame_end) is found, then walk backwards to the first
  // packet of the frame (frame_begin for VPX, a timestamp discontinuity for
  // H264) to form a complete frame.

  // PotentialNewFrame(seq_num) checks whether all packets before seq_num are
  // continuous.
  for (size_t i = 0; i < size_ && PotentialNewFrame(seq_num); ++i) {
    // Buffer index of the current packet.
    size_t index = seq_num % size_;
    // If all packets before seq_num are continuous, seq_num is continuous too.
    sequence_buffer_[index].continuous = true;

    // Found the last packet of a frame.
    if (sequence_buffer_[index].frame_end) {
      size_t frame_size = 0;
      int max_nack_count = -1;
      // First sequence number of the frame; start from the end of the frame.
      uint16_t start_seq_num = seq_num;
      // Earliest receive time in the frame, usually that of the first packet.
      int64_t min_recv_time = data_buffer_[index].receive_time_ms;
      // Latest receive time in the frame, usually that of the last packet.
      int64_t max_recv_time = data_buffer_[index].receive_time_ms;

      // Walk backwards to find the first packet of the frame.
      // Index of the frame start; begin at the end of the frame.
      int start_index = index;
      // Number of packets examined so far.
      size_t tested_packets = 0;
      // Timestamp of the current packet.
      int64_t frame_timestamp = data_buffer_[start_index].timestamp;

      // Identify H.264 keyframes by means of SPS, PPS, and IDR.
      bool is_h264 = data_buffer_[start_index].codec() == kVideoCodecH264;
      bool has_h264_sps = false;
      bool has_h264_pps = false;
      bool has_h264_idr = false;
      bool is_h264_keyframe = false;
      // Walk backwards from the last packet of the frame.
      while (true) {
        // One more packet examined.
        ++tested_packets;
        // Accumulate the frame size.
        frame_size += data_buffer_[start_index].sizeBytes;
        // Track the maximum NACK count.
        max_nack_count =
            std::max(max_nack_count, data_buffer_[start_index].timesNacked);
        // Mark the current packet as used to create a frame.
        sequence_buffer_[start_index].frame_created = true;
        // Track the earliest receive time.
        min_recv_time =
            std::min(min_recv_time, data_buffer_[start_index].receive_time_ms);
        // Track the latest receive time.
        max_recv_time =
            std::max(max_recv_time, data_buffer_[start_index].receive_time_ms);
        // For VPX, finding the first packet (marked frame_begin) means the
        // frame is complete; stop walking backwards.
        if (!is_h264 && sequence_buffer_[start_index].frame_begin)
          break;
        // For H264.
        if (is_h264 && !is_h264_keyframe) {
          // First check whether this is a keyframe: fetch the H264 header
          // from the data buffer.
          const auto* h264_header = absl::get_if<RTPVideoHeaderH264>(
              &data_buffer_[start_index].video_header.video_type_header);
          if (!h264_header || h264_header->nalus_length >= kMaxNalusPerPacket)
            return found_frames;
          // Walk all NALUs. Note that WebRTC sends SPS and PPS in front of
          // every IDR frame.
          for (size_t j = 0; j < h264_header->nalus_length; ++j) {
            if (h264_header->nalus[j].type == H264::NaluType::kSps) {
              has_h264_sps = true;  // Found an SPS.
            } else if (h264_header->nalus[j].type == H264::NaluType::kPps) {
              has_h264_pps = true;  // Found a PPS.
            } else if (h264_header->nalus[j].type == H264::NaluType::kIdr) {
              has_h264_idr = true;  // Found an IDR.
            }
          }
          // sps_pps_idr_is_h264_keyframe_ is false by default, so an IDR
          // alone is enough for the frame to count as a keyframe, without
          // requiring the SPS and PPS to be present.
          if ((sps_pps_idr_is_h264_keyframe_ && has_h264_idr && has_h264_sps &&
               has_h264_pps) ||
              (!sps_pps_idr_is_h264_keyframe_ && has_h264_idr)) {
            is_h264_keyframe = true;
          }
        }
        // Stop once the number of examined packets reaches the buffer
        // capacity.
        if (tested_packets == size_)
          break;
        // Move the search position back by one packet.
        start_index = start_index > 0 ? start_index - 1 : size_ - 1;

        // In the case of H264 we don't have a frame_begin bit (yes,
        // |frame_begin| might be set to true but that is a lie). So instead
        // we traverese backwards as long as we have a previous packet and
        // the timestamp of that packet is the same as this one. This may cause
        // the PacketBuffer to hand out incomplete frames.
        // See: https://bugs.chromium.org/p/webrtc/issues/detail?id=7106
        // (The original comment is kept here. It explains why frame_begin is
        // not used for H264, although using it would probably work as well.)
        // For H264, a gap is found if the slot is unused or its timestamp
        // differs from the frame's; stop walking backwards.
        if (is_h264 &&
            (!sequence_buffer_[start_index].used ||
             data_buffer_[start_index].timestamp != frame_timestamp)) {
          break;
        }
        // Still inside the same frame: decrement the start sequence number.
        --start_seq_num;
      }
      // The start and end of the frame have now been located, so the frame
      // can be assembled. H264 P-frames need extra handling, though: even if
      // a P-frame is complete, it is not passed on while there are still
      // missing-packet holes before it. It waits until every hole is filled,
      // because a P-frame needs its reference frame to decode correctly.
      if (is_h264) {
        // Warn if this is an unsafe frame.
        if (has_h264_idr && (!has_h264_sps || !has_h264_pps)) {
          RTC_LOG(LS_WARNING)
              << "Received H.264-IDR frame "
              << "(SPS: " << has_h264_sps << ", PPS: " << has_h264_pps
              << "). Treating as "
              << (sps_pps_idr_is_h264_keyframe_ ? "delta" : "key")
              << " frame since WebRTC-SpsPpsIdrIsH264Keyframe is "
              << (sps_pps_idr_is_h264_keyframe_ ? "enabled." : "disabled");
        }
        // Set the keyframe flag in the data buffer.
        const size_t first_packet_index = start_seq_num % size_;
        RTC_CHECK_LT(first_packet_index, size_);
        if (is_h264_keyframe) {
          data_buffer_[first_packet_index].frameType = kVideoFrameKey;
        } else {
          data_buffer_[first_packet_index].frameType = kVideoFrameDelta;
        }

        // The condition
        //   missing_packets_.upper_bound(start_seq_num) !=
        //       missing_packets_.begin()
        // looks up the first entry in the missing-packet list that is greater
        // than start_seq_num (the first sequence number of the frame). If
        // that entry is not the head of the list, some missing packets have
        // sequence numbers smaller than start_seq_num, i.e. there is a hole
        // before the P-frame.
        // Example 1:
        // missing_packets_ = {3, 4, 6}, start_seq_num = 5; upper_bound points
        // at 6. Packets 3 and 4, which precede the frame start 5, have not
        // arrived yet. The P-frame is complete, but passing it on may be
        // pointless, so frame_created is cleared again and the packets stay
        // buffered until the holes are filled.
        // Example 2:
        // missing_packets_ = {10, 16, 17}, start_seq_num = 3; upper_bound
        // points at 10, which is begin(). Nothing is missing before the frame
        // start 3 and the frame is complete, so it can be passed on.
        if (!is_h264_keyframe && missing_packets_.upper_bound(start_seq_num) !=
                                     missing_packets_.begin()) {
          uint16_t stop_index = (index + 1) % size_;
          while (start_index != stop_index) {
            sequence_buffer_[start_index].frame_created = false;
            start_index = (start_index + 1) % size_;
          }
          // Return all complete frames found so far.
          return found_frames;
        }
      }
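      // The frame is about to be assembled: remove the missing-packet entries
      // that precede it.
      // An H264 P-frame with holes in front of it never gets here, as
      // explained above.
      // For an I-frame, the earlier loss information can be discarded (?).
      missing_packets_.erase(missing_packets_.begin(),
                             missing_packets_.upper_bound(seq_num));
      // Assemble a frame.
      found_frames.emplace_back(
          new RtpFrameObject(this, start_seq_num, seq_num, frame_size,
                             max_nack_count, min_recv_time, max_recv_time));
    }  // if (sequence_buffer_[index].frame_end)

    // Extend the search forwards. Suppose that, after loss or reordering, the
    // current seq_num has just filled an earlier hole: this packet alone does
    // not complete a frame, so the position has to move forward to the next
    // frame_end and walk backwards again until a complete frame is detected.
    // This also re-examines P-frames that were buffered earlier and held
    // back because of holes in front of them.
    ++seq_num;
  }
  // Return all complete frames found.
  return found_frames;
}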
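
The upper_bound test used for H264 P-frames is easy to misread, so here it is in isolation. This is a sketch with a plain std::set<uint16_t>; it assumes sequence numbers do not wrap around, and HasHoleBefore is a hypothetical name chosen for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Returns true if at least one missing sequence number precedes the frame
// that starts at `start_seq_num`.
// Assumption: plain ordering, no sequence number wraparound.
bool HasHoleBefore(const std::set<uint16_t>& missing, uint16_t start_seq_num) {
  // The first packet of a complete frame has been received, so it cannot be
  // in the missing set itself.
  assert(missing.count(start_seq_num) == 0);
  // upper_bound returns the first element greater than start_seq_num. If
  // that is not begin(), some element is smaller than start_seq_num.
  return missing.upper_bound(start_seq_num) != missing.begin();
}
```

The two examples from the comments in FindFrames map directly onto this helper: {3, 4, 6} with a frame starting at 5 has a hole in front of it, while {10, 16, 17} with a frame starting at 3 does not.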

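4.8 Summary

PacketBuffer::InsertPacket inserts RTP data into the packet buffers and triggers the frame completeness check;
PacketBuffer::PaddingReceived handles padding packets and triggers the frame completeness check;
PacketBuffer::UpdateMissingPackets updates the loss information, which is used to detect holes in front of a P-frame;
PacketBuffer::PotentialNewFrame determines packet continuity; only continuous packets are checked for frame completeness;
PacketBuffer::FindFrames performs the frame completeness check; complete frames are reported through the OnAssembledFrame callback.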

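5 Finding reference frames - RtpFrameReferenceFinder

The figure above shows the basic operation of RtpFrameReferenceFinder. As the name suggests, its job is to find the reference frames of each frame. An I-frame starts a GOP and references only itself; every later frame in the GOP references the previous frame.

RtpFrameReferenceFinder maintains a table of the most recent GOPs. When a P-frame arrives, RtpFrameReferenceFinder finds the GOP it belongs to, sets its reference frame to the previous frame in that GOP, and then passes it to FrameBuffer.

RtpFrameReferenceFinder also guarantees that frames within a GOP are output continuously. For H264, each time a frame arrives it checks whether the sequence number of the frame's first packet directly follows the last packet sequence number received so far in that GOP; if so the frame is output, otherwise it is stashed until it becomes continuous. For VPX it is enough to check that the PIDs are consecutive. Because of this chained dependency, losing any frame in a GOP stalls playback for the rest of that GOP.

5.1 Picture ID - PID

The PID (Picture ID) uniquely identifies each frame. VPX defines a PID but H264 has no such concept, so RtpFrameReferenceFinder uses the sequence number of a frame's last packet as the PID of an H264 frame.

Besides I-frames and P-frames, a GOP may also contain padding packets that WebRTC inserts to make up the send bitrate, and each of these consumes a sequence number as well. An I-frame starts the GOP, so it has no continuity problem. To decide whether a received P-frame is continuous, however, one must check whether the sequence number of its first packet minus 1 equals the last packet sequence number received so far in the GOP, which may belong to the last packet of the previous frame or to a padding packet.

The GOP table structure defined by RtpFrameReferenceFinder:

key	value
last_seq_num: sequence number of the last packet of the I-frame, i.e. its PID	last_picture_id_gop: sequence number of the last packet of the newest frame in the GOP; used as the reference frame of the next frame.
last_seq_num: sequence number of the last packet of the I-frame, i.e. its PID	last_picture_id_with_padding_gop: sequence number of the newest packet in the GOP, which may be last_picture_id_gop or a padding packet; used to check frame continuity.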

5.2 Setting the reference frame - RtpFrameReferenceFinder::ManageFramePidOrSeqNum

This function checks the continuity of the incoming frame and sets its reference frame.

RtpFrameReferenceFinder::FrameDecision
RtpFrameReferenceFinder::ManageFramePidOrSeqNum(RtpFrameObject* frame,
                                                int picture_id) {
  // For H264 with generic not enabled, picture_id is always kNoPictureId.
  if (picture_id != kNoPictureId) {
    // Set the PID.
    frame->id.picture_id = unwrapper_.Unwrap(picture_id);
    // An I-frame references only itself; a P-frame references the previous
    // frame.
    frame->num_references = frame->frame_type() == kVideoFrameKey ? 0 : 1;
    // The reference frame is the previous frame.
    frame->references[0] = frame->id.picture_id - 1;
    return kHandOff;
  }

  // A keyframe is inserted into the GOP table. The key is last_seq_num and
  // the initial value is {last_seq_num, last_seq_num}.
  if (frame->frame_type() == kVideoFrameKey) {
    last_seq_num_gop_.insert(std::make_pair(
        frame->last_seq_num(),
        std::make_pair(frame->last_seq_num(), frame->last_seq_num())));
  }

  // With an empty GOP table no reference frame can be found; stash the frame.
  if (last_seq_num_gop_.empty())
    return kStash;

  // Remove older keyframes (PID less than last_seq_num - 100), but keep at
  // least one.
  auto clean_to = last_seq_num_gop_.lower_bound(frame->last_seq_num() - 100);
  for (auto it = last_seq_num_gop_.begin();
       it != clean_to && last_seq_num_gop_.size() > 1;) {
    it = last_seq_num_gop_.erase(it);
  }

  // Find the first keyframe in the GOP table that is newer than the current
  // frame.
  auto seq_num_it = last_seq_num_gop_.upper_bound(frame->last_seq_num());
  // If the keyframe found is the oldest one, the current frame is older than
  // the oldest keyframe; no reference frame can be set, so drop it.
  if (seq_num_it == last_seq_num_gop_.begin()) {
    RTC_LOG(LS_WARNING) << "Generic frame with packet range ["
                        << frame->first_seq_num() << ", "
                        << frame->last_seq_num()
                        << "] has no GoP, dropping frame.";
    return kDrop;
  }

  // Otherwise the reference frame should be found in the GOP of the keyframe
  // just before the one found. If the current frame is itself a keyframe,
  // seq_num_it is end() and seq_num_it-- yields the last keyframe.
  seq_num_it--;

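  // Ensure frame continuity; a frame that is not continuous is stashed.
  // Sequence number of the last packet of the newest frame in this GOP.
  uint16_t last_picture_id_gop = seq_num_it->second.first;
  // Sequence number of the newest packet in this GOP, which may be
  // last_picture_id_gop or a padding packet.
  uint16_t last_picture_id_with_padding_gop = seq_num_it->second.second;
  // Continuity check for P-frames.
  if (frame->frame_type() == kVideoFrameDelta) {
    // Sequence number of the packet just before the P-frame's first packet.
    uint16_t prev_seq_num = frame->first_seq_num() - 1;
    // If it differs from the newest packet sequence number of this GOP, the
    // frame is not continuous; stash it.
    if (prev_seq_num != last_picture_id_with_padding_gop)
      return kStash;
  }
  // The frame is now known to be continuous.
  RTC_DCHECK(AheadOrAt(frame->last_seq_num(), seq_num_it->first));
  // Use the sequence number of the frame's last packet as the initial PID;
  // it is replaced by its unwrapped form further down.
  frame->id.picture_id = frame->last_seq_num();
  // Set the number of reference frames; only a P-frame needs one.
  frame->num_references = frame->frame_type() == kVideoFrameDelta;
  // Set the reference frame to the last packet sequence number of the newest
  // frame in this GOP. Since the frame is continuous, its reference frame is
  // simply the previous frame.
  frame->references[0] = rtp_seq_num_unwrapper_.Unwrap(last_picture_id_gop);
  // If the current frame is newer than the last packet of the newest frame in
  // this GOP, update both the last packet of the newest frame (first) and the
  // newest packet of the GOP (second).
  if (AheadOf<uint16_t>(frame->id.picture_id, last_picture_id_gop)) {
    // Last packet of the newest frame in the GOP.
    seq_num_it->second.first = frame->id.picture_id;
    // Newest packet of the GOP; may later be updated by a padding packet.
    seq_num_it->second.second = frame->id.picture_id;
  }
  // Update the latest PID; unused for H264.
  last_picture_id_ = frame->id.picture_id;
  // Update the padding state.
  UpdateLastPictureIdWithPadding(frame->id.picture_id);
  // Set the frame's PID to its unwrapped form.
  frame->id.picture_id = rtp_seq_num_unwrapper_.Unwrap(frame->id.picture_id);
  // The frame has its reference set and is continuous; it can be passed on.
  return kHandOff;
}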
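
The H264 branch of this function boils down to a lookup in an ordered map followed by a continuity check. The sketch below models only that core, under several assumptions: Frame, GopTable and Manage are invented names, sequence numbers are assumed not to wrap (so no unwrapping or AheadOf is needed), and the cleanup of old GOPs and the padding update are omitted.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>

enum class Decision { kStash, kHandOff, kDrop };

// key: last sequence number of the keyframe.
// value: {last seq num of the newest frame, newest seq num incl. padding}.
using GopTable = std::map<uint16_t, std::pair<uint16_t, uint16_t>>;

// Hypothetical, reduced frame description.
struct Frame {
  uint16_t first_seq_num;
  uint16_t last_seq_num;
  bool is_keyframe;
  int64_t reference;  // -1 until a reference is assigned.
};

Decision Manage(GopTable& gops, Frame& frame) {
  // Assumption of this sketch: no sequence number wraparound.
  assert(frame.first_seq_num <= frame.last_seq_num);
  if (frame.is_keyframe) {
    gops.emplace(frame.last_seq_num,
                 std::make_pair(frame.last_seq_num, frame.last_seq_num));
  }
  if (gops.empty()) return Decision::kStash;

  // First GOP that starts after this frame; the frame belongs to the one
  // before it. If there is none before it, the frame is too old.
  auto it = gops.upper_bound(frame.last_seq_num);
  if (it == gops.begin()) return Decision::kDrop;
  --it;

  if (!frame.is_keyframe) {
    // A delta frame must directly follow the newest packet of its GOP.
    if (static_cast<uint16_t>(frame.first_seq_num - 1) != it->second.second)
      return Decision::kStash;
    frame.reference = it->second.first;
  }
  if (frame.last_seq_num > it->second.first) {
    it->second.first = frame.last_seq_num;
    it->second.second = frame.last_seq_num;
  }
  return Decision::kHandOff;
}
```

A delta frame that arrives before its keyframe is stashed; once the keyframe has been handed off, the same delta frame is accepted and references the keyframe's last sequence number.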

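5.3 Handling padding packets - RtpFrameReferenceFinder::PaddingReceived

This function stashes the padding packet and updates the padding state. If the padding packet happens to fill a sequence-number hole in the current GOP, a stashed P-frame may become continuous, so the function retries the stashed P-frames once.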

void RtpFrameReferenceFinder::PaddingReceived(uint16_t seq_num) {
  rtc::CritScope lock(&crit_);
  // Keep only the most recent 100 padding packets.
  auto clean_padding_to =
      stashed_padding_.lower_bound(seq_num - kMaxPaddingAge);
  stashed_padding_.erase(stashed_padding_.begin(), clean_padding_to);
  // Stash the padding packet.
  stashed_padding_.insert(seq_num);
  // Update the padding state.
  UpdateLastPictureIdWithPadding(seq_num);
  // Retry the stashed P-frames once; the sequence numbers may now be
  // continuous.
  RetryStashedFrames();
}

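5.3 Updating the padding state - RtpFrameReferenceFinder::UpdateLastPictureIdWithPadding

This function examines the stashed padding packets. Those that are continuous within the GOP update the last_picture_id_with_padding_gop field of the GOP table, so that the newest packet sequence number of the GOP becomes the newest padding sequence number and the frame continuity check can keep working correctly.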

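void RtpFrameReferenceFinder::UpdateLastPictureIdWithPadding(uint16_t seq_num) {
  // First I-frame in the GOP table that is newer than seq_num.
  auto gop_seq_num_it = last_seq_num_gop_.upper_bound(seq_num);

  // If that I-frame is at the head of the GOP table, seq_num is already very
  // old; nothing to do.
  if (gop_seq_num_it == last_seq_num_gop_.begin())
    return;

  // The GOP that seq_num belongs to.
  --gop_seq_num_it;

  // Compute the sequence number that directly follows the newest packet of
  // the GOP and see whether it can be found among the stashed padding
  // packets.
  uint16_t next_seq_num_with_padding = gop_seq_num_it->second.second + 1;
  // First stashed padding packet >= next_seq_num_with_padding.
  auto padding_seq_num_it =
      stashed_padding_.lower_bound(next_seq_num_with_padding);

  // As long as consecutive sequence numbers are found among the stashed
  // padding packets, advance the newest packet sequence number of the GOP
  // and remove them from the stash.
  while (padding_seq_num_it != stashed_padding_.end() &&
         *padding_seq_num_it == next_seq_num_with_padding) {
    // The newest packet of the GOP is now this consecutive padding packet.
    gop_seq_num_it->second.second = next_seq_num_with_padding;
    // Next consecutive padding sequence number.
    ++next_seq_num_with_padding;
    // Erase the current entry from the stash and move to the next one.
    padding_seq_num_it = stashed_padding_.erase(padding_seq_num_it);
  }

  // In some cases the stream stays continuous for a long time without a new
  // keyframe, and the current frame may then appear older than the last
  // keyframe (for example after sequence number wrapping). To prevent this,
  // the keyframe's PID is updated from time to time: if the last packet
  // sequence number (PID) of this GOP's keyframe is more than 10000 behind
  // the current packet, update the keyframe PID.
  if (ForwardDiff(gop_seq_num_it->first, seq_num) > 10000) {
    RTC_DCHECK_EQ(1ul, last_seq_num_gop_.size());
    // The new PID is the current seq_num.
    last_seq_num_gop_[seq_num] = gop_seq_num_it->second;
    // Erase the old entry.
    last_seq_num_gop_.erase(gop_seq_num_it);
  }
}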
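
The middle part of this function, which absorbs consecutive padding packets into the GOP, can be shown on its own. This is a minimal sketch assuming no sequence number wraparound; AbsorbPadding is a hypothetical helper, not a WebRTC function.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Advances `last_with_padding` over every stashed padding packet that
// directly follows it, erasing the consumed entries from the stash.
// Returns the new newest sequence number of the GOP.
// Assumption: no sequence number wraparound.
uint16_t AbsorbPadding(std::set<uint16_t>& stashed_padding,
                       uint16_t last_with_padding) {
  // Precondition of this sketch: the newest packet is not itself stashed.
  assert(stashed_padding.count(last_with_padding) == 0);
  uint16_t next = static_cast<uint16_t>(last_with_padding + 1);
  auto it = stashed_padding.lower_bound(next);
  while (it != stashed_padding.end() && *it == next) {
    last_with_padding = next;
    ++next;
    // erase returns the iterator to the following element.
    it = stashed_padding.erase(it);
  }
  return last_with_padding;
}
```

With a stash of {6, 7, 9} and a GOP whose newest packet is 5, packets 6 and 7 are absorbed and 9 stays stashed, because 8 is still missing.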

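5.4 Processing stashed frames - RtpFrameReferenceFinder::RetryStashedFrames

There are two situations in which stashed frames can be retried, so that continuous frames with their references set keep being output:

After a continuous frame with its reference set has been output, the frame stash stashed_frames_ may hold the next continuous frame, which can now be output as well;
An out-of-order padding packet arrives and makes some P-frame in the GOP continuous.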

void RtpFrameReferenceFinder::RetryStashedFrames() {
  bool complete_frame = false;
  do {
    complete_frame = false;
    // Walk the stashed frames.
    for (auto frame_it = stashed_frames_.begin();
         frame_it != stashed_frames_.end();) {
      // Call ManageFrameInternal to process one stashed frame and check
      // whether a continuous frame with its reference set can be output.
      FrameDecision decision = ManageFrameInternal(frame_it->get());
      // Check the result.
      switch (decision) {
        case kStash:    // Still not continuous, or no reference frame.
          ++frame_it;   // Check the next stashed frame.
          break;
        case kHandOff:  // Found a continuous frame with its reference set.
          complete_frame = true;
          // Output it through the OnCompleteFrame callback.
          frame_callback_->OnCompleteFrame(std::move(*frame_it));
          RTC_FALLTHROUGH();
        case kDrop:    // For both kHandOff and kDrop, remove from the stash.
          frame_it = stashed_frames_.erase(frame_it);  // Erase, then check the next stashed frame.
      }
    }
  } while (complete_frame);  // Keep going while continuous frames are found.
}
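
The retry loop is a fixed-point iteration: it keeps sweeping the stash until a full pass hands off nothing. The sketch below shows the pattern with integers standing in for frames. RetryStashed and the decide callback are hypothetical and stand in for stashed_frames_ and ManageFrameInternal.

```cpp
#include <cassert>
#include <functional>
#include <list>
#include <vector>

enum class Decision { kStash, kHandOff, kDrop };

// Re-evaluates every stashed item until a full pass hands off nothing.
// Returns the items in the order they were handed off.
std::vector<int> RetryStashed(std::list<int>& stashed,
                              const std::function<Decision(int)>& decide) {
  assert(static_cast<bool>(decide));
  std::vector<int> handed_off;
  bool progress = false;
  do {
    progress = false;
    for (auto it = stashed.begin(); it != stashed.end();) {
      switch (decide(*it)) {
        case Decision::kStash:
          ++it;  // Not ready yet; look at the next item.
          break;
        case Decision::kHandOff:
          // Handing off one item may unblock others, so sweep again later.
          progress = true;
          handed_off.push_back(*it);
          it = stashed.erase(it);
          break;
        case Decision::kDrop:
          it = stashed.erase(it);
          break;
      }
    }
  } while (progress);
  return handed_off;
}
```

The outer loop matters because a hand-off late in one sweep can unblock an item that was skipped earlier in the same sweep, which is why a single pass is not enough.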

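5.5 Summary

RtpFrameReferenceFinder caches GOP information, and every frame (and padding packet) is ordered within its GOP. When a frame is continuous, its reference frame is set to the previous frame in the GOP and the frame is output. I-frames need no reference frame; P-frames do.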

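6 Ordered output - FrameBuffer

To set each P-frame's reference to the previous frame, the RtpFrameReferenceFinder described in the previous section guarantees the order of frames within a GOP, but it does not guarantee the order of GOPs. That guarantee is provided by FrameBuffer.

As the figure above shows, FrameBuffer outputs frames to the decoder in order. It outputs "decodable" frames in order, where "decodable" means that a frame is "continuous" and all of its reference frames have been decoded, and "continuous" means that all of the frame's reference frames have been received. An I-frame references only itself, so it is decodable right away; a P-frame has to wait until all of its reference frames, that is the previous frame, have been received.

Consequently, because PacketBuffer and RtpFrameReferenceFinder only guarantee frame completeness and frame order within a GOP, if the I-frame of the next GOP enters FrameBuffer before a P-frame of the current GOP is complete, all remaining P-frames of the current GOP are simply dropped.

6.1 Inserting a frame - FrameBuffer::InsertFrame

This function inserts the current frame into the frame buffer. If all of the frame's reference frames have been received, the frame is considered continuous, and the decoding thread is woken through a synchronization event to fetch a frame to decode. At the same time all frames that reference this frame are notified, and each checks whether its count of not-yet-continuous reference frames has dropped to 0; if so, it becomes continuous as well.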

int64_t FrameBuffer::InsertFrame(std::unique_ptr<EncodedFrame> frame) {
  const VideoLayerFrameId& id = frame->id;

  rtc::CritScope lock(&crit_);
  // PID of the last continuous frame.
  int64_t last_continuous_picture_id =
      !last_continuous_frame_ ? -1 : last_continuous_frame_->picture_id;
  // Check that the references are valid; return if they are not.
  if (!ValidReferences(*frame)) {
    RTC_LOG(LS_WARNING) << "Frame with (picture_id:spatial_id) ("
                        << id.picture_id << ":"
                        << static_cast<int>(id.spatial_layer)
                        << ") has invalid frame references, dropping frame.";
    return last_continuous_picture_id;
  }
  // If the frame buffer has overflowed.
  if (frames_.size() >= kMaxFramesBuffered) {
    // If this is a keyframe.
    if (frame->is_keyframe()) {
      RTC_LOG(LS_WARNING) << "Inserting keyframe (picture_id:spatial_id) ("
                          << id.picture_id << ":"
                          << static_cast<int>(id.spatial_layer)
                          << ") but buffer is full, clearing"
                          << " buffer and inserting the frame.";
      // Clear everything and continue decoding from the current frame.
      ClearFramesAndHistory();
    } else {
      // Not a keyframe: return.
      RTC_LOG(LS_WARNING) << "Frame with (picture_id:spatial_id) ("
                          << id.picture_id << ":"
                          << static_cast<int>(id.spatial_layer)
                          << ") could not be inserted due to the frame "
                          << "buffer being full, dropping frame.";
      return last_continuous_picture_id;
    }
  }
  // PID of the most recently decoded frame (for H264, the sequence number of
  // the frame's last packet).
  auto last_decoded_frame = decoded_frames_history_.GetLastDecodedFrameId();
  // Timestamp of the most recently decoded frame.
  auto last_decoded_frame_timestamp =
      decoded_frames_history_.GetLastDecodedFrameTimestamp();
  // If the PID of the current frame is <= the PID of the most recently
  // decoded frame, this may be reordering or sequence number wrapping.
  if (last_decoded_frame && id <= *last_decoded_frame) {
    // The PID is smaller but the timestamp is newer, which may be caused by
    // an encoder reset or sequence number wrapping. If the frame is a
    // keyframe, processing can still continue.
    if (AheadOf(frame->Timestamp(), *last_decoded_frame_timestamp) &&
        frame->is_keyframe()) {
      // If this frame has a newer timestamp but an earlier picture id then we
      // assume there has been a jump in the picture id due to some encoder
      // reconfiguration or some other reason. Even though this is not according
      // to spec we can still continue to decode from this frame if it is a
      // keyframe.
      RTC_LOG(LS_WARNING)
          << "A jump in picture id was detected, clearing buffer.";
      // Clear everything and continue decoding from the current frame.
      ClearFramesAndHistory();
      last_continuous_picture_id = -1;
    } else {
      // Genuine reordering, and not a keyframe: drop the frame.
      RTC_LOG(LS_WARNING) << "Frame with (picture_id:spatial_id) ("
                          << id.picture_id << ":"
                          << static_cast<int>(id.spatial_layer)
                          << ") inserted after frame ("
                          << last_decoded_frame->picture_id << ":"
                          << static_cast<int>(last_decoded_frame->spatial_layer)
                          << ") was handed off for decoding, dropping frame.";
      return last_continuous_picture_id;
    }
  }

  // If the sequence number has jumped a long way, clear the buffer.
  if (!frames_.empty() && id < frames_.begin()->first &&
      frames_.rbegin()->first < id) {
    RTC_LOG(LS_WARNING)
        << "A jump in picture id was detected, clearing buffer.";
    // Clear everything and continue decoding from the current frame.
    ClearFramesAndHistory();
    last_continuous_picture_id = -1;
  }
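  // Try to claim a slot in the frame buffer.
  auto info = frames_.emplace(id, FrameInfo()).first;
  // A duplicate frame: return.
  if (info->second.frame) {
    RTC_LOG(LS_WARNING) << "Frame with (picture_id:spatial_id) ("
                        << id.picture_id << ":"
                        << static_cast<int>(id.spatial_layer)
                        << ") already inserted, dropping frame.";
    return last_continuous_picture_id;
  }
  // Update the frame info. This mainly sets the number of reference frames
  // that are not yet continuous and links each referenced frame to the frames
  // that reference it, so that when a referenced frame becomes valid, the
  // counters (0 means continuous) and the decodable state of its dependents
  // can be updated.
  if (!UpdateFrameInfoWithIncomingFrame(*frame, info))
    return last_continuous_picture_id;
  // A frame that was not delayed by retransmission can be used for delay
  // estimation. timing_ computes many delay metrics and the expected render
  // time of frames.
  if (!frame->delayed_by_retransmission())
    timing_->IncomingTimestamp(frame->Timestamp(), frame->ReceivedTime());
  // Store the frame in the frame buffer.
  info->second.frame = std::move(frame);
  // If the number of non-continuous reference frames is 0, the frame itself
  // is continuous. This is the case for an I-frame, or for a P-frame whose
  // referenced P-frame has already been received.
  if (info->second.num_missing_continuous == 0) {
    // Set the "continuous" state.
    info->second.continuous = true;
    // Propagate the "continuous" state: walk all frames that reference this
    // frame and decrement their num_missing_continuous.
    PropagateContinuity(info);
    // PID of the last continuous frame, to be returned.
    last_continuous_picture_id = last_continuous_frame_->picture_id;
    // There is now definitely a "continuous" frame; wake the decoding thread.
    new_continuous_frame_event_.Set();
  }
  // Return the PID of the last continuous frame.
  return last_continuous_picture_id;
}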

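6.2 Updating reference frame information - FrameBuffer::UpdateFrameInfoWithIncomingFrame

This function checks whether the reference frames of a frame are already continuous and initializes two counters: num_missing_continuous, the number of reference frames that are not yet continuous, and num_missing_decodable, the number of reference frames that are not yet decoded. It also builds the reverse link from each referenced frame to its dependent frames, which makes it easy to propagate state (continuous, decodable).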

bool FrameBuffer::UpdateFrameInfoWithIncomingFrame(const EncodedFrame& frame,
                                                   FrameMap::iterator info) {
  TRACE_EVENT0("webrtc", "FrameBuffer::UpdateFrameInfoWithIncomingFrame");
  const VideoLayerFrameId& id = frame.id;
  // 最新解碼的幀.
  auto last_decoded_frame = decoded_frames_history_.GetLastDecodedFrameId();
  RTC_DCHECK(!last_decoded_frame || *last_decoded_frame < info->first);

  struct Dependency {
    VideoLayerFrameId id;  // PID
    bool continuous;       // 只有未連續參考幀數量爲0，才爲“連續”
  };
  std::vector<Dependency> not_yet_fulfilled_dependencies;

  // 遍歷當前幀的所有參考幀
  for (size_t i = 0; i < frame.num_references; ++i) {
    // 參考幀
    VideoLayerFrameId ref_key(frame.references[i], frame.id.spatial_layer);
    // 如果當前幀的參考幀與最新的解碼幀比相等或者更早，可能是被解過碼，也有可能是亂序。
    if (last_decoded_frame && ref_key <= *last_decoded_frame) {
      // 如果這個參考幀還未解碼(亂序)，那麼這個參考幀將不再有機會被解碼, 那麼當前幀也無法被解碼，
      // 返回失敗，反之如果這個參考幀已經被解碼了，則屬於正常狀態。
      if (!decoded_frames_history_.WasDecoded(ref_key)) {
        int64_t now_ms = clock_->TimeInMilliseconds();
        if (last_log_non_decoded_ms_ + kLogNonDecodedIntervalMs < now_ms) {
          RTC_LOG(LS_WARNING)
              << "Frame with (picture_id:spatial_id) (" << id.picture_id << ":"
              << static_cast<int>(id.spatial_layer)
              << ") depends on a non-decoded frame more previous than"
              << " the last decoded frame, dropping frame.";
          last_log_non_decoded_ms_ = now_ms;
        }
        return false;
      }
    } else {
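      // If the reference is newer than the last decoded frame, it may not be
      // continuous yet.
      auto ref_info = frames_.find(ref_key);
      // Check whether the reference frame is already continuous.
      bool ref_continuous =
          ref_info != frames_.end() && ref_info->second.continuous;
      // Record the reference in the list of not-yet-fulfilled dependencies.
      not_yet_fulfilled_dependencies.push_back({ref_key, ref_continuous});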
    }
  }
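  // Counter of references that are not yet continuous, initialized to the
  // size of the not-yet-fulfilled dependency list.
  info->second.num_missing_continuous = not_yet_fulfilled_dependencies.size();
  // Counter of references that are not yet decoded, initialized to the size
  // of the not-yet-fulfilled dependency list.
  info->second.num_missing_decodable = not_yet_fulfilled_dependencies.size();

  // Walk through the not-yet-fulfilled dependency list.
  for (const Dependency& dep : not_yet_fulfilled_dependencies) {
    // If this reference frame is already continuous,
    if (dep.continuous)
      // decrement the counter of references that are not yet continuous.
      --info->second.num_missing_continuous;
    // Record the reverse link reference frame -> dependent frame, used to
    // propagate state.
    frames_[dep.id].dependent_frames.push_back(id);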
  }

  return true;
}
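
The bookkeeping above can be illustrated with a much smaller model. The following is only a sketch under stated assumptions: SketchFrameInfo, SketchFrameMap and RegisterFrame are hypothetical names, frames are identified by a plain integer instead of VideoLayerFrameId, and the decoded-frame history check is left out, so every reference is treated as not yet decoded.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical, simplified stand-in for FrameBuffer::FrameInfo.
struct SketchFrameInfo {
  bool continuous = false;
  size_t num_missing_continuous = 0;      // References not yet continuous.
  size_t num_missing_decodable = 0;       // References not yet decoded.
  std::vector<int64_t> dependent_frames;  // Frames that reference this one.
};

using SketchFrameMap = std::map<int64_t, SketchFrameInfo>;

// Registers frame `id` with its references: initializes both counters and
// records the reverse link from each reference to `id`.
// Assumption: no reference has been decoded yet (no decoded history here).
void RegisterFrame(SketchFrameMap& frames, int64_t id,
                   const std::vector<int64_t>& references) {
  SketchFrameInfo& info = frames[id];
  info.num_missing_continuous = references.size();
  info.num_missing_decodable = references.size();
  for (int64_t ref : references) {
    auto it = frames.find(ref);
    // A reference that is already continuous no longer blocks continuity.
    if (it != frames.end() && it->second.continuous)
      --info.num_missing_continuous;
    // Reverse link; creates a placeholder entry if the reference is missing.
    frames[ref].dependent_frames.push_back(id);
  }
  info.continuous = (info.num_missing_continuous == 0);
}
```

For example, registering keyframe 1 with no references makes it continuous immediately, while registering frame 3 with references {1, 2} before frame 2 has arrived leaves it with one missing continuous reference and two undecoded references.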

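6.3 Fetching a frame to decode - FrameBuffer::NextFrame

This function fetches a decodable frame from the frame buffer. The frame must be continuous (all of its reference frames have been received) and all of its reference frames must already have been decoded. An I-frame is continuous by itself and only references itself, so it can be taken directly. A P-frame depends on the continuity and decode state of its reference frames.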

FrameBuffer::ReturnReason FrameBuffer::NextFrame(
    int64_t max_wait_time_ms,
    std::unique_ptr<EncodedFrame>* frame_out,
    bool keyframe_required) {
  TRACE_EVENT0("webrtc", "FrameBuffer::NextFrame");
  // max_wait_time_ms is the maximum wait interval; latest_return_time_ms is
  // the absolute time by which the call must return.
  int64_t latest_return_time_ms =
      clock_->TimeInMilliseconds() + max_wait_time_ms;
  int64_t wait_ms = max_wait_time_ms;
  int64_t now_ms = 0;

  do {
    // Current time.
    now_ms = clock_->TimeInMilliseconds();
    {
      rtc::CritScope lock(&crit_);
      // Reset the event state.
      new_continuous_frame_event_.Reset();
      if (stopped_)
        return kStopped;

      wait_ms = max_wait_time_ms;

      // Clear the list of frames to decode.
      frames_to_decode_.clear();

      // Iterate over the frames up to the last continuous frame.
      for (auto frame_it = frames_.begin();
           frame_it != frames_.end() &&
           frame_it->first <= last_continuous_frame_;
           ++frame_it) {
        // Skip frames that are not continuous yet or that still have
        // undecoded reference frames.
        if (!frame_it->second.continuous ||
            frame_it->second.num_missing_decodable > 0) {
          continue;
        }
        // The frame is decodable; take it as the candidate.
        EncodedFrame* frame = frame_it->second.frame.get();
        // Skip non-keyframes when a keyframe is required
        // (keyframe_required defaults to false).
        if (keyframe_required && !frame->is_keyframe())
          continue;
        // Timestamp of the most recently decoded frame.
        auto last_decoded_frame_timestamp =
            decoded_frames_history_.GetLastDecodedFrameTimestamp();

        // Skip the candidate if it is older than the most recently decoded
        // frame (out of order).
        if (last_decoded_frame_timestamp &&
            AheadOf(*last_decoded_frame_timestamp, frame->Timestamp())) {
          continue;
        }

        // VPx only: inter-layer predicted frames are skipped here.
        if (frame->inter_layer_predicted) {
          continue;
        }

        // Collect the superframe. H264 has a single complete frame, so
        // current_superframe.size() is 1.
        std::vector<FrameMap::iterator> current_superframe;
        current_superframe.push_back(frame_it);
        // True for H264, which has only one layer.
        bool last_layer_completed =
            frame_it->second.frame->is_last_spatial_layer;
        FrameMap::iterator next_frame_it = frame_it;
        while (true) {
          // The logic in this loop only matters for VPx and can be ignored
          // for H264.
          ++next_frame_it;
          if (next_frame_it == frames_.end() ||
              next_frame_it->first.picture_id != frame->id.picture_id ||
              !next_frame_it->second.continuous) {
            break;
          }
          // Check if the next frame has some undecoded references other than
          // the previous frame in the same superframe.
          size_t num_allowed_undecoded_refs =
              (next_frame_it->second.frame->inter_layer_predicted) ? 1 : 0;
          if (next_frame_it->second.num_missing_decodable >
              num_allowed_undecoded_refs) {
            break;
          }
          // All frames in the superframe should have the same timestamp.
          if (frame->Timestamp() != next_frame_it->second.frame->Timestamp()) {
            RTC_LOG(LS_WARNING)
                << "Frames in a single superframe have different"
                   " timestamps. Skipping undecodable superframe.";
            break;
          }
          current_superframe.push_back(next_frame_it);
          last_layer_completed =
              next_frame_it->second.frame->is_last_spatial_layer;
        }
        // Check if the current superframe is complete.
        // TODO(bugs.webrtc.org/10064): consider returning all available to
        // decode frames even if the superframe is not complete yet.
        if (!last_layer_completed) {
          continue;
        }
        // For H264 the list of frames to decode holds exactly one frame.
        frames_to_decode_ = std::move(current_superframe);
        // Set the render time if it has not been set yet.
        if (frame->RenderTime() == -1) {
          frame->SetRenderTime(
              timing_->RenderTimeMs(frame->Timestamp(), now_ms));
        }
        // Compute how much longer we can wait before decoding must start.
        wait_ms = timing_->MaxWaitingTime(frame->RenderTime(), now_ms);

        // wait_ms = frame->RenderTime() - now_ms - render delay - decode time.
        // If wait_ms < -kMaxAllowedFrameDelayMs, decoding is probably too
        // slow and the frame can no longer be rendered in time; skip it.
        if (wait_ms < -kMaxAllowedFrameDelayMs)
          continue;
        // A frame to decode has been found; stop searching.
        break;
      }
    }  // rtc::Critscope lock(&crit_);
    // Update the remaining wait time.
    wait_ms = std::min<int64_t>(wait_ms, latest_return_time_ms - now_ms);
    wait_ms = std::max<int64_t>(wait_ms, 0);
  } while (new_continuous_frame_event_.Wait(wait_ms));  // Blocking wait.

  {
    rtc::CritScope lock(&crit_);
    now_ms = clock_->TimeInMilliseconds();
    // TODO(ilnik): remove |frames_out| use frames_to_decode_ directly.
    std::vector<EncodedFrame*> frames_out;
    // If a decodable frame was found.
    if (!frames_to_decode_.empty()) {
      bool superframe_delayed_by_retransmission = false;
      size_t superframe_size = 0;
      EncodedFrame* first_frame = frames_to_decode_[0]->second.frame.get();
      int64_t render_time_ms = first_frame->RenderTime();     // Expected render time.
      int64_t receive_time_ms = first_frame->ReceivedTime();  // Receive time.
      // If the frame's render time or the current target delay looks wrong,
      // reset the jitter estimator and the timing module, then recompute the
      // frame's render time.
      if (HasBadRenderTiming(*first_frame, now_ms)) {
        jitter_estimator_->Reset();
        timing_->Reset();
        render_time_ms =
            timing_->RenderTimeMs(first_frame->Timestamp(), now_ms);
      }
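      // Iterate over all frames of the superframe to decode (they should
      // share the same timestamp).
      for (FrameMap::iterator& frame_it : frames_to_decode_) {
        RTC_DCHECK(frame_it != frames_.end());
        EncodedFrame* frame = frame_it->second.frame.release();
        // Apply the (possibly recomputed) expected render time.
        frame->SetRenderTime(render_time_ms);
        // Whether any frame of the superframe was delayed by retransmission.
        superframe_delayed_by_retransmission |=
            frame->delayed_by_retransmission();
        // Keep the latest receive time.
        receive_time_ms = std::max(receive_time_ms, frame->ReceivedTime());
        // Accumulate the total size of the superframe.
        superframe_size += frame->size();
        // Propagate decodability: this frame is decodable, so notify the
        // frames that reference it. Once all references of such a frame have
        // been decoded, it can become decodable as well.
        PropagateDecodability(frame_it->second);
        // Add the frame to the decoded-frames history (it has not actually
        // been decoded yet, it is about to be). Frames older than the
        // decoded history will be dropped.
        decoded_frames_history_.InsertDecoded(frame_it->first,
                                              frame->Timestamp());

        // Erase everything from the start of the frame buffer up to and
        // including this frame; there is no need to keep those frames.
        frames_.erase(frames_.begin(), ++frame_it);
        // Output the frame.
        frames_out.push_back(frame);
      }
      // Only update the delay estimates if the superframe was not delayed by
      // retransmission.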
      if (!superframe_delayed_by_retransmission) {
        int64_t frame_delay;
        // The arrival-time filter computes the inter-frame delay.
        if (inter_frame_delay_.CalculateDelay(first_frame->Timestamp(),
                                              &frame_delay, receive_time_ms)) {
          // The Kalman filter estimates the jitter: its input is the observed
          // inter-frame delay, its output is the optimal inter-frame delay,
          // i.e. the jitter.
          jitter_estimator_->UpdateEstimate(frame_delay, superframe_size);
        }
        float rtt_mult = protection_mode_ == kProtectionNackFEC ? 0.0 : 1.0;
        if (RttMultExperiment::RttMultEnabled()) {
          rtt_mult = RttMultExperiment::GetRttMultValue();
        }
        // Fetch the jitter estimate and pass it to timing_. In the initial
        // state the current delay (googCurrentDelayMs) is set to the jitter.
        timing_->SetJitterDelay(jitter_estimator_->GetJitterEstimate(rtt_mult));
        // Update the current delay (googCurrentDelayMs) so that it approaches
        // googTargetDelayMs.
        timing_->UpdateCurrentDelay(render_time_ms, now_ms);
      } else {
        // Update the retransmission count in jitter_estimator_; it affects
        // the jitter estimate it returns.
        if (RttMultExperiment::RttMultEnabled() || add_rtt_to_playout_delay_)
          jitter_estimator_->FrameNacked();
      }
      // Collect the detailed timing information and notify the observer.
      UpdateJitterDelay();
      UpdateTimingFrameInfo();
    }
    // Output the frame(s) to decode.
    if (!frames_out.empty()) {
      if (frames_out.size() == 1) {
        frame_out->reset(frames_out[0]);
      } else {
        frame_out->reset(CombineAndDeleteFrames(frames_out));
      }
      return kFrameFound;
    }
  }  // rtc::Critscope lock(&crit_)
  // If there is still time left and no decodable frame was found, try
  // waiting a little longer.
  if (latest_return_time_ms - now_ms > 0) {
    // If |next_frame_it_ == frames_.end()| and there is still time left, it
    // means that the frame buffer was cleared as the thread in this function
    // was waiting to acquire |crit_| in order to return. Wait for the
    // remaining time and then return.
    return NextFrame(latest_return_time_ms - now_ms, frame_out);
  }
  return kTimeout;
}
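
The wait-time handling in NextFrame can be summarized in a few lines. The sketch below is an illustration only: WaitDecision and ComputeWait are hypothetical names, and the decode time, render delay and maximum allowed frame delay are passed in as plain parameters instead of being read from VCMTiming or from kMaxAllowedFrameDelayMs. It follows the formula given in the listing's comment (wait = render time - now - decode time - render delay), then applies the same clamping as the loop.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical result type for this sketch.
struct WaitDecision {
  bool skip_frame;  // True if the frame is already too late to render.
  int64_t wait_ms;  // Clamped wait time before decoding should start.
};

// All parameters are in milliseconds. max_allowed_frame_delay_ms plays the
// role of kMaxAllowedFrameDelayMs (its value is an input here, not assumed).
WaitDecision ComputeWait(int64_t render_time_ms, int64_t now_ms,
                         int64_t decode_time_ms, int64_t render_delay_ms,
                         int64_t latest_return_time_ms,
                         int64_t max_allowed_frame_delay_ms) {
  // Time left until decoding must start for the frame to render on time.
  int64_t wait_ms = render_time_ms - now_ms - decode_time_ms - render_delay_ms;
  // Too late by more than the allowed margin: skip this candidate.
  if (wait_ms < -max_allowed_frame_delay_ms)
    return {true, 0};
  // Never wait past the caller's deadline, and never wait a negative time.
  wait_ms = std::min(wait_ms, latest_return_time_ms - now_ms);
  wait_ms = std::max<int64_t>(wait_ms, 0);
  return {false, wait_ms};
}
```

With a render time 100 ms in the future, 8 ms of decode time and 10 ms of render delay, the function asks the caller to wait 82 ms, or less if the caller's own deadline comes first.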

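6.4 State propagation - FrameBuffer::PropagateContinuity/FrameBuffer::PropagateDecodability

Every frame that enters FrameBuffer carries information about its reference frames. FrameBuffer builds the reverse dependency table by recording, in each reference frame, the frames that depend on it. Once a reference frame becomes continuous or decodable, its dependents can be notified directly.

Continuity propagation: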

void FrameBuffer::PropagateContinuity(FrameMap::iterator start) {
  std::queue<FrameMap::iterator> continuous_frames;
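  // start is continuous, so enqueue it first.
  continuous_frames.push(start);

  // Propagate frame continuity with a breadth-first search.
  // Basic BFS: enqueue the item to process, dequeue and process it, enqueue
  // the new items that produces, and repeat until the queue is empty, i.e.
  // until every reachable node of the graph has been visited.
  while (!continuous_frames.empty()) {
    // Dequeue a continuous frame.
    auto frame = continuous_frames.front();
    continuous_frames.pop();
    // If the last continuous frame is not set yet, or this frame is newer,
    // update it. NextFrame uses it to bound its scan of the frame buffer.
    if (!last_continuous_frame_ || *last_continuous_frame_ < frame->first) {
      last_continuous_frame_ = frame->first;
    }

    // Walk through the frames that depend on this continuous frame (the
    // frames that use it as a reference).
    for (size_t d = 0; d < frame->second.dependent_frames.size(); ++d) {
      // Look up the dependent frame in the frame buffer.
      auto frame_ref = frames_.find(frame->second.dependent_frames[d]);
      RTC_DCHECK(frame_ref != frames_.end());

      // Only update continuity if the dependent frame is still buffered;
      // otherwise this branch of the search ends here.
      if (frame_ref != frames_.end()) {
        // Decrement its counter of references that are not yet continuous.
        --frame_ref->second.num_missing_continuous;
        // When the counter reaches 0, all of its references have arrived.
        if (frame_ref->second.num_missing_continuous == 0) {
          // The dependent frame is now continuous as well.
          frame_ref->second.continuous = true;
          // Enqueue it so that later iterations check the continuity of its
          // own dependents.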
          continuous_frames.push(frame_ref);
        }
      }
    }
  }
}

Decodability propagation:

void FrameBuffer::PropagateDecodability(const FrameInfo& info) {
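  // Walk through all dependent frames.
  for (size_t d = 0; d < info.dependent_frames.size(); ++d) {
    // Check whether the dependent frame is still in the frame buffer.
    auto ref_info = frames_.find(info.dependent_frames[d]);
    RTC_DCHECK(ref_info != frames_.end());
    // TODO(philipel): Look into why we've seen this happen.
    if (ref_info != frames_.end()) {
      // The dependent frame is still buffered: decrement its counter of
      // undecoded references. A frame becomes decodable (about to be decoded)
      // only when it is continuous (num_missing_continuous == 0) and all of
      // its references have been decoded (num_missing_decodable == 0). That
      // state is set when the decoding thread calls NextFrame, so no
      // breadth-first search is used here; the counter is just decremented.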
      RTC_DCHECK_GT(ref_info->second.num_missing_decodable, 0U);
      --ref_info->second.num_missing_decodable;
    }
  }
}
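
To see the breadth-first propagation in isolation, here is a minimal sketch on a toy dependency graph. NodeSketch and PropagateContinuitySketch are hypothetical names, frames are keyed by a plain integer, and the function returns how many frames became continuous purely so that the effect is easy to observe. It is not the FrameBuffer implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <queue>
#include <vector>

// Hypothetical, simplified frame node for this sketch.
struct NodeSketch {
  bool continuous = false;
  size_t num_missing_continuous = 0;      // References not yet continuous.
  std::vector<int64_t> dependent_frames;  // Frames that reference this one.
};

// Marks `start` as continuous and propagates continuity to its dependents
// with a breadth-first search. Returns the number of frames that became
// continuous, including `start`.
size_t PropagateContinuitySketch(std::map<int64_t, NodeSketch>& frames,
                                 int64_t start) {
  auto start_it = frames.find(start);
  if (start_it == frames.end())
    return 0;
  size_t newly_continuous = 0;
  std::queue<int64_t> pending;
  start_it->second.continuous = true;
  pending.push(start);
  while (!pending.empty()) {
    const int64_t id = pending.front();
    pending.pop();
    ++newly_continuous;
    for (int64_t dep_id : frames[id].dependent_frames) {
      auto dep = frames.find(dep_id);
      // Ignore dependents that are gone or have nothing left to wait for.
      if (dep == frames.end() || dep->second.num_missing_continuous == 0)
        continue;
      // One more reference is continuous; at zero the dependent is too.
      if (--dep->second.num_missing_continuous == 0) {
        dep->second.continuous = true;
        pending.push(dep_id);
      }
    }
  }
  return newly_continuous;
}
```

In a chain 1 <- 2 <- 3 where frame 4 references both frame 3 and a frame that has not arrived, making frame 1 continuous also makes frames 2 and 3 continuous, while frame 4 keeps one missing reference and stays non-continuous.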

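6.5 Summary

FrameBuffer caches the frames that are about to enter the decoder and outputs them to the decoder in order, handing out only frames that are continuous and whose reference frames have all been decoded.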

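7 Jitter and delay

A JitterBuffer combines "jitter" and "buffer". The previous sections covered the buffer part, which caches, sorts and assembles packets into frames and outputs them in order, and thereby absorbs jitter. Measuring how much jitter and how much delay the network actually introduces requires some additional tools.

7.1 Jitter calculation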

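VCMInterFrameDelay: computes the inter-frame delay = receive-time difference of two frames - send-time difference of the two frames;
VCMJitterEstimator: derives the optimal jitter value from the inter-frame delays computed by VCMInterFrameDelay.

The figure above shows how the observed inter-frame delay (jitter) is computed: jitter = tr_delta - ts_delta = (tr2 - tr1) - (ts2 - ts1), that is, the receive-time difference of two frames minus their send-time difference.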
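
The formula can be made concrete with a small function. This is a sketch under stated assumptions, not the VCMInterFrameDelay implementation: InterFrameDelayMs is a hypothetical name, the send times are taken from RTP timestamps on the 90 kHz video clock, and frames are assumed to arrive in order so that unsigned subtraction is enough to handle 32-bit timestamp wraparound.

```cpp
#include <cstdint>

// Assumption: video RTP timestamps tick at 90 kHz, i.e. 90 ticks per ms.
constexpr int64_t kRtpTicksPerMs = 90;

// Returns (tr2 - tr1) - (ts2 - ts1) in milliseconds for two consecutive
// frames. Positive values mean the second frame arrived later than its
// send spacing would suggest.
int64_t InterFrameDelayMs(uint32_t prev_rtp_timestamp, int64_t prev_receive_ms,
                          uint32_t cur_rtp_timestamp, int64_t cur_receive_ms) {
  // Unsigned subtraction wraps correctly for in-order frames.
  const uint32_t ts_delta_ticks = cur_rtp_timestamp - prev_rtp_timestamp;
  const int64_t ts_delta_ms =
      static_cast<int64_t>(ts_delta_ticks) / kRtpTicksPerMs;
  const int64_t tr_delta_ms = cur_receive_ms - prev_receive_ms;
  return tr_delta_ms - ts_delta_ms;
}
```

For two frames sent 3000 ticks (about 33 ms) apart and received 40 ms apart, the observed inter-frame delay is 7 ms.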

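The optimal jitter is computed in essentially the same way that GCC uses the arrival-time filter (InterArrival) to compute arrival-time deltas and the overuse estimator (OveruseEstimator) to compute the optimal inter-arrival delta. Both use a Kalman filter that combines the observed and the predicted inter-frame delay to obtain the optimal inter-frame delay, which is the network jitter. Only the sampling unit differs: GCC works on 5 ms packet groups (which can also be called frames), whereas the jitter estimator works directly on video frames. The details are not repeated here.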
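
The idea of blending an observation with a prediction can be shown with a one-dimensional Kalman filter. This is a deliberately reduced sketch and not the VCMJitterEstimator algorithm: ScalarKalmanSketch and all of its fields are hypothetical, the state is a single constant value, and the frame size that UpdateEstimate also receives in the listing earlier is ignored here.

```cpp
// Hypothetical one-dimensional Kalman filter (not VCMJitterEstimator).
struct ScalarKalmanSketch {
  double estimate = 0.0;           // Current estimate of the delay (ms).
  double variance = 1.0;           // Uncertainty of the estimate.
  double process_noise = 0.0;      // Assumed drift of the true value per step.
  double measurement_noise = 1.0;  // Assumed noise of each observation.

  // Blends one observed inter-frame delay into the estimate and returns it.
  double Update(double observed_delay_ms) {
    // Predict: the state is modelled as constant, only uncertainty grows.
    variance += process_noise;
    // Gain: how much to trust the observation relative to the prediction.
    const double gain = variance / (variance + measurement_noise);
    // Correct: move the estimate toward the observation by `gain`.
    estimate += gain * (observed_delay_ms - estimate);
    variance *= (1.0 - gain);
    return estimate;
  }
};
```

With the default parameters the first observation of 10 ms moves the estimate halfway to 5 ms, and repeated observations of 10 ms pull it steadily closer to 10 ms as the filter's uncertainty shrinks.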

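7.2 Delay - VCMTiming

VCMTiming provides the following receiver-side parameters, which are shown on the chrome://webrtc-internals page when a stream is pulled with a browser.

Name	Meaning
googDecodeMs	Time taken by the most recent decode.
googMaxDecodeMs	Maximum decode time; in practice the 95th percentile, i.e. the value that 95% of the sampled decode times do not exceed.
googRenderDelayMs	Render delay, 10 ms by default.
googJitterBufferMs	Network jitter, see the previous section.
googMinPlayoutDelayMs	Minimum playout delay: how long the audio/video synchronizer requires video frame playout to be delayed.
googTargetDelayMs	Target delay; googCurrentDelayMs converges toward it.
googCurrentDelayMs	Current delay, used to compute the render time of video frames.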

7.2.1 Target delay - googTargetDelayMs

int VCMTiming::TargetDelayInternal() const {
  return std::max(min_playout_delay_ms_,
                  jitter_delay_ms_ + RequiredDecodeTimeMs() + render_delay_ms_);
}

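As the code shows, the target delay is the larger of the minimum playout delay and the sum jitter + decode time + render delay. It is the overall delay expected for playing out the current frame. It serves as the reference value that the current delay googCurrentDelayMs converges toward, and it is ultimately used for audio/video synchronization.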

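7.2.2 Current delay - googCurrentDelayMs

FrameBuffer calls UpdateCurrentDelay each time it obtains a decodable frame that was not delayed by retransmission. The call updates the current delay, which is ultimately used to compute the render time.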

void VCMTiming::UpdateCurrentDelay(int64_t render_time_ms,
                                   int64_t actual_decode_time_ms) {
  rtc::CritScope cs(&crit_sect_);
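  // Get the target delay.
  uint32_t target_delay_ms = TargetDelayInternal();
  // render_time_ms: expected render time.
  // Expected decode time = expected render time - decode duration - render
  // delay.
  // delayed_ms = actual decode time (actual_decode_time_ms) - expected decode
  // time, i.e. how late decoding actually started.
  int64_t delayed_ms =
      actual_decode_time_ms -
      (render_time_ms - RequiredDecodeTimeMs() - render_delay_ms_);
  // Nothing to do if the frame was not late.
  if (delayed_ms < 0) {
    return;
  }
  // If the previous current delay plus the observed lateness is still within
  // the target delay,
  if (current_delay_ms_ + delayed_ms <= target_delay_ms) {
    // raise the current delay so that it approaches the target delay.
    current_delay_ms_ += delayed_ms;
  } else {
    // Otherwise the sum would exceed the target delay, so cap the current
    // delay at the target delay.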
    current_delay_ms_ = target_delay_ms;
  }
}
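
The interaction between the target delay and the current delay can be traced with a small stand-alone model. This is a sketch under stated assumptions: TimingSketch is a hypothetical struct, its fields are plain values set by the caller (in VCMTiming the decode time comes from RequiredDecodeTimeMs() and the jitter from the jitter estimator), and there is no locking.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical, lock-free model of the delay bookkeeping in VCMTiming.
struct TimingSketch {
  int min_playout_delay_ms = 0;
  int jitter_delay_ms = 0;
  int required_decode_time_ms = 0;
  int render_delay_ms = 10;
  int current_delay_ms = 0;

  // Target delay: max(min playout delay, jitter + decode + render).
  int TargetDelayMs() const {
    return std::max(min_playout_delay_ms,
                    jitter_delay_ms + required_decode_time_ms +
                        render_delay_ms);
  }

  // Moves the current delay toward the target by the observed lateness.
  void UpdateCurrentDelay(int64_t render_time_ms,
                          int64_t actual_decode_time_ms) {
    const int64_t target_delay_ms = TargetDelayMs();
    // When decoding should have started for the frame to render on time.
    const int64_t expected_decode_time_ms =
        render_time_ms - required_decode_time_ms - render_delay_ms;
    const int64_t delayed_ms = actual_decode_time_ms - expected_decode_time_ms;
    if (delayed_ms < 0)
      return;  // The frame was not late; keep the current delay.
    // Grow by the lateness, but never beyond the target delay.
    current_delay_ms = static_cast<int>(
        std::min<int64_t>(current_delay_ms + delayed_ms, target_delay_ms));
  }
};
```

With jitter 40 ms, decode time 8 ms and render delay 10 ms, the target delay is 58 ms. A frame decoded 8 ms late raises a current delay of 20 ms to 28 ms, and a frame decoded 118 ms late caps it at 58 ms.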

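7.3 Smoothing the render time - TimestampExtrapolator

Every time FrameBuffer obtains a decodable frame, it updates the frame's render time, which is obtained through the TimestampExtrapolator class. TimestampExtrapolator is also a Kalman filter. Its input is the timestamp of the incoming frame, and it computes the output render time from the spacing of the incoming timestamps, with the goal of smoothing the interval between output frames.

Final render time of a video frame = smoothed frame time + current delay.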

int64_t VCMTiming::RenderTimeMsInternal(uint32_t frame_timestamp,
                                        int64_t now_ms) const {
  // If both playout delays are 0, the frame must be rendered immediately.
  if (min_playout_delay_ms_ == 0 && max_playout_delay_ms_ == 0) {
    // Render as soon as possible.
    return 0;
  }
  // Estimate the smoothed frame time with the Kalman filter.
  int64_t estimated_complete_time_ms =
      ts_extrapolator_->ExtrapolateLocalTime(frame_timestamp);
  if (estimated_complete_time_ms == -1) {
    estimated_complete_time_ms = now_ms;
  }

  // Clamp the current delay to the range
  // [min_playout_delay_ms_, max_playout_delay_ms_].
  int actual_delay = std::max(current_delay_ms_, min_playout_delay_ms_);
  actual_delay = std::min(actual_delay, max_playout_delay_ms_);
  // Final render time of the video frame = smoothed frame time + current
  // delay.
  return estimated_complete_time_ms + actual_delay;
}
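
The render-time rule can be checked with concrete numbers. The function below is only a sketch: RenderTimeMsSketch is a hypothetical name, and the extrapolated (smoothed) time is passed in as a parameter instead of being produced by TimestampExtrapolator, with -1 meaning that no estimate is available yet.

```cpp
#include <algorithm>
#include <cstdint>

// Returns the render time in ms, or 0 to request rendering as soon as
// possible when both playout delays are 0.
int64_t RenderTimeMsSketch(int64_t extrapolated_time_ms, int64_t now_ms,
                           int current_delay_ms, int min_playout_delay_ms,
                           int max_playout_delay_ms) {
  if (min_playout_delay_ms == 0 && max_playout_delay_ms == 0)
    return 0;
  // Fall back to the current time when no smoothed estimate exists yet.
  if (extrapolated_time_ms == -1)
    extrapolated_time_ms = now_ms;
  // Clamp the current delay to [min playout delay, max playout delay].
  int actual_delay = std::max(current_delay_ms, min_playout_delay_ms);
  actual_delay = std::min(actual_delay, max_playout_delay_ms);
  return extrapolated_time_ms + actual_delay;
}
```

For a smoothed time of 5000 ms and a current delay of 58 ms the render time is 5058 ms. A maximum playout delay of 40 ms lowers it to 5040 ms, and a minimum playout delay of 100 ms raises it to 5100 ms.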

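8 Summary

After RTP packets pass through the JitterBuffer, the output is a sequence of complete, continuous, decodable video frames that carry the render time used for final playout.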