The Technology Behind Thousand-Fold Video Compression: Prediction

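{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":16}},{"type":"strong","attrs":{}}],"text":"Preface","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As 5G matures and enters wide commercial use, bandwidth keeps growing and transmitting video becomes easier. Gains in device compute power and storage capacity, on mobile devices in particular, have spread video technology ever more widely. In streaming media, entertainment, and real-time communication alike, video gives users a richer experience.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Video technology, and ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"video compression","attrs":{}},{"type":"text","text":" in particular, is specialized enough that in-depth development has a high barrier to entry. Real-time video communication makes compression harder still, because it places strict demands on ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"latency","attrs":{}},{"type":"text","text":", on ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"device compatibility","attrs":{}},{"type":"text","text":", and on ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"bandwidth adaptation","attrs":{}},{"type":"text","text":". Developing a ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"codec that meets real-time communication requirements","attrs":{}},{"type":"text","text":" is correspondingly difficult. In an earlier article, ","attrs":{}},{"type":"link","attrs":{"href":"https://link.zhihu.com/?target=https%3A//mp.weixin.qq.com/s%3F__biz%3DMzU1MjAxNjI0Ng%3D%3D%26mid%3D2247483931%26idx%3D1%26sn%3Dff0913acd5e2cf4b8cbe219dcd044a73%26chksm%3Dfb89c202ccfe4b1467361a3eb0029289d6859ad6a86a2ad49823eadb691c0cda5957a0d2039c%26scene%3D21%26token%3D1347141590%26lang%3Dzh_CN%23wechat_redirect","title":null,"type":null},"content":[{"type":"text","text":"《深入淺出理解視頻編解碼技術》","attrs":{}}]},{"type":"text","text":", we briefly introduced the ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"basic video codec framework","attrs":{}},{"type":"text","text":". Today we take a closer look at its ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"prediction module","attrs":{}},{"type":"text","text":" to help readers understand video codec technology better.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"01","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Color Space","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Before getting to the main topic, let us briefly look at how video is represented in a computer. A video is a series of pictures arranged in time order, and each picture is a frame. Each frame can be viewed as a two-dimensional matrix whose elements are pixels. A pixel is usually represented by three color components; in the RGB color space, for example, each pixel consists of a red, a green, and a blue component. When each component is stored in one byte, its value ranges from 0 to 255. The YUV format commonly used in coding is similar and is not covered further here.","attrs":{}}]},
{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/8c/8cf15ac3ecb0bc5f4a525b380f99255d.jpeg","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"Figure 1","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Take a 1280x720@60fps video sequence as an example. Ten seconds of uncompressed video takes 1280*720*3*60*10 bytes ≈ 1.66 GB. Data on this scale is a major challenge for both storage and transmission. The purpose of video compression, or encoding, is to shrink the video while preserving its quality, so that it is easier to transmit and store. To restore the video correctly, it must then be decoded. Ever since the earliest standard, H.261, video codecs have adopted the same framework, shown in Figure 2. ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"The main modules are intra/inter prediction, (inverse) transform, (inverse) quantization, entropy coding, and in-loop filtering.","attrs":{}},{"type":"text","text":" A frame of video data is first split into a series of square blocks, which are processed one by one, left to right and top to bottom, to produce the final bitstream.","attrs":{}}]},
{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/1f/1f4c9a65f53997ff58818a3cd74002b6.jpeg","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"Figure 2","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"02","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Intra Prediction","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"After the video data is divided into blocks, the colors of pixels in adjacent blocks, and of pixels within a block, tend to change gradually, so they are strongly similar to one another. This similarity is ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"spatial redundancy","attrs":{}},{"type":"text","text":". Where there is redundancy, the same content can be expressed with less data. For example, we can transmit the value of the first pixel and then only the difference between the second pixel and the first. The range of this difference is usually much smaller, so a pixel value that originally needed 8 bits may need fewer than 8. By the same reasoning, a similar “differential” operation can be applied with pixel blocks as the basic unit. The example image below gives a more intuitive feel for this similarity.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/7f/7f0566792d0782cb3fa829f3c3d82ddc.jpeg","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"Figure 3","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In the two 8x8 blocks marked in the figure, the luma component (Y) is continuous along the “top-left to bottom-right” direction and changes little. Suppose we design a specific “mode” that uses the left block to “predict” the right block. Transmitting the “original pixels” minus the “predicted pixels” then requires less data, and if the “mode” is also written into the bitstream, the decoder can use the left block to “reconstruct” the right block. In the extreme case, if some operation on the left block's pixels reproduces the right block exactly, the encoder can transmit the right block at the cost of signaling a single “mode”. Of course, video textures are highly varied and no single mode fits them all, so the standards define many ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"intra prediction modes","attrs":{}},{"type":"text","text":" to exploit the correlation between pixels and achieve compression; the figure below (from Vcodex) shows the nine intra prediction modes of H.264. Take mode 0 (vertical prediction) as an example: each reconstructed pixel of the block above is copied down its column to produce the intra prediction. The other modes work in a similar way, differing only in how the predicted values are generated. Having so many modes raises a question: which mode should be used to encode a given block? The best approach is to try every mode and measure the number of bits it needs and the quality loss it causes, which is ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"rate-distortion optimization","attrs":{}},{"type":"text","text":". This is clearly very expensive, so many other methods are used to infer which mode is better, for example ones based on SATD or edge detection.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"From the 9 prediction modes of H.264 to the 56 directional intra prediction modes of AV1, the growing number of modes serves to predict not-yet-encoded blocks more accurately. But more modes increase the bitrate overhead of signaling the mode, and picking the best one out of so many, so as to reach a higher compression ratio, places greater demands on encoder design and implementation.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/5a/5af6c5b32f467fdeae4a667fd96b5aa8.jpeg","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"Figure 4","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"03","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Inter Prediction","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The five pictures below are the first five frames of a video. Only Mario and the brick are moving; the rest of the scene is largely the same from frame to frame. This similarity is called ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"temporal redundancy","attrs":{}},{"type":"text","text":". When encoding, we first encode and transmit the first frame using the intra prediction described above, and then transmit only the motion of Mario and the brick for the following frames. The decoder can combine this motion information with the first frame to synthesize the later frames, which greatly reduces the number of bits to transmit. This technique of exploiting temporal redundancy for compression is motion compensation, and it was already adopted in the H.261 standard.","attrs":{}}]},
{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/aa/aa4bfcec4bb26397ab05d42a1888d54b.jpeg","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"Figure 5","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Attentive readers may wonder how objects such as Mario and the brick can be described so that they can be rendered completely from motion information alone. In fact, video coding does not need to know the shape of a moving object. Instead, the whole frame is divided into pixel blocks, and each block carries its own motion information. This is ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"block-based motion compensation","attrs":{}},{"type":"text","text":". In the figure below, the white arrows circled in red are the motion information used when encoding the brick and Mario; they all point to the corresponding positions in the previous frame. Mario and the brick each have two arrows, which means each is split across two blocks, and each block has its own motion information. This motion information is the ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"motion vector","attrs":{}},{"type":"text","text":". A motion vector has a horizontal and a vertical component and represents the displacement of a block relative to its position in the reference frame. A reference frame is one of the frames (there may be more than one) that have already been encoded.","attrs":{}}]},
{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/a9/a964f745b8f298f2d77300c3cbf7500a.jpeg","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"Figure 6","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Of course, transmitting motion vectors itself costs many bits. Several techniques improve the efficiency of motion vector coding. First, blocks can be made as large as possible so that more pixels share one motion vector, because flat regions or large objects tend to move consistently; variable block-size motion compensation has been widely used since H.264. Second, adjacent blocks often move in similar ways, so their motion vectors are similar too, and a motion vector can itself be predicted from those of neighboring blocks; this is motion vector prediction. Finally, there is a precision trade-off in how a motion vector describes motion. Pixels are a discrete representation, while real objects obviously do not move in whole-pixel steps, so a suitable precision must be chosen for motion vectors in order to describe motion accurately. Every video coding standard defines the motion vector precision: the higher the precision, the more accurately motion can be described, at the cost of more bits to transmit the motion vector. H.261 uses integer-pixel motion vectors, H.264 uses quarter-pixel precision, and AV1 adds one-eighth-pixel precision. In general, the closer two frames are in time, the more similar they are, but there are exceptions: in scenes with back-and-forth motion, for instance, a frame several frames away or even further may be more similar. To make full use of already-encoded frames and improve the accuracy of motion compensation, ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"multiple reference frames","attrs":{}},{"type":"text","text":" were introduced starting with H.264: a block can be motion-matched against several already-encoded reference frames, and both the index of the matched frame and the motion vector are transmitted.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"So how is the motion information of a block obtained? The most straightforward idea is to compare the block against every candidate position in its reference frame; the displacement to the best-matching position is the final motion vector. Common matching metrics include SAD (Sum of Absolute Differences) and SSD (Sum of Squared Differences). Checking every position is what is usually called full-search motion estimation, and its computational complexity is predictably very high. To speed up motion estimation, we can reduce the number of positions searched. Many algorithms do this; common ones include diamond search, hexagon search, and unsymmetrical-cross multi-hexagon-grid search (UMHexagonS). Take diamond search as an example, as shown in the figure below. Start with the 9 candidate positions centered on the initial blue point and compute the SAD at each. If the minimum SAD is at the center, the next step computes the SAD at the 4 green points closer to the center and picks the position with the smallest SAD, continuing the search over a narrower range. If the minimum SAD in the first step is not at the center, that position becomes the new center, 5 or 3 brown points are added, and their SADs are computed. The process iterates until the best matching position is found.","attrs":{}}]},
{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/c3/c3cefd2513a5813e89bd2a36e7c0e6b7.jpeg","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"Figure 7","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"When implementing an encoder, the search algorithm can be chosen according to the actual application scenario. In real-time communication, for example, the compute budget is relatively limited, so the motion estimation module should use a lower-cost algorithm to balance complexity against coding efficiency. The complexity of motion estimation and motion compensation also depends on the block size, the number of reference frames, sub-pixel computation, and other factors, which we will not go into here.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"04","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Summary","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The prediction techniques described in this article make full use of the spatial and temporal correlation of video signals and remove redundancy through a variety of carefully designed prediction modes. This is one of the keys to video compression ratios as high as a thousand to one. Across the history of video coding, prediction modes have grown more numerous and prediction more accurate, yielding ever higher compression ratios. Using these prediction modes quickly and efficiently is therefore bound to be a central concern of design and implementation, and a key to realizing the compression performance of newer standards such as H.265, H.266, and AV1. ","attrs":{}},{"type":"text","marks":[{"type":"italic","attrs":{}},{"type":"strong","attrs":{}}],"text":"Follow Pano (拍樂雲); in upcoming articles we will share more technical insights from the Video Codec series.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Image sources:","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Figure 1:","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https://link.zhihu.com/?target=https%3A//github.com/leandromoreira/digital_video_introduction/blob/master/i/image_3d_matrix_rgb.png","title":null,"type":null},"content":[{"type":"text","text":"https://github.com/leandromoreira/digital_video_introduction/blob/master/i/image_3d_matrix_rgb.png","attrs":{}}]}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Figure 4:","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https://link.zhihu.com/?target=https%3A//www.vcodex.com/h264avc-intra-precition/","title":null,"type":null},"content":[{"type":"text","text":"H.264/AVC Intra Prediction","attrs":{}}]}]}]}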
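The full-search motion estimation described in the inter prediction section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of any particular encoder: the function names `sad` and `full_search`, the toy frame generator `make_frame`, the 16x16 frame size, the 4x4 block size, and the search range of 3 pixels are all made up for the example. Frames are plain lists of rows holding one luma value per pixel, and only integer-pixel motion is considered.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at (bx, by)
    and the n x n block of `ref` displaced by (dx, dy) from that position."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n, search_range):
    """Exhaustive block matching: test every integer displacement within
    +/- search_range and return (best_motion_vector, best_sad)."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = (0, 0), sad(cur, ref, bx, by, 0, 0, n)
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidates whose block would fall outside the reference frame.
            if not (0 <= bx + dx and bx + dx + n <= width
                    and 0 <= by + dy and by + dy + n <= height):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost


def make_frame(width, height, obj_x, obj_y, size=4):
    """Hypothetical toy frame: black background with one textured square
    whose top-left corner is at (obj_x, obj_y)."""
    frame = [[0] * width for _ in range(height)]
    for y in range(size):
        for x in range(size):
            frame[obj_y + y][obj_x + x] = 100 + 10 * y + x
    return frame


# The object sits at (4, 6) in the reference frame and at (6, 5) in the
# current frame, i.e. it moved 2 pixels right and 1 pixel up.
ref_frame = make_frame(16, 16, 4, 6)
cur_frame = make_frame(16, 16, 6, 5)

mv, cost = full_search(cur_frame, ref_frame, bx=6, by=5, n=4, search_range=3)
print(mv, cost)  # (-2, 1) 0
```

The returned motion vector (-2, 1) points from the block in the current frame back to where the same content sits in the reference frame, and the SAD of 0 means the match is exact, so the residual for this block would be all zeros. A fast method such as diamond search would evaluate only a subset of the 49 candidates that this exhaustive loop visits.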