# A Technical Share on the Objective Video Clarity Assessment Algorithm for 全民K歌 (WeSing)

When we talk about video clarity, what are we really talking about?

## 1. Background

Clarity is often equated with video resolution and bitrate, and in the PGC era that was largely true: movies, TV series, and news were recorded, edited, and compressed on professional equipment, so a well-produced source video represented the highest attainable clarity, and operations such as downsampling the resolution or raising the QP to cut bitrate discard useful information and degrade clarity. In that setting we can measure the subjective quality of the video a user receives with criteria such as peak signal-to-noise ratio (PSNR) or the perceptually motivated SSIM: the closer the received video is to the source, the higher its clarity. In the UGC era, however, users' diverse recording devices and uneven skill levels no longer provide a reliable high-quality source as a reference. A video may look bad because of compression and transmission, or because of capture problems such as low-light noise and camera shake, so its clarity can only be judged against the everyday viewing experience of the Human Visual System (HVS). On top of that, increasingly diverse media content such as game video and screen sharing makes a general, effective clarity assessment algorithm even harder to build. This article shares a no-reference clarity assessment algorithm developed jointly by the Tencent Multimedia Lab and the WeSing team for the specific live-anchor scenario, covering how we obtained effective labeled data for this vertical, how we trained the model, and how we analyze the data reported after deployment:

- How building a subjective-score dataset differs from common CV annotation
- The problems the clarity algorithm focuses on, with result analysis
- A discussion and analysis of low-quality videos

A demonstration of the objective no-reference quality scores: below are the algorithm's predictions for recently collected videos, together with their bitrate and resolution (since at most three videos can be attached, the clips are presented as transcoded GIFs, which may introduce some quality change):

![Clarity score: 84.91 | bitrate: 2014 kb/s | resolution: 720x1280](https://static001.infoq.cn/resource/image/1d/e5/1df171fd6cba745ab8d288c3773ecfe5.gif)

![Clarity score: 40.74 | bitrate: 1906 kb/s | resolution: 720x1280](https://static001.infoq.cn/resource/image/eb/f4/eb7e71c0cyye1ebf6d6e5627faf2eff4.gif)

![Clarity score: 90.89 | bitrate: 9096 kb/s | resolution: 720x1280](https://static001.infoq.cn/resource/image/03/6d/03bbe9d579b3dcfc77694b0af2f8606d.gif)

![Clarity score: 55.20 | bitrate: 2019 kb/s | resolution: 720x1280](https://static001.infoq.cn/resource/image/3c/8b/3c19e3a835550a007c7384059b6b428b.gif)
## 2. Dataset Construction

In recent years, quality assessment methods based on rank learning have shown clear metric gains on public IQA datasets such as TID2013, LIVE Challenge, and KonIQ-10K. Rank learning's weakly supervised idea of synthesizing ranked data pairs partly relieves the dependence on subjectively labeled data, and so achieves good performance on subjective datasets that are still small compared with those of other CV tasks. But it still cannot effectively close the distribution gap between training samples and real application data, so a sufficiently large dataset for the target vertical remains an indispensable step in deploying a quality assessment algorithm.

Typical CV annotation targets objective information such as object classes, positions, and masks, so a small number of trained annotators is enough. QA datasets instead capture the average subjective opinion of a broad population on the same media content, which is subject-side information: one must collect ratings from enough participants and filter out personal bias to approximate the true population judgment, the mean opinion score (MOS). Subjective datasets are generally built following ITU recommendations. The widely used TID2013 dataset contains 3,000 distorted images rated by 971 distinct participants in a laboratory environment, for a total of 524,340 pair-wise comparisons, or roughly 170 ratings per image. These numbers show how time- and labor-intensive subjective dataset construction is, which is the main reason existing subjective datasets are limited in size.

With sufficient funding, the subjective scoring can be crowdsourced, but handing tasks directly to crowd testers outside the company introduces many unstable factors and too much noise: rest intervals between sessions cannot be controlled, users may participate in the same task repeatedly, and so on. For this reason, we began building a subjective testing platform for video and audio in 2018. Using mechanisms such as anti-fraud walls and whitelists, and after long-running screening across different tasks that filtered out reward hunters gaming the system, we gradually retained a pool of relatively reliable users who take part in our rating tasks on a regular basis, which lets us obtain comparatively trustworthy subjective scores through crowdsourcing.

Dataset construction details:

- Video content: 2,595 clips of 5 s each; 2,135 from WeSing and 460 from Weishi (微視)
- Rating scheme: three classes (good / fair / poor)
- Participants: 134 distinct users
- Ratings: 30 groups × 100 clips × 60 raters = (2,595 + 405) × 60 = 180,000, where the 405 redundant clips are used for consistency checking
- Valid ratio: 85.3%; 264 rating groups were discarded based on score bias, consistency, and outlier detection

Rating scheme: compared with the commonly used five-level scale, we chose the simpler three-level scheme to reduce the complexity of crowdsourced rating. It limits confusion to some extent, and also simplifies the embedded consistency checks and the subsequent data cleaning.

A sample of clips from the Ksong Dataset; the last row shows sources from Weishi:

![Sample clips from the Ksong Dataset](https://static001.infoq.cn/resource/image/87/5a/87425b107bf043664521a580b442565a.png)

Data distribution: to keep playback smooth, the WeSing live scenario sacrifices some bitrate relative to short-video products, which caps the proportion of high-quality clips. We therefore used face detection plus a manual second-pass classification to select about 460 higher-bitrate Weishi short-video clips similar to the K-song scenario, most of them of high quality, making the distribution of sources more balanced.

Raw user rating data; options 1, 2, and 3 correspond to low, medium, and high quality:

![Raw rating data](https://static001.infoq.cn/resource/image/80/24/809bc95a8b2225629163cc47ae10ac24.png)

Data cleaning: although the participating volunteers are pre-screened, all ratings still go through cleaning. Each session a user completes contains 100 clips, with an estimated duration of about 15 minutes, and in each group roughly 13 randomly chosen clips appear twice. If a user's scores on these embedded anchor clips diverge too much, for example the same clip rated 3 and then 1, all of that user's data in the session is discarded as invalid. Sessions in which a single score dominates excessively, e.g. 80% of answers are 2, are likewise treated as invalid. Finally, each clip receives about 60 ratings, and a single user's score that deviates too far from the average is dropped as an outlier.

![MOS distribution of the dataset](https://static001.infoq.cn/resource/image/c5/d8/c514cf7053b9bb6824fd9e56628a58d8.png)

The MOS distribution of the final dataset is shown above: most clips fall into the low-quality interval [1, 2], which again confirms the necessity of adding higher-bitrate video sources.
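The cleaning rules above can be sketched in a few lines of Python. This is a minimal illustration under assumptions: the anchor-gap limit, the 80% same-score dominance cut, and the 2σ outlier rejection are thresholds chosen for the sketch (only the 3-vs-1 and 80% examples are stated in the article):

```python
from statistics import mean, pstdev

def session_is_valid(scores, anchor_pairs, max_anchor_gap=1, max_same_frac=0.8):
    """Validate one 100-clip rating session; scores maps clip id -> {1, 2, 3}.

    anchor_pairs lists the (first, repeat) ids of the ~13 clips shown twice.
    """
    # Rule 1: anchor clips rated twice must roughly agree; a 3-vs-1 gap
    # invalidates the whole session.
    for a, b in anchor_pairs:
        if abs(scores[a] - scores[b]) > max_anchor_gap:
            return False
    # Rule 2: near-uniform answers (e.g. 80% of clips rated "2") are dropped.
    values = list(scores.values())
    if max(values.count(v) for v in set(values)) > max_same_frac * len(values):
        return False
    return True

def mos(ratings, z_thresh=2.0):
    """Aggregate the ~60 ratings of one clip, dropping per-user outliers that
    deviate from the mean by more than z_thresh standard deviations."""
    m, s = mean(ratings), pstdev(ratings)
    kept = [r for r in ratings if s == 0 or abs(r - m) <= z_thresh * s]
    return mean(kept)
```

In this sketch, session-level filtering runs first; only ratings from valid sessions would then be fed into the per-clip `mos` aggregation.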
## 3. Algorithm and Analysis

Our video clarity assessment algorithm takes discrete video frames as input and does not use additional temporal information. The overall improvements to preprocessing, the model, and the training procedure can be summarized in three points:

- Larger input: a 672×448 input size, closer to 720p
- Hyper-column structure: raising the influence of low-level features on quality prediction
- Rank learning: using rank order to reinforce the learning effect

Comparison of different scaling functions: the frame below was downsampled by 4× and then upsampled with nearest-neighbor interpolation:

![Scaling function comparison](https://static001.infoq.cn/resource/image/aa/08/aa76ff0015228ba302959e3eeae27b08.png)

Larger input: traditional quality metrics such as SSIM and BRISQUE achieve good performance relying only on low-level features, and except in multi-scale variants they usually operate at the original resolution. The input sizes CNNs commonly scale to, such as 224, greatly reduce the information in the image and hurt downstream performance. Our comparison experiments show that feeding larger images through a fully convolutional network (FCN) clearly improves prediction. Since videos in our application mostly have aspect ratios of 16:9 or 4:3, and to avoid artifacts from non-uniform scaling and stretching, we adopt an input size of (224×3)×(224×2) = 672×448 to make fuller use of the limited input area. As shown in the figure below, an input frame is scaled and padded to 672×448: keeping the aspect ratio, the height or width is scaled to 672 or 448 and the remainder is zero-padded with black; if the input is in landscape orientation, the frame is first rotated by 90 degrees.
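The scale-rotate-pad step can be sketched as pure geometry (no pixel work). This is a minimal version; where the black padding is placed (right/bottom vs. centered) is not specified in the article and is left to the caller:

```python
def letterbox_geometry(w, h, short_side=448, long_side=672):
    """Compute the scaled size and zero-padding needed to fit a frame into
    the 672x448 model input while preserving aspect ratio.

    Landscape frames are rotated 90 degrees first (as in the article), so the
    frame's long edge always maps onto the 672 input edge.
    """
    rotated = w > h
    if rotated:
        w, h = h, w
    scale = min(short_side / w, long_side / h)
    new_w, new_h = round(w * scale), round(h * scale)
    pad_w, pad_h = short_side - new_w, long_side - new_h  # filled with black
    return (new_w, new_h), (pad_w, pad_h), rotated
```

For the common 720×1280 portrait stream this scales the frame to 378×672 and leaves 70 columns of black padding.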
![Core network architecture](https://static001.infoq.cn/resource/image/5d/ce/5d12cc6777abaf7505540afcea4764ce.png)

Hyper-column structure: as the feature-extraction module above shows, we borrow the hyper-column idea from semantic segmentation. We apply global average pooling to the last layer of each block, extracting feature vectors at different levels, concatenate the features of all blocks, and predict the final score through an FC layer. Compared with using only the information of the last layer, hyper-columns provide semantic segmentation with more local positional detail; for quality assessment they contribute more low-level image-distortion information, such as gradient changes.

Rank learning: in addition to the L1 loss commonly used for regression training, we combine a hinged ranking loss that uses the score differences between videos to reinforce the learning of their rank order:

- L1 loss: L_reg = Σ |mos − pred|
- Hinged rank loss: L_rank = Σ max(0, thres − (mos_a − mos_b) · (pred_a − pred_b))
- Overall loss: L = L_rank + λ · Σ (|mos_a − pred_a| + |mos_b − pred_b|)
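Written out in plain Python over a batch of (MOS, prediction) pairs, the combined objective reads as below; the margin `thres` and weight `lam` are free hyperparameters whose production values the article does not give:

```python
def hinged_rank_loss(mos, pred, thres=0.5):
    """L_rank = sum over pairs (a, b) of max(0, thres - (mos_a - mos_b) * (pred_a - pred_b)).

    The hinge is zero when the predictions are ordered like the MOS values
    with sufficient margin, and grows when the order is violated.
    """
    n = len(mos)
    return sum(
        max(0.0, thres - (mos[a] - mos[b]) * (pred[a] - pred[b]))
        for a in range(n) for b in range(a + 1, n)
    )

def overall_loss(mos, pred, thres=0.5, lam=1.0):
    """L = L_rank + lambda * sum over pairs of (|mos_a - pred_a| + |mos_b - pred_b|)."""
    n = len(mos)
    l1_pairs = sum(
        abs(mos[a] - pred[a]) + abs(mos[b] - pred[b])
        for a in range(n) for b in range(a + 1, n)
    )
    return hinged_rank_loss(mos, pred, thres) + lam * l1_pairs
```

Note that the pairwise L1 term weights each sample by the number of pairs it appears in, so on a fixed batch size it is proportional to a plain per-sample L1 loss.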
Training proceeds in two steps: we first pretrain on the public KonIQ-10K dataset, then fine-tune on KsongDataset. Both runs use the same hyperparameters:

```text
{
  "arch": "resnet18 or other backbones",
  "epochs": 100,
  "batch_size": 256,
  "opt_id": "Adam",
  "lr": 1e-4,
  "loss_type": "reg+rank",
  "workers": 24,
  "shuffle": 1,
  "fixsize": [672, 448]
}
```

![KonIQ-10K: comparison with SOTA methods](https://static001.infoq.cn/resource/image/ba/7c/ba45f53953b0ba8c1e770b81d6aced7c.png)

On the KonIQ-10K dataset, BIQI, BRISQUE, DIIVINE, and HOSA are methods based on traditional features; the best of them, HOSA, reaches PLCC and SRCC around 0.8 (*PLCC and SRCC range from −1 to 1; the closer to 1, the stronger the positive correlation, −1 indicates negative correlation, and 0 the weakest correlation. PLCC measures linear agreement, SRCC monotonicity*). Compared with recent CNN-based methods such as DIQA and learning-from-rank IQA, our algorithm's predictions are on par with the state of the art, with both SRCC and PLCC above 0.9.

![Performance on KsongDataset using different backbone CNN models](https://static001.infoq.cn/resource/image/96/ba/96d71fda9c76c863de248e07dcb91aba.png)
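For reference, both evaluation metrics are easy to compute from scratch. This toy version assigns ranks without averaging ties, which is adequate for continuous predicted scores:

```python
def plcc(x, y):
    """Pearson linear correlation coefficient between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def srcc(x, y):
    """Spearman rank correlation: the PLCC of the rank-transformed scores,
    so it measures monotonic rather than linear agreement."""
    def ranks(v):
        order = sorted(range(len(v)), key=v.__getitem__)
        r = [0.0] * len(v)
        for rank, idx in enumerate(order):
            r[idx] = float(rank)
        return r
    return plcc(ranks(x), ranks(y))
```

A model can be perfectly monotonic (SRCC = 1) while its scores are non-linearly compressed, which is why quality-assessment papers usually report both numbers.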
Scatter plot of Hyper-ResNet18 on KsongDataset; the closer the points lie to the central solid line, the better the algorithm correlates with human subjective judgment:

![Scatter plot on KsongDataset](https://static001.infoq.cn/resource/image/46/cf/468b0c3f48677186b6ac664fd13caacf.png)

Comparing different backbone models, the ResNet50/ResNeXt50-based variants achieve relatively higher PLCC/SRCC on KsongDataset, but ResNet18 comes very close while being smaller and faster at inference, so the ResNet18-based model is the one we mainly deploy.

## 4. Discussion

Our algorithm has been integrated into the QUAlity Standalone Interface (QUASI) SDK, which scans WeSing's online video clips every day. After about three months of online monitoring since January 2020, we have further validated the algorithm's reliability for clarity assessment, and have also collected a batch of low-quality video sequences. As the key step in the no-reference clarity feedback loop, we analyzed the causes of these low-quality videos and iterated on a cause-analysis algorithm, further improving the overall quality of WeSing live video.
Low-quality clips sampled on 2020-04-13, typically showing obvious distortions such as strong noise and overexposure:

![Low-quality sample 1](https://static001.infoq.cn/resource/image/eb/8f/eb346yy526a147628fd4abdd9047c88f.gif)

![Low-quality sample 2](https://static001.infoq.cn/resource/image/a7/58/a7115ba4cba1532d75a9fe4822b89458.gif)

Low-quality video analysis: we first examined whether the quality gap between head and tail anchors correlates clearly with hardware such as device and network. As shown below, we collected network and device information for about 450 anchors. The network types are mainly 3G, 4G, and Wi-Fi, with no obvious difference in distribution between head and tail anchors. For device models we used three buckets: models at or above the iPhone X (iPhone 10), models at or below the iPhone 8, and Android devices. Since Android models are fragmented and our collected model data incomplete, we focused on comparing iPhone users; as the figure shows, the proportion of anchors using the older "iPhone 8 or below" bucket is smaller among head anchors than among tail anchors.

![Anchors' network and device information](https://static001.infoq.cn/resource/image/3e/39/3e6a34ef12f03f2cfe8864848d934839.png)

Since most WeSing anchors capture their live stream with the phone's front camera, consider the front-camera details of different iPhone generations:

- iPhone 7 front camera: 7 MP, f/2.2 aperture, 1080p HD video recording
- iPhone 8 front camera: 7 MP, f/2.2 aperture, 1080p HD video recording
- iPhone 11 Pro Max front camera: 12 MP, f/2.2 aperture, 1080p HD at 30 or 60 fps

![DXOMARK selfie reviews](https://static001.infoq.cn/resource/image/62/db/627b14b64269057238f7b50eea3e19db.png)

On paper, the iPhone 11's main front-camera improvement over the iPhone 7 is resolution, from 7 MP to 12 MP, with little difference in the other listed specs. But the (incomplete) scores from professional review sites (iPhone X, Xs, 11 Pro) show that each yearly iPhone generation brings a considerable improvement in front-camera recording. A preliminary conclusion: a larger share of head anchors use relatively new hardware, which helps the quality of their recorded video to some extent.
We cannot ensure that every user has the best device, but we can still take measures to help improve their video clarity. The front cameras of phones and tablets have much smaller CMOS sensor areas and lower light intake than rear cameras, so in common indoor and nighttime streaming environments they are sensitive to the lighting conditions: weak or strong light sources easily cause under- or overexposure, e.g. a dim background degrading the whole picture, or direct background lighting causing local overexposure and uneven illumination. In low light, raising the camera ISO also introduces very visible white-noise-like distortion. Based on the collected low-quality videos, we developed and tuned overexposure and noise detection algorithms based on low-level features. They can monitor the anchor's lighting environment in real time and offer timely suggestions and adjustment strategies, such as adjusting indoor lighting, adding fill lights, or changing the camera angle.

Bitrate and quality: we can also examine how well the clarity algorithm predicts the quality trend of high-quality videos across transcoding bitrates. We transcoded the high-quality demo clip shown above to bitrates between 500 and 8,000 kb/s. Above 2,000 kb/s the quality decreases only slowly, but once the bitrate falls below 2,000 kb/s a clearly visible drop begins. Interested readers can compare the clips at different bitrates in the attachment to confirm the effect (attachment link: [https://share.weiyun.com/gipQeGlS](https://share.weiyun.com/gipQeGlS)).

No-reference clarity scores of the demo video over the 500–8,000 kb/s bitrate range:

![Clarity score vs. bitrate](https://static001.infoq.cn/resource/image/75/9b/75831918yy87bf988f5b6c972c5b7d9b.png)
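As an illustration of the kind of low-level check involved in the lighting monitoring described above, a crude exposure screen can simply look at the fraction of near-saturated and near-black pixels in a grayscale frame. The levels and fractions here are invented for the sketch, not the production values:

```python
def exposure_flags(gray_frame, sat_level=250, dark_level=10,
                   over_frac=0.25, under_frac=0.4):
    """Flag likely over-/under-exposure in an 8-bit grayscale frame
    (given as a list of pixel rows). A real detector would also use local
    statistics, since backlighting overexposes regions, not the whole frame.
    """
    pixels = [p for row in gray_frame for p in row]
    n = len(pixels)
    saturated = sum(1 for p in pixels if p >= sat_level) / n
    dark = sum(1 for p in pixels if p <= dark_level) / n
    return {"overexposed": saturated > over_frac, "underexposed": dark > under_frac}
```

Such a flag, computed on sampled frames during a stream, is the sort of signal that can trigger the lighting suggestions mentioned above.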
## 5. Summary

This article described the development process of our clarity algorithm for the specific live-anchor scenario, including dataset construction, algorithm details, and some feedback analysis after the algorithm went live. In future work we will extend the clarity algorithm to more application scenarios, such as games and video conferencing, and we welcome teams with similar needs to develop new algorithms with us.

PS: a personal answer to the question in the opening line. Video clarity is how faithfully a video, judged by the human eye, reproduces the captured scene; there is no necessary relationship in which higher resolution and bitrate mean higher clarity. Whether for natural images such as architecture, landscapes, and portraits, or non-natural ones such as screen recordings, CG animation, and games, we judge clarity using everyday observation as prior knowledge. Capture, encoding, transmission, and the other processing stages are like panes of glass between the source scene and the eye: quality impairment at each stage lowers the clarity the end user perceives. UGC users' varied recording scenes, the non-natural screen content spreading with video conferencing (Screen Content Coding), and the diverse game scenarios of 5G plus cloud gaming not only place higher demands on video encoding and transmission, but also call for more widely applicable no-reference quality assessment algorithms to help optimize the video experience.

---

**Header image**: Unsplash
**Author**: 張亞彬
**Original**: [全民K歌客觀清晰度評估算法技術分享](https://mp.weixin.qq.com/s/VhS-jnJTVo5-4cqoMcMXiw)
**Source**: Tencent Multimedia Lab, WeChat public account [ID: TencentAVLab]
**Reposting**: Copyright belongs to the author. For commercial reuse, contact the author for authorization; for non-commercial reuse, please credit the source.