[Short Track Speed Skating, Part 6] An old video denoising algorithm (FLT_GradualNoise), analyzed and optimized: it can process 1920*1080 YUV data at 400 fps.

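　　I could not find a paper that describes this algorithm. A Baidu search turns up some related material, but all of it is just code. That code comes from a project called DScaler, whose complete source can still be found on GitHub.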

　　See: https://github.com/JohnAdders/DScaler/tree/f7d92b76678e24422c48d4a956c0486ee042786d

　　The repository contains the file FLT_GradualNoise.c. The explanation of the algorithm given in that file's comments is reproduced below:

This algorithm is very similar to what Andrew Dowsey came up with in his "Adaptive Temporal Averaging" for his DirectShow filter. The algorithms differ in 1) their block size, 2) their motion estimation (sum of absolute differences versus mean squared error), 3) the addition of a "high tail," in which areas which have changed a lot (but not too much) still cause a small amount of averaging with the previous frame, and 4) rounding.

The algorithm :

This filter gets the sum of absolute differences between a four pixel horizontal block in the current image and the same block in the preceding frame. This isn't the best local motion measure, but it's very fast due to the psadbw SSE instruction.

This difference measure is used to determine the kind of averaging which will be conducted. If it's more than the "noise reduction" parameter, motion is inferred. In that case, we just use the new pixel values. If it's less than the noise reduction, we use the ratio of (difference / noise reduction) to determine the weighting of the old and new values.

Somewhat more formally:

N = Sum_block(|oldByte - newByte|)

R = Noise Reduction parameter

M (motion evidence) = 1       if N / R >= 1.2
                      0.999   if 1.2 > N / R >= 1
                      N / R   otherwise

Result pixel = (bytewise) oldPixel * (1 - M) + newPixel * M

Rounding has a very significant effect on the algorithm. In general, for computational reasons, values are rounded down. An important exception occurs when M > 0 and oldPixel != newPixel but oldPixel * (1 - M) + newPixel * M rounds to oldPixel. In that case, the Result pixel is rounded to one toward the newPixel value. This makes sure that very gradual variation is maintained.
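To make the quoted description concrete, here is a minimal scalar sketch of the formula for a single block. It is not the DScaler code: the names `block_sad`, `motion_evidence` and `denoise_block_ref` are hypothetical, the block is assumed to be 8 bytes (four packed YUV 4:2:2 pixels), and "rounded down" is interpreted as truncating the size of the step toward the old pixel.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

enum { BLOCK_BYTES = 8 }; /* assumption: 4 packed YUV 4:2:2 pixels = 8 bytes */

/* N = Sum_block(|oldByte - newByte|) over one block. */
static unsigned block_sad(const uint8_t *prev, const uint8_t *cur) {
    unsigned n = 0;
    for (int i = 0; i < BLOCK_BYTES; ++i)
        n += (unsigned)abs((int)cur[i] - (int)prev[i]);
    return n;
}

/* M (motion evidence) exactly as listed in the quoted comment. */
static double motion_evidence(unsigned n, unsigned r) {
    assert(r > 0); /* the noise reduction parameter must be positive */
    double ratio = (double)n / (double)r;
    if (ratio >= 1.2) return 1.0;
    if (ratio >= 1.0) return 0.999;
    return ratio;
}

/* Result = old + (new - old) * M for every byte of the block.
 * The step size is truncated, but a byte that differs from the
 * previous frame always moves at least 1 toward the new value. */
static void denoise_block_ref(const uint8_t *prev, const uint8_t *cur,
                              uint8_t *out, unsigned r) {
    double m = motion_evidence(block_sad(prev, cur), r);
    for (int i = 0; i < BLOCK_BYTES; ++i) {
        int diff = (int)cur[i] - (int)prev[i];
        int mag = (int)((double)abs(diff) * m); /* truncate the step size */
        if (mag == 0 && diff != 0)
            mag = 1;                            /* keep gradual change alive */
        out[i] = (uint8_t)(prev[i] + (diff > 0 ? mag : -mag));
    }
}
```

With a previous block of all 100s, a current block starting 110, 107, 107 and R = 64, the sum N is 24, so M = 0.375 and the first byte moves from 100 to 103 rather than jumping to 110.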

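For this algorithm the author provides assembly code with very detailed comments. It is not ordinary assembly, though: it uses SIMD instructions, which makes it very hard to read. I spent roughly 10 days working out the idea, then rewrote and optimized it with intrinsics, which are easier to understand. Below are some of the questions I had while writing it, together with my answers.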

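//    Question 1: How does the program handle YUV data?
//    Answer:     Judging from the original assembly, the Y, U and V components are processed together with no special distinction. The "four pixels" mentioned above are the 8 bytes Y0 U0 Y1 V0 Y2 U1 Y3 V1. In both the MMX and the SSE versions,
//                psadbw accumulates the absolute differences of 8 bytes at a time (the SSE form simply handles two groups of 8 bytes per instruction). Porting the algorithm to RGB data would be more awkward, because the R, G and B channels would have to be split into separate components first.
//    Question 2: The comment says rounding is downward by default, yet whenever Src and Prev differ the result must move at least 1 level toward the new pixel to keep the video continuous. How is that implemented?
//    Answer:     The program tests the data: if Src and Prev differ, the offset is forced to be at least 1 in magnitude (+1 or -1 as appropriate); if they are equal, the offset is of course 0.
//                In addition, when the fixed-point multiplier would exceed 65535 (and is clamped to it), the offset is set to AbsDiff, because the shift-based calculation would otherwise come out one too small: (X * 65535) >> 16 gives X - 1 for nonzero X.
//    Question 3: How is the program optimized?
//    Answer:     (1) The original comment contains the rule "0.999 if 1.2 > N / R >= 1". The assembly supplied by the author handles this case with some comparisons and shifts, changing NoiseMultiplier to 65534 (once N/R >= 1 it has already been set to 65535).
//                In my code I consider this test unnecessary, because 0.999 changes the result too little, so I dropped it. The SSE and MMX code supplied by the author drops it as well.
//                (2) Fixed-point arithmetic. N/R involves a division. To avoid it, the weight is scaled up by 65536 and then multiplied by AbsDiff, after which a division by 65536 is needed. _mm_mulhi_epu16 does this quickly (no separate shift instruction and no widening to 32 bits).
//                In practice this introduces error, because that instruction truncates instead of rounding; I suggest _mm_mulhrs_epi16 as a replacement (note that it treats its operands as signed and scales by 2^15 rather than 2^16, so the multiplier has to be rescaled for it). Also note that if N/R * 65536 exceeds 65535, this corresponds to M = 1 in the original formula, so the value is simply clamped to 65535 (again without widening to 32 bits).
//                For example, with AbsDiff_Sum = 24 and NoiseValue = 64, Multiplier is 1024. If newPixel - oldPixel = 10 for some pixel, the result is (24 * 1024 * 10) >> 16 = 3, whereas the floating-point value is 3.75, so 4 would be the better answer.
//                (3) oldPixel * (1 - M) + newPixel * M can be rearranged to oldPixel + (newPixel - oldPixel) * M. Combined with the sign of newPixel - oldPixel, the final result can then be computed with _mm_adds_epu8 and _mm_subs_epu8.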
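The fixed-point steps in the answers above can be checked with a small scalar model. This is only a sketch under stated assumptions: `mulhi_u16` models one 16-bit lane of `_mm_mulhi_epu16`, while `weight_q16`, `blend_byte` and the rounding variant `mulhi_round_u16` are hypothetical helpers written for this illustration and are not taken from the DScaler source or from my optimized code.

```c
#include <assert.h>
#include <stdint.h>

/* One lane of _mm_mulhi_epu16: high 16 bits of an unsigned 16x16 product. */
static uint16_t mulhi_u16(uint16_t a, uint16_t b) {
    return (uint16_t)(((uint32_t)a * b) >> 16);
}

/* Hypothetical rounding variant: add half an LSB before the shift. */
static uint16_t mulhi_round_u16(uint16_t a, uint16_t b) {
    return (uint16_t)(((uint32_t)a * b + 32768u) >> 16);
}

/* Weight M = N / R scaled by 65536, clamped to 65535 (stands in for M = 1). */
static uint16_t weight_q16(unsigned sad, unsigned noise) {
    assert(noise > 0);
    uint32_t w = (uint32_t)sad * (65536u / noise);
    return (uint16_t)(w > 65535u ? 65535u : w);
}

/* old + (new - old) * M for one byte, using the truncating multiply. */
static uint8_t blend_byte(uint8_t old_px, uint8_t new_px, uint16_t w) {
    uint16_t abs_diff = (uint16_t)(new_px > old_px ? new_px - old_px
                                                   : old_px - new_px);
    uint16_t step = mulhi_u16(abs_diff, w);
    if (w == 65535u)
        step = abs_diff;   /* (X * 65535) >> 16 would give X - 1 */
    else if (step == 0 && abs_diff != 0)
        step = 1;          /* always move at least 1 toward the new pixel */
    return (uint8_t)(new_px > old_px ? old_px + step : old_px - step);
}
```

For the numbers in the example, `weight_q16(24, 64)` is 24576 and `mulhi_u16(10, 24576)` is 3, while the rounding variant returns 4, which matches the 3.75 floating-point value more closely.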

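　　Overall, the algorithm reduces video noise by continually averaging away error against data from previous frames. Because it can make full use of _mm_sad_epu8, an instruction that quickly sums the absolute differences of 8 bytes at a time, it reaches extremely high throughput.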
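As a rough illustration of how `_mm_sad_epu8` and the saturating byte instructions fit together, the sketch below handles 16 bytes (two 8-byte blocks) at once. It is an assumption-laden example, not my production code: `sad_two_blocks` and `apply_step_16` are hypothetical names, the per-byte step sizes are assumed to have been computed already and to be no larger than the absolute difference, and a scalar fallback is included for builds without SSE2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define USE_SSE2 1
#else
#define USE_SSE2 0
#endif

/* Sum of absolute differences for each 8-byte half of a 16-byte span. */
static void sad_two_blocks(const uint8_t *prev, const uint8_t *cur,
                           unsigned sad[2]) {
    assert(prev != NULL && cur != NULL);
#if USE_SSE2
    __m128i p = _mm_loadu_si128((const __m128i *)prev);
    __m128i c = _mm_loadu_si128((const __m128i *)cur);
    __m128i s = _mm_sad_epu8(p, c); /* one sum in the low 16 bits of each 64-bit lane */
    sad[0] = (unsigned)_mm_cvtsi128_si32(s);
    sad[1] = (unsigned)_mm_extract_epi16(s, 4);
#else
    sad[0] = sad[1] = 0;
    for (int i = 0; i < 16; ++i) {
        int d = (int)cur[i] - (int)prev[i];
        sad[i / 8] += (unsigned)(d < 0 ? -d : d);
    }
#endif
}

/* out = prev moved by step[i] toward cur, for 16 bytes.
 * Assumes step[i] <= |cur[i] - prev[i]|. */
static void apply_step_16(const uint8_t *prev, const uint8_t *cur,
                          const uint8_t *step, uint8_t *out) {
#if USE_SSE2
    __m128i p = _mm_loadu_si128((const __m128i *)prev);
    __m128i c = _mm_loadu_si128((const __m128i *)cur);
    __m128i s = _mm_loadu_si128((const __m128i *)step);
    __m128i up = _mm_subs_epu8(c, p);   /* nonzero only where cur > prev */
    __m128i down = _mm_subs_epu8(p, c); /* nonzero only where cur < prev */
    __m128i r = _mm_adds_epu8(p, _mm_min_epu8(s, up));
    r = _mm_subs_epu8(r, _mm_min_epu8(s, down));
    _mm_storeu_si128((__m128i *)out, r);
#else
    for (int i = 0; i < 16; ++i) {
        uint8_t up = (uint8_t)(cur[i] > prev[i] ? cur[i] - prev[i] : 0);
        uint8_t down = (uint8_t)(prev[i] > cur[i] ? prev[i] - cur[i] : 0);
        uint8_t a = step[i] < up ? step[i] : up;
        uint8_t b = step[i] < down ? step[i] : down;
        out[i] = (uint8_t)(prev[i] + a - b);
    }
#endif
}
```

Taking the minimum of the step with the two saturated differences selects the direction without any explicit sign test: one of `up` and `down` is always zero, so only the add or only the subtract has an effect for a given byte.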

　　In my tests, denoising takes about 0.8 ms per frame on average for 1280*720 video and about 1.8 ms per frame for 1920*1080 video (both in YUV422 format).

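Videos cannot be uploaded here. If you would like to see how the algorithm performs, contact me and I can provide a test demo (the demo is too large to upload). The two screenshots below give a rough idea of the difference.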
