[paper]End-to-End Training of Hybrid CNN-CRF Models for Stereo

Original

zhwli

2020-02-24 08:13

Pre-learning

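Hidden Markov Model
Let $Y=\{y_1,y_2,\dots,y_n\}$ be a set of hidden random variables and $X=\{x_1,x_2,\dots,x_n\}$ the corresponding observed variables. Assuming $Y$ has the Markov property (and each $x_i$ depends only on $y_i$), the joint probability of $X$ and $Y$ is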

$P(x_1,x_2,\dots,x_n,y_1,y_2,\dots,y_n)=P(y_1)P(x_1\mid y_1)\prod_{i=2}^{n}P(y_i\mid y_{i-1})P(x_i\mid y_i)$

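To specify a Hidden Markov Model, three sets of parameters $[A, B, \pi]$ must be determined:
- State transition probabilities $A=[a_{ij}]_{N\times N}$, where $a_{ij}=P(y_{t+1}=s_j\mid y_t=s_i)$, $1\le i,j\le N$,
  is the probability of being in state $s_j$ at time $t+1$ given state $s_i$ at time $t$.
- Output (observation) probabilities $B$, where $b_{ij}=P(x_t=o_j\mid y_t=s_i)$ is the probability of observing $o_j$ in state $s_i$.
- Initial state probabilities $\pi$, where $\pi_i=P(y_1=s_i)$.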
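The factorization above can be checked numerically. A minimal sketch, assuming states and observation symbols are encoded as integer indices into $\pi$, $A$ and $B$; the function name `hmm_joint_prob` and the toy parameter values are hypothetical, chosen only for illustration:

```python
from typing import Sequence


def hmm_joint_prob(
    pi: Sequence[float],
    A: Sequence[Sequence[float]],
    B: Sequence[Sequence[float]],
    states: Sequence[int],
    obs: Sequence[int],
) -> float:
    """Joint probability P(x_1..x_n, y_1..y_n) of an HMM.

    pi[s]   : initial probability P(y_1 = s)
    A[s][u] : transition probability P(y_{i+1} = u | y_i = s)
    B[s][o] : emission probability P(x_i = o | y_i = s)
    """
    if len(states) != len(obs) or not states:
        raise ValueError("states and obs must be non-empty and of equal length")
    # P(y_1) * P(x_1 | y_1)
    p = pi[states[0]] * B[states[0]][obs[0]]
    # prod_{i >= 2} P(y_i | y_{i-1}) * P(x_i | y_i)
    for prev, cur, o in zip(states, states[1:], obs[1:]):
        p *= A[prev][cur] * B[cur][o]
    return p


# Toy 2-state, 2-symbol model (illustrative numbers only).
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
print(hmm_joint_prob(pi, A, B, states=[0, 0, 1], obs=[0, 0, 1]))
```

Summing this joint probability over all state and observation sequences of a fixed length gives 1, which is a quick sanity check on the parameters.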
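Markov Random Field
- Clique and maximal clique: a clique is a subset of nodes in which every pair of nodes is connected by an edge; a maximal clique is a clique that cannot be enlarged by adding any other node.
- In a Markov random field, the joint probability of multiple variables factorizes over cliques into a product of factors, each of which depends on only one clique:
  $P(X)=\frac{1}{Z}\prod_{Q\in C}\psi_Q(X_Q)$
  where $X=\{x_1,x_2,\dots,x_n\}$ are $n$ random variables, $C$ is the set of all cliques, $X_Q$ is the set of variables belonging to clique $Q\in C$, $\psi_Q$ is the non-negative potential function of $Q$, and $Z$ is the normalization constant (partition function) that makes $P$ a valid distribution.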
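A brute-force sketch of the clique factorization, assuming a model small enough that $Z$ can be computed by enumerating every assignment (this is exponential in $n$ and only meant to make the formula concrete). The names `unnormalized`, `mrf_distribution` and the `agree` potential are hypothetical:

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Clique = Tuple[int, ...]
Potential = Callable[..., float]


def unnormalized(x: Tuple[int, ...], cliques: Dict[Clique, Potential]) -> float:
    """Product of clique potentials psi_Q(x_Q) for one full assignment x."""
    value = 1.0
    for q, psi in cliques.items():
        value *= psi(*(x[v] for v in q))
    return value


def mrf_distribution(n: int, labels: List[int], cliques: Dict[Clique, Potential]):
    """Brute-force P(X) = (1/Z) * prod_Q psi_Q(X_Q); exponential in n, tiny models only."""
    scores = {x: unnormalized(x, cliques) for x in product(labels, repeat=n)}
    Z = sum(scores.values())  # partition function
    return {x: s / Z for x, s in scores.items()}, Z


def agree(a: int, b: int) -> float:
    """Hypothetical pairwise potential that favours equal neighbouring labels."""
    return 2.0 if a == b else 1.0


# Chain x0 - x1 - x2: its maximal cliques are {x0, x1} and {x1, x2}.
P, Z = mrf_distribution(3, [0, 1], {(0, 1): agree, (1, 2): agree})
print(Z)  # → 18.0
```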
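Conditional Random Field
A Markov random field models the joint probability $P(X,Y,O)$,
whereas a conditional random field models the conditional probability $P(Y\mid X,O)$.
Hence a Markov random field is a generative model, while a conditional random field is a discriminative model.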
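To illustrate the difference, here is a brute-force linear-chain CRF that normalizes over the labels $Y$ only, with the observations held fixed. It is a sketch under assumptions: the scoring function (reward labels that agree with the sign of $x_i$ and with their neighbours), the weights and the function name `crf_conditional` are all hypothetical, and the extra variable $O$ from the text is omitted:

```python
from itertools import product
from math import exp
from typing import Dict, Sequence, Tuple


def crf_conditional(x: Sequence[float], labels: Sequence[int],
                    w_unary: float, w_pair: float) -> Dict[Tuple[int, ...], float]:
    """Brute-force linear-chain CRF: P(y | x) = exp(score(y, x)) / Z(x).

    Hypothetical score: +w_unary when y_i == 1 agrees with x_i > 0 (or y_i == 0
    with x_i <= 0), and +w_pair for every pair of equal neighbouring labels.
    """
    def score(y: Tuple[int, ...]) -> float:
        s = sum(w_unary for xi, yi in zip(x, y) if (xi > 0) == (yi == 1))
        s += sum(w_pair for a, b in zip(y, y[1:]) if a == b)
        return s

    # Z(x) sums over label sequences only: x is given, P(x) is never modelled.
    weights = {y: exp(score(y)) for y in product(labels, repeat=len(x))}
    Zx = sum(weights.values())
    return {y: w / Zx for y, w in weights.items()}


P = crf_conditional(x=[0.8, -0.3, 1.2], labels=[0, 1], w_unary=1.0, w_pair=0.4)
print(max(P, key=P.get))  # → (1, 0, 1)
```

An MRF over $(X, Y)$ would instead need a normalizer that also sums over all possible observations, which is what makes it a generative model.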

Notation_paper

Contribution

- Proposes a hybrid CNN+CRF model for stereo matching.
- Proposes a principled training method based on the Structured Support Vector Machine (SSVM) to train the hybrid model end-to-end.
- Using only a shallow CNN and no post-processing, the model performs very well on benchmarks.

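The model architecture we use for stereo matching works as follows. First, the Unary CNN computes features for every pixel of the image pair $(I^0, I^1)$ (applied to the whole images). A correlation layer then compares these features (similarity/cost computation), producing a cost volume that serves as the unary cost of the CRF model. The pairwise cost of the CRF is computed either by a contrast-sensitive model or by a Pairwise CNN.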

Unary CNN

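Here a CNN with 3 to 7 layers and 100 filters per layer computes features of the input images; the first layer uses 3x3 filters and all other layers 2x2 filters. We use tanh instead of ReLU as the activation function: first, tanh is easier to train and does not require inserting complicated batch normalization (BN) layers; second, [1] (Patch matching for optical flow with thresholded hinge loss) and [2] (Discriminative learning of local image descriptors) show that tanh is better suited than ReLU to patch-matching tasks.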
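One way to see what this architecture implies is to compute its receptive field and parameter count. This sketch assumes stride 1, no dilation, no pooling, one bias per filter and a grayscale input by default; none of these details is stated above, so treat the numbers as illustrative. The helper `unary_cnn_stats` is hypothetical:

```python
def unary_cnn_stats(num_layers: int, in_channels: int = 1, filters: int = 100):
    """Receptive field and parameter count of the Unary CNN described above.

    Assumptions (not from the text): stride 1, no dilation or pooling,
    one bias per filter, grayscale input unless in_channels is given.
    Layer 1 uses 3x3 kernels; every later layer uses 2x2 kernels.
    """
    if not 3 <= num_layers <= 7:
        raise ValueError("the text describes 3 to 7 layers")
    kernels = [3] + [2] * (num_layers - 1)
    receptive_field, params, channels = 1, 0, in_channels
    for k in kernels:
        receptive_field += k - 1                          # stride 1: each layer adds k - 1 pixels
        params += k * k * channels * filters + filters    # weights + biases
        channels = filters
    return receptive_field, params


for n in (3, 5, 7):
    rf, p = unary_cnn_stats(n)
    print(f"{n} layers: receptive field {rf}x{rf}, {p} parameters")
```

Under these assumptions the receptive field grows by one pixel per 2x2 layer, so 3 layers see a 5x5 patch and 7 layers a 9x9 patch, which is why such a network counts as shallow.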

Correlation

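In this step we compute the cross-correlation of the features $\phi^0, \phi^1$ extracted from the left and right images, respectively, using the following formula: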

$$p_i(k)=\frac{e^{\langle\phi^0_i,\,\phi^1_{i+k}\rangle}}{\sum_{j\in\mathcal{L}} e^{\langle\phi^0_i,\,\phi^1_{i+j}\rangle}}\qquad\forall i\in\Omega,\ \forall k\in\mathcal{L}\qquad(3)$$

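where $I^0, I^1$ are the full left and right images, $i\in\Omega=\operatorname{dom} I^0$ (dom = domain of the function) indexes the pixels of image $I^0$, and $\phi^0_i, \phi^1_{i+k}$ are the features of pixel $i$ in the left image and of pixel $i+k$ in the right image, respectively. $\mathcal{L}=\{0,\dots,L-1\}$ is the set of possible disparity values, and $x_i\in\mathcal{L}$ is the label of pixel $i$ in the stereo-matching problem.
Here $p_i(k)$ is computed by a softmax classifier. It can be read as the model's probability (confidence) that pixel $i$ takes label $k$, or equivalently as the matching score between a window centered at $i$ in $I^0$ and a window centered at $i+k$ in $I^1$.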
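The correlation-plus-softmax step of (3) can be sketched on a single scanline in plain Python. Assumptions: features are lists of floats, both scanlines have the same width, and disparities for which pixel $i+k$ falls outside the right image receive probability 0 (the text does not say how borders are handled). `correlation_softmax` is a hypothetical name:

```python
from math import exp
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product <a, b> of two feature vectors."""
    return sum(u * v for u, v in zip(a, b))


def correlation_softmax(phi0: Sequence[Vector], phi1: Sequence[Vector],
                        num_labels: int) -> List[List[float]]:
    """p[i][k] = exp(<phi0_i, phi1_{i+k}>) / sum_j exp(<phi0_i, phi1_{i+j}>) on one scanline."""
    if len(phi0) != len(phi1):
        raise ValueError("left and right scanlines must have the same width")
    width = len(phi0)
    p = []
    for i in range(width):
        # Assumption: disparities that leave the right image are excluded (probability 0).
        valid = [k for k in range(num_labels) if i + k < width]
        scores = [dot(phi0[i], phi1[i + k]) for k in valid]
        m = max(scores)                      # subtract the max for numerical stability
        exps = [exp(s - m) for s in scores]
        total = sum(exps)
        row = [0.0] * num_labels
        for k, e in zip(valid, exps):
            row[k] = e / total
        p.append(row)
    return p


# Toy scanline whose right-image features are the left ones shifted by one pixel,
# so the correct label is k = 1 wherever pixel i + 1 exists.
phi0 = [[3.0, 0.0], [0.0, 3.0], [3.0, 0.0], [0.0, 3.0], [3.0, 0.0]]
phi1 = [[0.0, 3.0], [3.0, 0.0], [0.0, 3.0], [3.0, 0.0], [0.0, 3.0]]
p = correlation_softmax(phi0, phi1, num_labels=3)
print([max(range(3), key=row.__getitem__) for row in p])  # → [1, 1, 1, 1, 0]
```

The last pixel can only take $k=0$ under the border assumption, which is why its winning label differs.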

CRF

CRF model:

m i n x \in  (f (x) : = \sum i \in  f i (x i) + \sum i, j \in ε f i, j (x i, x j))

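where
$\mathcal{V}$ is the set of nodes of the CRF graph, i.e. the set of all pixels [how does it differ from $\Omega$??],
$\mathcal{E}$ is the set of edges, and
$\mathcal{X}=\mathcal{L}^{\mathcal{V}}$ is the space of labelings.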
Unary cost term

$f_i:\mathcal{L}\to\mathbb{R}$ is the negated matching score computed above:

$f_i(k)=-p_i(k)$.
The pairwise cost term is

$$f_{ij}(x_i,x_j)=\omega_{ij}\,\rho(|x_i-x_j|;P_1,P_2)$$

where

$\omega_{ij}$ can be set by hand (as in the contrast-sensitive weight below) or replaced by a learned Pairwise CNN.

$$\omega_{ij}=\exp\left(-\alpha\,|I_i-I_j|^{\beta}\right)\qquad\forall ij\in\mathcal{E}$$

$$\rho(|x_i-x_j|)=\begin{cases}0, & \text{if } |x_i-x_j|=0\\ P_1, & \text{if } |x_i-x_j|=1\\ P_2, & \text{otherwise}\end{cases}$$

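$P_1$ penalizes small disparity changes on smooth surfaces, and

$P_2$ penalizes large disparity changes at disparity discontinuities. We only use pairwise interactions on a 4-connected grid.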
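The full CRF energy $f(x)$ with the contrast-sensitive weights can be evaluated directly on a small grid. This is a sketch assuming a grayscale image, the hand-crafted $\omega_{ij}$ rather than a Pairwise CNN, and illustrative values for $P_1$, $P_2$, $\alpha$, $\beta$; the function names and the 2x2 example are hypothetical:

```python
from math import exp
from typing import List, Sequence

Grid = List[List[int]]


def rho(d: int, P1: float, P2: float) -> float:
    """Piecewise penalty: 0 for equal labels, P1 for |d| = 1, P2 otherwise."""
    d = abs(d)
    return 0.0 if d == 0 else (P1 if d == 1 else P2)


def contrast_weight(Ii: float, Ij: float, alpha: float, beta: float) -> float:
    """Hand-crafted edge weight w_ij = exp(-alpha * |I_i - I_j| ** beta)."""
    return exp(-alpha * abs(Ii - Ij) ** beta)


def crf_energy(x: Grid, p: Sequence[Sequence[Sequence[float]]],
               image: Sequence[Sequence[float]],
               P1: float, P2: float, alpha: float, beta: float) -> float:
    """f(x) = sum_i f_i(x_i) + sum_{ij in E} w_ij * rho(|x_i - x_j|) on a 4-connected grid.

    Unary cost f_i(k) = -p_i(k); p[r][c][k] is the matching probability of pixel (r, c).
    Each 4-connected edge is counted once (right and down neighbours only).
    """
    H, W = len(x), len(x[0])
    energy = 0.0
    for r in range(H):
        for c in range(W):
            energy += -p[r][c][x[r][c]]                     # unary term
            for dr, dc in ((0, 1), (1, 0)):                  # right and down edges
                rr, cc = r + dr, c + dc
                if rr < H and cc < W:
                    w = contrast_weight(image[r][c], image[rr][cc], alpha, beta)
                    energy += w * rho(x[r][c] - x[rr][cc], P1, P2)
    return energy


p = [[[0.2, 0.7, 0.1] for _ in range(2)] for _ in range(2)]   # illustrative p[r][c][k]
image = [[0.5, 0.5], [0.5, 0.5]]                               # flat image, so w_ij = 1
smooth = [[1, 1], [1, 1]]
bumpy = [[0, 1], [1, 1]]
for x in (smooth, bumpy):
    print(crf_energy(x, p, image, P1=0.1, P2=0.5, alpha=1.0, beta=1.0))
```

With a flat image the smooth labeling costs about $-2.8$ and the bumpy one about $-2.1$; an intensity edge between two pixels lowers $\omega_{ij}$ and therefore makes a disparity jump across it cheaper.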

Inference
Solving the CRF model above exactly is very difficult, but approximate solutions can be obtained with suitable algorithms.

Let $f$ denote the concatenated cost vector of all $f_i$ and $f_{ij}$. We decompose $f$ into horizontal and vertical chains, $f=f^1+f^2$, where $f^1$ contains all horizontal edges and all unary terms, and $f^2$ contains all vertical edges and zero unary terms.
The dual problem (DUAL_MM) of the CRF minimization above is

$$\max_{\lambda}\Big(D(\lambda):=\min_{x^1}\,(f^1+\lambda)(x^1)+\min_{x^2}\,(f^2-\lambda)(x^2)\Big)$$
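The decomposition can be checked on a tiny grid. The sketch below evaluates $D(\lambda)$ by solving every horizontal and vertical chain exactly with min-sum dynamic programming and compares it with the brute-force minimum of $f$. For any $\lambda$, $D(\lambda)$ is a lower bound (weak duality), because choosing $x^1=x^2=x^*$ recovers $f(x^*)$. It assumes one pairwise cost shared by all edges (i.e. $\omega_{ij}=1$) and only evaluates $D(\lambda)$; it does not implement the DUAL_MM maximization over $\lambda$. All function names and values are hypothetical:

```python
import random
from itertools import product
from typing import Callable, List, Sequence

Costs = List[List[List[float]]]              # costs[r][c][k]
PairCost = Callable[[int, int], float]


def chain_min(unary: Sequence[Sequence[float]], pair: PairCost) -> float:
    """Exact min over a chain of sum_i unary[i][x_i] + sum_i pair(x_i, x_{i+1}) (min-sum DP)."""
    L = len(unary[0])
    best = list(unary[0])
    for u in unary[1:]:
        best = [u[k] + min(best[j] + pair(j, k) for j in range(L)) for k in range(L)]
    return min(best)


def dual_value(unary: Costs, lam: Costs, pair: PairCost) -> float:
    """D(lam) = min_x1 (f1 + lam)(x1) + min_x2 (f2 - lam)(x2).

    f1: horizontal chains (rows) holding all unary terms; f2: vertical chains
    (columns) with zero unary terms. lam lives on the (pixel, label) entries.
    """
    H, W, L = len(unary), len(unary[0]), len(unary[0][0])
    rows = sum(chain_min([[unary[r][c][k] + lam[r][c][k] for k in range(L)]
                          for c in range(W)], pair) for r in range(H))
    cols = sum(chain_min([[-lam[r][c][k] for k in range(L)]
                          for r in range(H)], pair) for c in range(W))
    return rows + cols


def primal_min(unary: Costs, pair: PairCost) -> float:
    """Brute-force min_x f(x) on the 4-connected grid; tiny grids only."""
    H, W, L = len(unary), len(unary[0]), len(unary[0][0])
    best = float("inf")
    for flat in product(range(L), repeat=H * W):
        x = [flat[r * W:(r + 1) * W] for r in range(H)]
        e = sum(unary[r][c][x[r][c]] for r in range(H) for c in range(W))
        e += sum(pair(x[r][c], x[r][c + 1]) for r in range(H) for c in range(W - 1))
        e += sum(pair(x[r][c], x[r + 1][c]) for r in range(H - 1) for c in range(W))
        best = min(best, e)
    return best


def sgm_pair(a: int, b: int, P1: float = 0.3, P2: float = 1.0) -> float:
    """rho(|a - b|) with illustrative P1, P2 and w_ij = 1."""
    d = abs(a - b)
    return 0.0 if d == 0 else (P1 if d == 1 else P2)


random.seed(0)
H, W, L = 2, 3, 3
unary = [[[random.uniform(-1, 0) for _ in range(L)] for _ in range(W)] for _ in range(H)]
zero = [[[0.0] * L for _ in range(W)] for _ in range(H)]
noisy = [[[random.gauss(0, 0.5) for _ in range(L)] for _ in range(W)] for _ in range(H)]
opt = primal_min(unary, sgm_pair)
for lam in (zero, noisy):
    print(dual_value(unary, lam, sgm_pair) <= opt + 1e-9)  # weak duality holds for any lam
```

Maximizing $D(\lambda)$ tightens this bound; the chain subproblems stay exactly solvable, which is what makes the decomposition attractive.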
