CV paper notes: Semi-Supervised Deep Learning for Monocular Depth Map Prediction (unsupervised depth prediction series, part 3: a semi-supervised method)

Original

2020-06-27 11:10

1. Basic information

Title: Semi-Supervised Deep Learning for Monocular Depth Map Prediction
Year: 2017
Citation: Kuznietsov Y, Stuckler J, Leibe B. Semi-supervised deep learning for monocular depth map prediction[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2017: 6647-6655.

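2. Background

Supervised learning: needs large amounts of labelled data. Depth captured by LiDAR or RGB-D sensors is noisy and sparse, and the projection centres of the laser scanner and the camera do not coincide.
Unsupervised learning: cannot predict depth in textureless regions, where image matching gives no usable signal.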

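A brief summary of how depth prediction has developed:

Saxena et al.: the first supervised-learning approach, using an MRF over hand-crafted features.
Eigen et al.: a CNN with a coarse-to-fine multi-scale network (see my notes).
Li et al.: a CNN combined with CRFs over a superpixel segmentation.
Liu et al.: end-to-end training of CNN features for the unary and pairwise potentials; continuous depth with a Gaussian assumption (?).
Laina et al.: a deep convolutional network built on ResNet, giving denser predictions.
Later work explored depth transfer between images, or combined depth map prediction with semantic segmentation.
Garg et al.: a FlowNet-style FCN trained with a photometric error. (The loss is linearised with a first-order Taylor approximation, which seems to be why coarse-to-fine training is needed?)
Xie et al.: a disparity-based method that minimises a pixel-wise reconstruction error.
Godard et al.: also disparity-based with a reconstruction error, but adds a left-right consistency constraint (see my notes).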

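3. Contributions

The paper proposes combining supervised and unsupervised learning. Each training sample is a stereo pair and needs two depth maps (from LiDAR) and two RGB images.

The inverse depth $\rho(\mathbf{x})$ predicted by the CNN should agree with the LiDAR depth $Z(\mathbf{x})$: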
$\rho(\mathbf{x})^{-1} \stackrel{!}{=} Z(\mathbf{x})$

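The warp $\omega$ shifts the pixel position $\mathbf{x}$ horizontally by the disparity $f b \rho(\mathbf{x})$, where $f$ is the focal length and $b$ the stereo baseline: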
$\omega(\mathbf{x}, \rho(\mathbf{x})):=\mathbf{x}-f b \rho(\mathbf{x})$

The left image $I_1$ should equal the right image $I_2$ sampled at the warped position:
$I_{1}(\mathbf{x}) \stackrel{!}{=} I_{2}(\omega(\mathbf{x}, \rho(\mathbf{x})))$

Applying the constraint in both directions for the left and right images:
$\begin{array}{c} I_{\text {left}}(\mathbf{x}) \stackrel{!}{=} I_{\text {right}}(\omega(\mathbf{x}, \rho(\mathbf{x}))) \\ I_{\text {right}}(\mathbf{x}) \stackrel{!}{=} I_{\text {left}}(\omega(\mathbf{x},-\rho(\mathbf{x}))) \end{array}$
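The two warping constraints can be illustrated on a single image row. The following is a minimal NumPy sketch, not the authors' implementation: the helper names `warp_coords` and `sample_row`, the calibration values `f` and `b`, and the use of linear interpolation for sub-pixel sampling are all assumptions made for illustration.

```python
import numpy as np

def warp_coords(x, rho, f, b, sign=1.0):
    """omega(x, sign * rho) = x - sign * f * b * rho (horizontal shift by the disparity)."""
    return x - sign * f * b * rho

def sample_row(row, xs):
    """Linearly interpolate a 1-D image row at fractional positions xs (clamped at the borders)."""
    return np.interp(xs, np.arange(row.size), row)

# Hypothetical calibration: focal length in pixels, baseline in metres.
f, b = 720.0, 0.5
depth = 12.0                       # metres, so the disparity is f * b / depth = 30 px
width = 200
x = np.arange(width, dtype=float)
rho = np.full(width, 1.0 / depth)  # inverse depth, constant along this synthetic row

# Synthetic rows that satisfy I_left(x) = I_right(x - 30).
right = np.sin(x / 15.0)
left = np.sin((x - 30.0) / 15.0)

# I_left(x) should match I_right(omega(x, rho)).
xs_l = warp_coords(x, rho, f, b, sign=1.0)
valid_l = xs_l >= 0
err_left = np.abs(left - sample_row(right, xs_l))[valid_l].max()

# I_right(x) should match I_left(omega(x, -rho)).
xs_r = warp_coords(x, rho, f, b, sign=-1.0)
valid_r = xs_r <= width - 1
err_right = np.abs(right - sample_row(left, xs_r))[valid_r].max()

print(err_left, err_right)
```

With the correct inverse depth both errors are at floating-point level; pixels whose warped position falls outside the other image are masked out because nothing is observed there.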

Loss functions

Supervised loss.

$\begin{aligned} \mathcal{L}_{\boldsymbol{\theta}}^{S}=\sum_{\mathbf{x} \in \Omega_{Z, l}}\left\|\rho_{l, \boldsymbol{\theta}}(\mathbf{x})^{-1}-Z_{l}(\mathbf{x})\right\|_{\delta} &+\sum_{\mathbf{x} \in \Omega_{Z, r}}\left\|\rho_{r, \boldsymbol{\theta}}(\mathbf{x})^{-1}-Z_{r}(\mathbf{x})\right\|_{\delta} \end{aligned}$

$\theta$ denotes the CNN parameters, so the predicted inverse depths are $\rho_{r/l, \theta}$. $\|\cdot\|_{\delta}$ is the berHu norm, which combines the L1 and L2 norms:
$\|d\|_{\delta}=\left\{\begin{array}{ll}|d|, & |d| \leq \delta \\ \frac{d^{2}+\delta^{2}}{2 \delta}, & |d|>\delta\end{array}\right.$

$\delta=0.2 \max _{\mathbf{x} \in \Omega_{Z}}\left(\left|\rho(\mathbf{x})^{-1}-Z(\mathbf{x})\right|\right)$
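As a concrete check of the berHu norm and the adaptive threshold $\delta$, here is a small NumPy sketch for one image of the pair. The function names, the toy values, the `valid` mask convention (LiDAR depth 0 means no measurement) and the tiny lower bound on `delta` are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def berhu(d, delta):
    """berHu norm: |d| for |d| <= delta, (d^2 + delta^2) / (2 * delta) above it."""
    a = np.abs(d)
    return np.where(a <= delta, a, (a ** 2 + delta ** 2) / (2.0 * delta))

def supervised_loss(rho, Z, valid):
    """Sum of berHu residuals between predicted depth 1/rho and LiDAR depth Z on valid pixels."""
    resid = 1.0 / rho[valid] - Z[valid]
    # delta = 0.2 * max |residual|; the lower bound only avoids a division by zero (an assumption).
    delta = max(0.2 * float(np.max(np.abs(resid))), 1e-12)
    return float(berhu(resid, delta).sum()), delta

rho = np.array([0.1, 0.2, 0.05, 0.5])   # predicted inverse depth
Z = np.array([10.5, 5.0, 15.0, 0.0])    # LiDAR depth; 0 marks a pixel without a measurement
valid = Z > 0

# Residuals are -0.5, 0 and 5, so delta = 1: the first two use the L1 branch, the last the L2 branch.
loss, delta = supervised_loss(rho, Z, valid)
print(loss, delta)
```

The full supervised loss is the sum of this quantity over the left and right images. The two branches meet at $|d|=\delta$, where both evaluate to $\delta$.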

Unsupervised loss.

$\begin{array}{c} \mathcal{L}_{\boldsymbol{\theta}}^{U}=\sum_{\mathbf{x} \in \Omega_{U, l}}\left|\left(\mathbf{G}_{\sigma} * I_{l}\right)(\mathbf{x})-\left(\mathbf{G}_{\sigma} * I_{r}\right)\left(\omega\left(\mathbf{x}, \rho_{l, \boldsymbol{\theta}}(\mathbf{x})\right)\right)\right| \\ +\sum_{\mathbf{x} \in \Omega_{U, r}}\left|\left(\mathbf{G}_{\sigma} * I_{r}\right)(\mathbf{x})-\left(\mathbf{G}_{\sigma} * I_{l}\right)\left(\omega\left(\mathbf{x},-\rho_{r, \boldsymbol{\theta}}(\mathbf{x})\right)\right)\right| \end{array}$

$\mathrm{G}_{\sigma}$ is a Gaussian kernel; the images are blurred to suppress noise, with $\sigma=1 \mathrm{px}$.
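The unsupervised term can be sketched in NumPy as follows. This is only an illustration under stated assumptions: the separable blur with edge padding, the per-row linear interpolation, the out-of-image masking standing in for $\Omega_{U}$, and all function names and calibration values are hypothetical choices, not the paper's implementation.

```python
import numpy as np

def gaussian_blur(img, sigma=1.0):
    """Separable Gaussian blur with edge padding; output has the same shape as img."""
    radius = int(3 * sigma + 0.5)
    t = np.arange(-radius, radius + 1)
    k = np.exp(-t ** 2 / (2 * sigma ** 2))
    k /= k.sum()
    pad = np.pad(img, radius, mode="edge")
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, pad)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)

def warp_horizontal(img, rho, f, b, sign=1.0):
    """Sample img at omega(x, sign * rho) = x - sign * f * b * rho, row by row."""
    h, w = img.shape
    x = np.arange(w, dtype=float)
    out = np.empty((h, w), dtype=float)
    mask = np.empty((h, w), dtype=bool)
    for y in range(h):
        xs = x - sign * f * b * rho[y]
        out[y] = np.interp(xs, x, img[y])
        mask[y] = (xs >= 0) & (xs <= w - 1)   # keep pixels that land inside the other image
    return out, mask

def photometric_loss(I_l, I_r, rho_l, rho_r, f, b, sigma=1.0):
    """Sum of absolute differences between blurred images and their warped counterparts."""
    G_l, G_r = gaussian_blur(I_l, sigma), gaussian_blur(I_r, sigma)
    r_to_l, m_l = warp_horizontal(G_r, rho_l, f, b, sign=1.0)
    l_to_r, m_r = warp_horizontal(G_l, rho_r, f, b, sign=-1.0)
    return float(np.abs(G_l - r_to_l)[m_l].sum() + np.abs(G_r - l_to_r)[m_r].sum())

# Synthetic stereo pair with a constant 4 px disparity (hypothetical f and b).
h, w, d = 16, 64, 4.0
f, b = 500.0, 0.5
yy, xx = np.mgrid[0:h, 0:w].astype(float)

def texture(x, y):
    return np.sin(x / 5.0) + np.cos(y / 3.0)

I_r = texture(xx, yy)
I_l = texture(xx - d, yy)                    # I_left(x) = I_right(x - d)

rho_true = np.full((h, w), d / (f * b))
rho_wrong = np.zeros((h, w))                 # zero disparity everywhere

loss_true = photometric_loss(I_l, I_r, rho_true, rho_true, f, b)
loss_wrong = photometric_loss(I_l, I_r, rho_wrong, rho_wrong, f, b)
print(loss_true, loss_wrong)
```

The loss for the correct inverse depth is much smaller than for the wrong one. It is not exactly zero here because the edge padding of the blur differs between the two images near the left and right borders.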

Regularization loss.

$\mathcal{L}_{\boldsymbol{\theta}}^{R}=\sum_{i \in\{l, r\}} \sum_{\mathbf{x} \in \Omega}\left|\phi\left(\nabla I_{i}(\mathbf{x})\right)^{\top} \nabla \rho_{i}(\mathbf{x})\right|$

$\phi(\mathbf{g})=\left(\exp \left(-\eta\left|g_{x}\right|\right), \exp \left(-\eta\left|g_{y}\right|\right)\right)^{\top}$

$\eta=\frac{1}{255}$
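This term keeps the predicted gradients from growing too large. My understanding: where the image gradient is small, the weight $\phi\left(\nabla I_{i}(\mathbf{x})\right)$ is close to 1, so a large predicted gradient $\nabla \rho_{i}(\mathbf{x})$ makes $\mathcal{L}_{\boldsymbol{\theta}}^{R}$ large; at strong image edges the weight decays and a jump in inverse depth is penalised less. In effect, it encourages the inverse-depth gradients to be consistent with the image gradients.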
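To make the effect of the edge-aware weight concrete, here is a minimal NumPy sketch of the regulariser for a single image. Forward differences for the gradients, the 0 to 255 intensity range and the function name `smoothness_loss` are assumptions made for illustration.

```python
import numpy as np

def smoothness_loss(img, rho, eta=1.0 / 255.0):
    """Sum over pixels of |phi(grad I)^T grad rho|, with forward-difference gradients."""
    gx_I = np.diff(img, axis=1)[:-1, :]   # image gradient in x, shape (H-1, W-1)
    gy_I = np.diff(img, axis=0)[:, :-1]   # image gradient in y
    gx_r = np.diff(rho, axis=1)[:-1, :]   # inverse-depth gradient in x
    gy_r = np.diff(rho, axis=0)[:, :-1]   # inverse-depth gradient in y
    w_x = np.exp(-eta * np.abs(gx_I))     # phi: close to 1 in flat regions, small at edges
    w_y = np.exp(-eta * np.abs(gy_I))
    return float(np.abs(w_x * gx_r + w_y * gy_r).sum())

# Toy image with one vertical intensity edge between columns 3 and 4.
img = np.zeros((6, 8))
img[:, 4:] = 255.0

# Inverse-depth step aligned with the image edge.
rho_aligned = np.full((6, 8), 0.1)
rho_aligned[:, 4:] = 0.3

# The same step placed in a flat image region.
rho_misaligned = np.full((6, 8), 0.1)
rho_misaligned[:, 2:] = 0.3

loss_aligned = smoothness_loss(img, rho_aligned)
loss_misaligned = smoothness_loss(img, rho_misaligned)
print(loss_aligned, loss_misaligned)
```

The aligned step is weighted by $\exp(-1)$ because the image gradient there is 255, while the misaligned step is weighted by 1, so the same inverse-depth jump costs less when it coincides with an image edge.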

Total loss

$\begin{array}{l} \mathcal{L}_{\boldsymbol{\theta}}\left(I_{l}, I_{r}, Z_{l}, Z_{r}\right)= \quad \lambda_{t} \mathcal{L}_{\boldsymbol{\theta}}^{S}\left(I_{l}, I_{r}, Z_{l}, Z_{r}\right)+\gamma \mathcal{L}_{\boldsymbol{\theta}}^{U}\left(I_{l}, I_{r}\right)+\mathcal{L}_{\boldsymbol{\theta}}^{R}\left(I_{l}, I_{r}\right) \end{array}$
$\lambda_{t}$ and $\gamma$ are trade-off weights.

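Network architecture

The network is a FlowNet-style residual network.

[Figure: the two types of residual blocks]

[Figure: the up-projection residual block]

[Figure: the detailed network architecture]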

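4. Experimental results

[9] is the left-right consistency method covered in part 2 of this series. The proposed method uses the ground-truth depth to reach fairly accurate predictions, while the CNN learns to fill in the regions that the ground-truth scan does not cover.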

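5. Conclusions and reflections

Authors' conclusions

Summary

The paper offers a way to combine depth labels with a CNN when such labels are available, but in most situations no depth data exists. If depth cameras become a standard part of phones, this method could be a useful way to enhance their output.

Reflections

References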
