CV paper notes: Semi-Supervised Deep Learning for Monocular Depth Map Prediction (unsupervised depth prediction series, part 3: a semi-supervised method)

2020-06-27 11:10

1. Basic information

Title: Semi-Supervised Deep Learning for Monocular Depth Map Prediction
Year: 2017
Citation: Kuznietsov Y, Stuckler J, Leibe B. Semi-supervised deep learning for monocular depth map prediction[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2017: 6647-6655.

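2. Research background

Supervised learning: needs a large amount of labeled data. Depth acquired with LiDAR, RGB-D sensors and similar devices is noisy and sparse, and the projection centers of the laser and the camera do not coincide.
Unsupervised learning: cannot predict depth in textureless regions.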

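A brief summary of how depth prediction has developed:

Saxena et al.: the first supervised learning approach, using an MRF with hand-crafted features.
Eigen et al.: a CNN, organized as a coarse-to-fine multi-scale network. (See my notes.)
Li et al.: a CNN combined with CRFs over a superpixel segmentation.
Liu et al.: end-to-end training of CNN features for the unary and pairwise potentials, with continuous depth and a Gaussian assumption (??).
Laina et al.: a deep convolutional network built on ResNet, which gives denser predictions.
Later work explored transferring depth between images, or combining depth map prediction with semantic segmentation.
Garg et al.: FCN, FlowNet; uses a photometric error. (The loss is linearized with a first-order Taylor approximation, so coarse-to-fine training is needed??)
Xie et al.: a disparity-based method that minimizes the pixel-wise reconstruction error.
Godard et al.: also disparity-based and also minimizes the reconstruction error, but adds a left-right consistency constraint. (See my notes.)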

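3. Contributions

The paper proposes combining supervised and unsupervised learning. Each training sample is a stereo pair and needs 2 depth maps (from LiDAR) and 2 RGB images.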

The inverse depth $\rho(\mathbf{x})$ predicted by the CNN should match the depth $Z(\mathbf{x})$ measured by the LiDAR:
$\rho(\mathbf{x})^{-1} \stackrel{!}{=} Z(\mathbf{x})$

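The pixel coordinate is shifted by the disparity $f b \rho(\mathbf{x})$, where $f$ is the focal length and $b$ the stereo baseline: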
$\omega(\mathbf{x}, \rho(\mathbf{x})):=\mathbf{x}-f b \rho(\mathbf{x})$

The left image $I_1$ should equal the right image $I_2$ sampled at the disparity-shifted position:
$I_{1}(\mathbf{x}) \stackrel{!}{=} I_{2}(\omega(\mathbf{x}, \rho(\mathbf{x})))$

Applying this to both the left and the right image:
$\begin{array}{c} I_{\text {left}}(\mathbf{x}) \stackrel{!}{=} I_{\text {right}}(\omega(\mathbf{x}, \rho(\mathbf{x}))) \\ I_{\text {right}}(\mathbf{x}) \stackrel{!}{=} I_{\text {left}}(\omega(\mathbf{x},-\rho(\mathbf{x}))) \end{array}$
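To make the warp $\omega$ concrete, here is a minimal sketch on a single scanline in plain Python. The helper names (`disparity`, `sample_linear`, `warp_row`), the border clamping, and the numbers (f = 64 px, b = 0.5 m, depth 16 m) are my own assumptions for illustration and do not come from the paper.

```python
def disparity(inv_depth, focal_px, baseline_m):
    """Disparity in pixels, f * b * rho, for an inverse depth rho given in 1/m."""
    return focal_px * baseline_m * inv_depth


def sample_linear(row, x):
    """Linearly interpolate a 1-D scanline at position x (clamped to the borders)."""
    x = min(max(x, 0.0), len(row) - 1.0)
    x0 = int(x)
    x1 = min(x0 + 1, len(row) - 1)
    t = x - x0
    return (1.0 - t) * row[x0] + t * row[x1]


def warp_row(other_row, inv_depth_row, focal_px, baseline_m, sign=1.0):
    """Rebuild one view by sampling the other at omega(x, sign * rho) = x - f * b * sign * rho."""
    return [
        sample_linear(other_row, x - disparity(sign * rho, focal_px, baseline_m))
        for x, rho in enumerate(inv_depth_row)
    ]


# Hypothetical setup: constant depth of 16 m, so the disparity is 64 * 0.5 / 16 = 2 px.
right = [0.0, 10.0, 20.0, 30.0, 40.0, 50.0]
rho = [1.0 / 16.0] * len(right)

# sign=+1 reconstructs the left view from the right one, sign=-1 the other way round.
left_from_right = warp_row(right, rho, focal_px=64.0, baseline_m=0.5, sign=1.0)
right_from_left = warp_row(right, rho, focal_px=64.0, baseline_m=0.5, sign=-1.0)
print(left_from_right)
print(right_from_left)
```

The `sign` argument mirrors the $\pm\rho$ in the two constraints above: the same function covers both warping directions.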

Loss function

Supervised loss.

$\begin{aligned} \mathcal{L}_{\boldsymbol{\theta}}^{S}=\sum_{\mathbf{x} \in \Omega_{Z, l}}\left\|\rho_{l, \boldsymbol{\theta}}(\mathbf{x})^{-1}-Z_{l}(\mathbf{x})\right\|_{\delta} &+\sum_{\mathbf{x} \in \Omega_{Z, r}}\left\|\rho_{r, \boldsymbol{\theta}}(\mathbf{x})^{-1}-Z_{r}(\mathbf{x})\right\|_{\delta} \end{aligned}$

$\theta$ denotes the CNN parameters, so the predicted inverse depth is written $\rho_{r/l, \theta}$. $\|\cdot\|_{\delta}$ is the berHu norm, which combines the L1 and L2 norms:
$\|d\|_{\delta}=\left\{\begin{array}{ll}|d|, & |d| \leq \delta \\ \frac{d^{2}+\delta^{2}}{2 \delta}, & |d|>\delta\end{array}\right.$

$\delta=0.2 \max _{\mathbf{x} \in \Omega_{Z}}\left(\left|\rho(\mathbf{x})^{-1}-Z(\mathbf{x})\right|\right)$
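The berHu norm and the adaptive $\delta$ can be sketched in a few lines of plain Python. This is only an illustration: the function names are mine, missing LiDAR returns are encoded as `None` by assumption, and $\delta$ is computed over whatever pixels are passed in (a single toy row here).

```python
def berhu(d, delta):
    """berHu norm: L1 for small residuals, scaled L2 for large ones."""
    a = abs(d)
    if a <= delta:
        return a
    return (d * d + delta * delta) / (2.0 * delta)


def supervised_loss(inv_depth_pred, depth_gt):
    """Sum of berHu residuals over pixels that have a ground-truth depth."""
    residuals = [
        1.0 / rho - z
        for rho, z in zip(inv_depth_pred, depth_gt)
        if z is not None  # assumption: None marks pixels without a LiDAR measurement
    ]
    if not residuals:
        return 0.0
    delta = 0.2 * max(abs(r) for r in residuals)
    if delta == 0.0:  # every residual is zero, so the loss is zero as well
        return 0.0
    return sum(berhu(r, delta) for r in residuals)


# Hypothetical row: three predictions, the middle pixel has no LiDAR return.
pred_rho = [0.1, 0.05, 0.25]   # predicted depths of 10 m, 20 m and 4 m
gt_depth = [10.0, None, 9.0]   # residuals: 0 and -5, hence delta = 0.2 * 5 = 1
loss = supervised_loss(pred_rho, gt_depth)
print(loss)
```

The two branches meet at $|d| = \delta$, where both evaluate to $\delta$, so the norm is continuous.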

Unsupervised loss.

$\begin{array}{c} \mathcal{L}_{\boldsymbol{\theta}}^{U}=\sum_{\mathbf{x} \in \Omega_{U, l}}\left|\left(\mathbf{G}_{\sigma} * I_{l}\right)(\mathbf{x})-\left(\mathbf{G}_{\sigma} * I_{r}\right)\left(\omega\left(\mathbf{x}, \rho_{l, \boldsymbol{\theta}}(\mathbf{x})\right)\right)\right| \\ +\sum_{\mathbf{x} \in \Omega_{U, r}}\left|\left(\mathbf{G}_{\sigma} * I_{r}\right)(\mathbf{x})-\left(\mathbf{G}_{\sigma} * I_{l}\right)\left(\omega\left(\mathbf{x},-\rho_{r, \boldsymbol{\theta}}(\mathbf{x})\right)\right)\right| \end{array}$

$\mathbf{G}_{\sigma}$ is a Gaussian kernel. The images are blurred to suppress noise, with $\sigma=1 \mathrm{px}$.
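One direction of the unsupervised loss can be sketched on a single scanline as follows. Everything beyond the formula is an assumption of this sketch: the helper names, the kernel radius of 3, replicate-border padding, and skipping pixels whose warped position falls outside the image.

```python
import math


def gaussian_kernel(sigma, radius):
    """Normalized 1-D Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma)) for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_row(row, sigma=1.0, radius=3):
    """Convolve one scanline with G_sigma, replicating the border pixels."""
    kernel = gaussian_kernel(sigma, radius)
    last = len(row) - 1
    return [
        sum(w * row[min(max(x + k, 0), last)] for k, w in zip(range(-radius, radius + 1), kernel))
        for x in range(len(row))
    ]


def sample_linear(row, x):
    """Linearly interpolate a scanline at a position inside [0, len(row) - 1]."""
    x0 = int(x)
    x1 = min(x0 + 1, len(row) - 1)
    t = x - x0
    return (1.0 - t) * row[x0] + t * row[x1]


def photometric_loss_row(row_a, row_b, inv_depth_a, focal_px, baseline_m, sign=1.0):
    """Sum over x of |(G * I_a)(x) - (G * I_b)(omega(x, sign * rho_a(x)))|."""
    blurred_a = blur_row(row_a)
    blurred_b = blur_row(row_b)
    loss = 0.0
    for x, rho in enumerate(inv_depth_a):
        target = x - sign * focal_px * baseline_m * rho  # omega(x, sign * rho)
        if 0.0 <= target <= len(row_b) - 1:  # assumption: ignore pixels warped out of view
            loss += abs(blurred_a[x] - sample_linear(blurred_b, target))
    return loss


# Hypothetical rectified pair: the left row is the right row shifted by 2 px.
right = [5.0, 5.0, 5.0, 5.0, 20.0, 60.0, 90.0, 40.0, 10.0, 10.0, 10.0, 10.0]
left = [right[max(x - 2, 0)] for x in range(len(right))]
true_rho = [0.0625] * len(left)  # depth 16 m; with f = 64 px and b = 0.5 m the disparity is 2 px
wrong_rho = [0.0] * len(left)    # infinitely far away, i.e. no shift at all

loss_true = photometric_loss_row(left, right, true_rho, focal_px=64.0, baseline_m=0.5)
loss_wrong = photometric_loss_row(left, right, wrong_rho, focal_px=64.0, baseline_m=0.5)
print(loss_true, loss_wrong)
```

The correct inverse depth lines the two blurred rows up, so its loss is far smaller than the loss for the wrong one. That difference is the training signal this term provides without any depth labels.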

Regularization loss.

$\mathcal{L}_{\boldsymbol{\theta}}^{R}=\sum_{i \in\{l, r\}} \sum_{\mathbf{x} \in \Omega}\left|\phi\left(\nabla I_{i}(\mathbf{x})\right)^{\top} \nabla \rho_{i}(\mathbf{x})\right|$

$\phi(\mathbf{g})=\left(\exp \left(-\eta\left|g_{x}\right|\right), \exp \left(-\eta\left|g_{y}\right|\right)\right)^{\top}$

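$\eta=\frac{1}{255}$

This term keeps the gradient of the predicted inverse depth from becoming too large. My understanding: when the predicted gradient $\nabla \rho_{i}(\mathbf{x})$ is large while the image gradient $\nabla I_{i}(\mathbf{x})$ is small, the weight $\phi\left(\nabla I_{i}(\mathbf{x})\right)$ is close to 1, so the regularization loss is large. At a strong image edge the weight shrinks and a jump in depth costs less. In other words, the term encourages depth gradients to be consistent with image gradients.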
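A 1-D version of the regularizer makes the weighting easy to check numerically. This is a simplified sketch under my own assumptions: it uses only horizontal forward differences on one scanline (the formula above works on 2-D gradients), intensities in the 0..255 range, and a hypothetical function name.

```python
import math

ETA = 1.0 / 255.0


def smoothness_row(intensity_row, inv_depth_row, eta=ETA):
    """Sum over x of exp(-eta * |dI/dx|) * |d rho/dx|, using forward differences."""
    loss = 0.0
    for x in range(len(intensity_row) - 1):
        grad_i = intensity_row[x + 1] - intensity_row[x]
        grad_rho = inv_depth_row[x + 1] - inv_depth_row[x]
        loss += math.exp(-eta * abs(grad_i)) * abs(grad_rho)
    return loss


# Hypothetical scanline with one strong intensity edge between pixels 1 and 2.
intensity = [0.0, 0.0, 255.0, 255.0]
rho_on_edge = [0.1, 0.1, 0.2, 0.2]   # depth jump coincides with the image edge
rho_off_edge = [0.1, 0.2, 0.2, 0.2]  # same jump, but inside a flat image region

loss_on_edge = smoothness_row(intensity, rho_on_edge)
loss_off_edge = smoothness_row(intensity, rho_off_edge)
print(loss_on_edge, loss_off_edge)
```

With these inputs the jump on the edge is weighted by about $e^{-1}$ and the jump in the flat region by 1, so the second case is penalized more. Under the 0..255 assumption the weight never drops below $e^{-1}$, so image edges reduce the penalty but do not remove it.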

Total loss

$\begin{array}{l} \mathcal{L}_{\boldsymbol{\theta}}\left(I_{l}, I_{r}, Z_{l}, Z_{r}\right)= \quad \lambda_{t} \mathcal{L}_{\boldsymbol{\theta}}^{S}\left(I_{l}, I_{r}, Z_{l}, Z_{r}\right)+\gamma \mathcal{L}_{\boldsymbol{\theta}}^{U}\left(I_{l}, I_{r}\right)+\mathcal{L}_{\boldsymbol{\theta}}^{R}\left(I_{l}, I_{r}\right) \end{array}$
$\lambda_{t}$ and $\gamma$ are trade-off parameters that weight the supervised and unsupervised terms.

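Network architecture

The network is a residual (ResNet-based) encoder-decoder with FlowNet-style long skip connections.

Two types of residual block: [figure omitted]

Up-projection residual block: [figure omitted]

Detailed network architecture: [figure omitted]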

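4. Experimental results

[9] is the left-right consistency method covered in part 2 of this series. The results show that the proposed method can use ground-truth depth to obtain fairly accurate predictions, while for regions the depth sensor did not scan, the CNN learns to predict the depth.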

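5. Conclusions and thoughts

Authors' conclusions

Summary

When depth labels are available, this method combines them with a CNN, but in most situations there is no depth data. If depth cameras become integrated into phones in the future, this approach could be a useful way to enhance their output.

Thoughts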
