MVSNet

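Abstract

Overall MVSNet pipeline:

Extract deep features from every input image with a shared network
Build a 3D cost volume on the reference camera frustum via differentiable homography warping
Regress an initial depth map with 3D convolutions, then refine it with the reference image to produce the final result
(Works with an arbitrary number of input views)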

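Introduction

Traditional methods give good results in **Lambertian scenarios** (ideal diffuse surfaces, whose apparent brightness is the same from every viewing direction), but they struggle with specular reflections and low-textured regions, so reconstruction completeness still has a lot of room for improvement.
CNN-based reconstruction can bring in global semantic information, such as priors on specular and reflective surfaces, which makes matching more robust. For two-view stereo matching this has already worked well and outperforms traditional methods: after rectification, matching reduces to per-pixel disparity estimation along the horizontal direction (disparity is the horizontal pixel offset between the two projections of the same scene point).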
Two existing learning-based multi-view stereo methods:
SurfaceNet
Learned Stereo Machine (LSM)
Both only suit small-scale reconstructions and take too long to run
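MVSNet (reformulates reconstruction as a depth estimation problem)
Input: one reference image + n source images
Output: one depth map, for the reference image
1. Extract features with the network
2. Build the 3D cost volume on the reference camera frustum via homography warping (the key step)
3. Regress an initial depth map with 3D convolutions, then refine it with the reference image to produce the final result
4. A variance-based cost metric maps multiple features into one cost feature in the volume
  (works with an arbitrary number of input views)
5. Post-processing: reconstruct the point cloud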
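
The variance-based metric in step 4 is what lets the network accept any number of views. A minimal NumPy sketch, assuming the per-view feature volumes have already been warped into the reference frustum (the function name `variance_cost` and the random test volumes are made up for illustration, not taken from the MVSNet code):

```python
import numpy as np

def variance_cost(feature_volumes):
    """Fuse N per-view feature volumes into one cost volume via their variance.

    feature_volumes: array of shape (N, D, H, W, C), one warped feature
    volume per view (reference view included).
    """
    volumes = np.asarray(feature_volumes, dtype=np.float64)
    mean = volumes.mean(axis=0)                     # E[V]
    mean_of_squares = (volumes ** 2).mean(axis=0)   # E[V^2]
    return mean_of_squares - mean ** 2              # Var[V] = E[V^2] - E[V]^2

# Hypothetical inputs: the same function handles 3 views or 5 views.
rng = np.random.default_rng(0)
three_views = rng.normal(size=(3, 4, 2, 2, 8))
five_views = rng.normal(size=(5, 4, 2, 2, 8))
cost3 = variance_cost(three_views)
cost5 = variance_cost(five_views)

# If every view sees identical features, the matching cost is zero.
identical = variance_cost(np.repeat(three_views[:1], 3, axis=0))
```

The output shape depends only on (D, H, W, C), never on N, so the rest of the network is unaffected by the view count.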

MVSNet

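Training
1. Data preparation (image height and width must be multiples of 32)
  1. images [None, 3, None, None, 3]
  2. cam [None, 3, 2, 4, 4] (layout described in the camera parameters section below)
  3. depth_img [None, None, None, 1]
    (None dimensions take their shape dynamically at run time)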
  4. depth_start = tf.reshape(
    tf.slice(cams, [0, 0, 1, 3, 0], [1, 1, 1, 1, 1]), [1])
    depth_interval = tf.reshape(
    tf.slice(cams, [0, 0, 1, 3, 1], [1, 1, 1, 1, 1]), [1])
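2. Network architecture
  1. Image Features + Differentiable Homography
    1. Image Features: UNetDS2GN
      1. Definitions
        ref_image = tf.squeeze(tf.slice(images, [0, 0, 0, 0, 0], [-1, 1, -1, -1, 3]), axis=1)
        ref_cam = tf.squeeze(tf.slice(cams, [0, 0, 0, 0, 0], [-1, 1, 2, 4, 4]), axis=1)
        ref_tower = UNetDS2GN({'data': ref_image}, is_training=True, reuse=True)
        view_towers = []
        # one tower per source view, reusing the same variables as the reference tower
        for view in range(1, FLAGS.view_num):
            view_image = tf.squeeze(tf.slice(images, [0, view, 0, 0, 0], [-1, 1, -1, -1, 3]), axis=1)
            view_tower = UNetDS2GN({'data': view_image}, is_training=True, reuse=True)
            view_towers.append(view_tower)
      2. Structure
        UNetDS2GN: unet + uniNetDS2GN (7 conv+GN layers followed by 1 plain conv; kernel sizes [3,3,5,3,3,5,3,3], strides [1,1,2,1,1,2,1,1])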
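    2. Differentiable Homography
      1. Definitions
        depth_num = 192
        depth_end = depth_start + (tf.cast(depth_num, tf.float32) - 1) * depth_interval
      2. Code
        view_homographies = []
        for view in range(1, FLAGS.view_num):
            view_cam = tf.squeeze(tf.slice(cams, [0, view, 0, 0, 0], [-1, 1, 2, 4, 4]), axis=1)
            homographies = get_homographies(ref_cam, view_cam, depth_num=depth_num,
                                            depth_start=depth_start, depth_interval=depth_interval)
            view_homographies.append(homographies)
      3. get_homographies
        depth = depth_start + tf.cast(tf.range(depth_num), tf.float32) * depth_interval  # 1-D array of depth hypotheses
        Shapes: R (3x3), T (3x1), depth_mat (1x192x3x3)
        Formula: $H_i(d) = K_i R_i \left(I - \frac{(c_i - c)\, n^{T}}{d}\right) R^{T} K^{-1}$, where $K, R, T$ belong to the reference camera and $K_i, R_i, T_i$ to source view $i$, $c = -R^{T} T$ and $c_i = -R_i^{T} T_i$ are the camera centers, $n^{T}$ is fronto_direction (the third row of $R$), and $d$ runs over the entries of depth_mat
        Result shape: (1x192x3x3)
        Each of the 192 depth hypotheses gets its own 3x3 homography matrix
        view_homographies has shape [view_num - 1 (= 2), 192, 3, 3]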
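    3. Cost metric: cost volume
      1. Definitions
        feature_c = 32
        feature_h = FLAGS.max_h / 4
        feature_w = FLAGS.max_w / 4
        ave_feature = ref_feature = ref_tower.get_output()
        ave_feature2 = ref_feature2 = tf.square(ref_feature)
        view_features = tf.stack(view_features, axis=0)
      2. Procedure
        Use the computed homographies to warp the feature maps of the other views into the reference view
        Compute the cost volume across all views as the variance of the features: $C = \frac{1}{N}\sum_{i=1}^{N}(V_i - \bar{V})^2$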
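    4. Cost volume regularization
      Network: RegNetUS0
      Output: filtered_cost_volume, the filtered cost volume of size (B, D, H, W, 1)
    5. Depth map and probability map
      Network output -> multiply by -1 -> softmax() along the depth axis -> weight each depth hypothesis by its probability and sum, giving the depth map
      
      For each pixel, locate the depth map value among the 192 depth hypotheses, take the 4 nearest slices of the probability volume obtained from filtered_cost_volume, and sum them to get the probability map
    6. Refined depth map
    7. Loss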
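
Step 5 above (depth map and probability map) can be sketched in NumPy as follows. This is an illustration under assumptions rather than the repository code: it works on a single (D, H, W) volume without batch or channel axes, the function names are invented, and the depth range 425 / 2.5 is an arbitrary example value.

```python
import numpy as np

def soft_argmin_depth(filtered_cost, depth_start, depth_interval):
    """filtered_cost: (D, H, W) regularized costs; lower cost = better match."""
    num_depth = filtered_cost.shape[0]
    logits = -filtered_cost                                # negate: low cost -> high score
    logits = logits - logits.max(axis=0, keepdims=True)    # numerical stability
    prob = np.exp(logits)
    prob /= prob.sum(axis=0, keepdims=True)                # softmax over the depth axis
    depths = depth_start + depth_interval * np.arange(num_depth)
    depth_map = (prob * depths[:, None, None]).sum(axis=0) # expected depth per pixel
    return depth_map, prob

def probability_map(prob, depth_map, depth_start, depth_interval):
    """Sum the probabilities of the four hypotheses nearest to the estimate."""
    num_depth, height, width = prob.shape
    index = (depth_map - depth_start) / depth_interval     # fractional depth index
    left = np.floor(index).astype(int)
    rows, cols = np.indices((height, width))
    total = np.zeros((height, width))
    for offset in (-1, 0, 1, 2):                           # two below, two above
        idx = np.clip(left + offset, 0, num_depth - 1)     # clamp at the volume borders
        total += prob[idx, rows, cols]
    return total

# Hypothetical cost volume with a clear minimum at depth index 3 for every pixel.
cost = np.broadcast_to(((np.arange(8) - 3.0) ** 2)[:, None, None], (8, 2, 2))
depth_map, prob = soft_argmin_depth(cost, depth_start=425.0, depth_interval=2.5)
prob_map = probability_map(prob, depth_map, depth_start=425.0, depth_interval=2.5)
```

Because the depth is an expectation rather than a hard argmax, the estimate is sub-interval accurate and differentiable; the probability map then indicates how concentrated the distribution is around that estimate.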

Evaluation metrics after reconstruction

f-score https://www.tanksandtemples.org/tutorial/
accuracy
completeness
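
As a rough illustration of how these metrics relate, the sketch below computes precision, recall and their harmonic mean (the F-score) for two small point sets at a distance threshold. It uses brute-force nearest neighbours, which is only practical for tiny clouds; the function names and the sample points are assumptions made up for this example, and real benchmarks also align and resample the clouds first.

```python
import numpy as np

def nearest_distances(source, target):
    """Distance from each point in source (N, 3) to its nearest point in target (M, 3)."""
    diff = source[:, None, :] - target[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2)).min(axis=1)

def f_score(reconstruction, ground_truth, threshold):
    # Precision: share of reconstructed points that lie close to the ground truth.
    precision = (nearest_distances(reconstruction, ground_truth) < threshold).mean()
    # Recall: share of ground-truth points that are covered by the reconstruction.
    recall = (nearest_distances(ground_truth, reconstruction) < threshold).mean()
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

# Hypothetical clouds: two good points, one outlier, two ground-truth points missed.
ground_truth = np.array([[0.0, 0, 0], [1, 0, 0], [2, 0, 0], [3, 0, 0]])
reconstruction = np.array([[0.0, 0, 0.01], [1, 0, 0.01], [1, 5, 0]])
precision, recall, f = f_score(reconstruction, ground_truth, threshold=0.1)
```

Precision plays the role of accuracy (is what was reconstructed correct?) and recall the role of completeness (how much of the surface was recovered?), so a method cannot score well by optimising only one of them.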

Camera parameters

cam [2,4,4]
cam[0] [4,4]: extrinsics, R + T
[[r1,r2,r3,t1],
[r4,r5,r6,t2],
[r7,r8,r9,t3],
[0,0,0,1]]
cam[1] [4,4]: intrinsic matrix K
[[fx,s,x0,0],
[0,fy,y0,0],
[0,0,1,0],
[depth_start,depth_interval,0,1]]
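
To make the layout concrete, here is a NumPy sketch that packs a camera into this [2, 4, 4] array, reads the depth range back out of the last row of the intrinsics slot, and builds one plane-sweep homography per depth hypothesis. The helper names, the toy intrinsics and the 0.2-unit baseline are all hypothetical; the homography follows the standard fronto-parallel plane construction, which I believe matches get_homographies but have not checked line by line.

```python
import numpy as np

def make_cam(K, R, t, depth_start, depth_interval):
    """Pack one camera into the [2, 4, 4] layout described above."""
    cam = np.zeros((2, 4, 4))
    cam[0, :3, :3] = R                  # rotation
    cam[0, :3, 3] = t                   # translation
    cam[0, 3, 3] = 1.0
    cam[1, :3, :3] = K                  # intrinsics
    cam[1, 3, 0] = depth_start          # depth range lives in the last row
    cam[1, 3, 1] = depth_interval
    cam[1, 3, 3] = 1.0
    return cam

def plane_sweep_homographies(ref_cam, view_cam, depth_num):
    """Return (depth_num, 3, 3) homographies mapping reference pixels to view pixels."""
    R_ref, t_ref, K_ref = ref_cam[0, :3, :3], ref_cam[0, :3, 3], ref_cam[1, :3, :3]
    R_view, t_view, K_view = view_cam[0, :3, :3], view_cam[0, :3, 3], view_cam[1, :3, :3]
    depth_start, depth_interval = ref_cam[1, 3, 0], ref_cam[1, 3, 1]
    depths = depth_start + depth_interval * np.arange(depth_num)

    c_ref = -R_ref.T @ t_ref                                # camera centers in world space
    c_view = -R_view.T @ t_view
    fronto_direction = R_ref[2:3, :]                        # (1, 3): reference viewing axis
    shift = (c_view - c_ref)[:, None] @ fronto_direction    # (3, 3)
    middle = np.eye(3) - shift / depths[:, None, None]      # (D, 3, 3)
    H = K_view @ R_view @ middle @ R_ref.T @ np.linalg.inv(K_ref)
    return H, depths

# Toy setup: identical intrinsics, source camera shifted 0.2 units along x.
K = np.array([[100.0, 0.0, 50.0], [0.0, 100.0, 40.0], [0.0, 0.0, 1.0]])
ref_cam = make_cam(K, np.eye(3), np.zeros(3), depth_start=2.0, depth_interval=0.5)
view_cam = make_cam(K, np.eye(3), np.array([-0.2, 0.0, 0.0]), depth_start=2.0, depth_interval=0.5)
H, depths = plane_sweep_homographies(ref_cam, view_cam, depth_num=4)
```

A quick sanity check is to project a 3D point at one of the hypothesised depths into both cameras: the homography for that depth should carry the reference pixel exactly onto the source pixel, and a camera paired with itself should give identity matrices.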

Disparity
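
For a rectified stereo pair, disparity is the horizontal pixel offset between the two projections of the same scene point, and it relates to depth through Z = f * B / d (focal length f in pixels, baseline B, disparity d in pixels). A small sketch of the conversion, with function names and sample numbers chosen purely for illustration:

```python
def disparity_to_depth(disparity_px, focal_px, baseline):
    """Depth of a point seen with the given disparity in a rectified pair."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point at finite depth")
    return focal_px * baseline / disparity_px

def depth_to_disparity(depth, focal_px, baseline):
    """Inverse mapping: nearer points produce larger disparities."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    return focal_px * baseline / depth

# Hypothetical rig: 700 px focal length, 0.12 m baseline.
depth_m = disparity_to_depth(42.0, focal_px=700.0, baseline=0.12)
near_disparity = depth_to_disparity(1.0, focal_px=700.0, baseline=0.12)
far_disparity = depth_to_disparity(10.0, focal_px=700.0, baseline=0.12)
```

Because disparity falls off as 1/Z, a fixed one-pixel matching error costs far more depth accuracy for distant points than for near ones.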

TensorFlow

tf.slice(input_, begin, size, name=None): extracts a contiguous sub-region of a tensor, starting at begin and spanning size elements along each axis
input = [[[1, 1, 1], [2, 2, 2]],
[[3, 3, 3], [4, 4, 4]],
[[5, 5, 5], [6, 6, 6]]]
tf.slice(input, [1, 0, 0], [1, 1, 3]) ==> [[[3, 3, 3]]]
tf.slice(input, [1, 0, 0], [1, 2, 3]) ==> [[[3, 3, 3],
[4, 4, 4]]]
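Tensor.set_shape()
The shape may contain None; those dimensions adapt automatically to the size of the input
tf.tile(input, multiples, name=None)
tf.tile() expands a tensor by repeating its contents: dimension i of the input is replicated multiples[i] times. The rank of the output tensor stays the same.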
a = tf.constant([[1, 2], [3, 4], [5, 6]], dtype=tf.float32)
a1 = tf.tile(a,[2,3])
[[1,2,1,2,1,2],[3,4,3,4,3,4],[5,6,5,6,5,6],
[1,2,1,2,1,2],[3,4,3,4,3,4],[5,6,5,6,5,6]]
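tf.squeeze(input, axis=None, name=None, squeeze_dims=None): removes dimensions of size 1

't' is a tensor of shape [1, 2, 1, 3, 1, 1]

tf.shape(tf.squeeze(t)) # [2, 3]

Variable
In TensorFlow, a Variable is a special tensor whose value can be a tensor of any type and shape.
v = tf.Variable([1,2,3]) # create variable v holding a 1-D array
tf.stack
tf.stack(values, axis=0, name='stack') stacks a list of rank-R tensors along axis into a single rank-(R+1) tensor, i.e. the tensors are merged along a new dimension and the rank grows by one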
a = tf.constant([[1,1],[2,2]]) #2x2
b = tf.constant([[3,3],[4,4]]) #2x2
c = tf.stack([a,b],axis=0) # 2x2x2
[[[1,1],[2,2]],[[3,3],[4,4]]]
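tf.linspace(start, stop, num, name=None)
Returns a tensor of num evenly spaced values from start to stop
tf.reduce_sum(): sums along the given axis, e.g. axis=0 or axis=1
tf.clip_by_value(A, min, max): clamps every element of tensor A to the range [min, max]; values below min become min and values above max become max
tf.gather_nd
Like tf.gather, but allows indexing across multiple dimensions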
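
The last few ops have no example in these notes. The sketch below shows NumPy analogues, not the TensorFlow calls themselves, on the assumption that the semantics carry over: np.linspace for tf.linspace, sum(axis=...) for tf.reduce_sum, np.clip for tf.clip_by_value, and tuple indexing for tf.gather_nd with full-depth indices.

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

# Analogue of tf.linspace(0.0, 1.0, 5): 5 evenly spaced values, endpoints included.
steps = np.linspace(0.0, 1.0, 5)          # [0.0, 0.25, 0.5, 0.75, 1.0]

# Analogue of tf.reduce_sum(m, axis=0) and tf.reduce_sum(m, axis=1).
column_sums = m.sum(axis=0)               # [5, 7, 9]
row_sums = m.sum(axis=1)                  # [6, 15]

# Analogue of tf.clip_by_value(m, 2, 5).
clipped = np.clip(m, 2, 5)                # [[2, 2, 3], [4, 5, 5]]

# Analogue of tf.gather_nd(m, [[0, 2], [1, 0]]): each index row picks one element.
indices = np.array([[0, 2], [1, 0]])
gathered = m[tuple(indices.T)]            # [3, 4]
```

The gather_nd analogue is the pattern used when picking, for every pixel, the slice of a volume selected by a per-pixel depth index.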

Links

https://www.eth3d.net/view_multi_view_result?dataset=12&tid=1 (sphere)
http://zhyan.tk/2017/07/03/mvs-learn-1-middlebury/
(A Comparison and Evaluation of Multi-View Stereo Reconstruction Algorithms)
https://www.cnblogs.com/gemstone/archive/2011/12/19/2293806.html (disparity / parallax)

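Questions

Is it hard to reconstruct a mesh from a point cloud?
Poisson reconstruction
Read the papers "Poisson Surface Reconstruction" and "Screened Poisson Surface Reconstruction"
To make them easier to follow, read up on marching cubes reconstruction first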
