PointNet Code Explained

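I have recently been working on robotic grasping with deep learning on point clouds. This post collects the key points I picked up while studying PointNet.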

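For an overview of PointNet, see the following posts; I will not repeat it here.
3D deep learning: the PointNet series explained
A detailed analysis of the PointNet network architecture
Understanding the PointNet paper and analyzing its code
Reproducing the PointNet paper, with a detailed code walkthrough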

Here I focus on the code itself (the files under pointnet-master\models).
PointNet paper and GitHub code download

The detailed network architecture diagram is shown below.

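The main points to note:
(1) The network has two parts, classification and segmentation, which branch off from the global feature. For how point cloud classification and segmentation differ and relate, see "The difference and connection between point cloud classification and segmentation".
(2) We write data dimensions as (B, H, W, C), i.e. Batch, Height, Width, Channel. The input is a 3D tensor (B, n, 3), where B is the batch size, n is the number of points, and 3 holds each point's (x, y, z) coordinates. To make the subsequent convolutions possible, a dimension is appended to get the 4D tensor (B, n, 3, 1); the convolution kernels then produce C feature channels along this last axis.
(3) The first convolution kernel has size (1, 3) because each point has three values (x, y, z). All later kernels have size (1, 1), because after the first convolution (VALID padding) the data has shape (B, n, 1, C).
(4) The network contains two transform sub-networks (T-Nets). A T-Net applies convolutions and fully connected layers to its input to produce a transformation matrix, which is then multiplied with the input to align the point cloud (or its features); the shape of the data does not change. The first T-Net outputs a 3×3 matrix and the second a 64×64 matrix.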
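To make points (2) to (4) concrete, the NumPy sketch below (an illustration, not the repository's TensorFlow code; sizes and weights are arbitrary, and bias, batch normalization and ReLU are left out) checks that a [1,3] kernel with VALID padding maps (B, n, 3, 1) to (B, n, 1, C), which amounts to applying one shared 3×C weight matrix to every point, and that multiplying by a (B, 3, 3) T-Net-style matrix leaves the (B, n, 3) shape unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
B, n, C = 2, 5, 64  # arbitrary illustrative sizes

points = rng.standard_normal((B, n, 3))  # (B, n, 3): one (x, y, z) row per point
image = points[..., np.newaxis]           # same as tf.expand_dims(points, -1): (B, n, 3, 1)

# A [1, 3] kernel, 1 input channel, C output channels: shape (1, 3, 1, C)
kernel = rng.standard_normal((1, 3, 1, C))

# Naive VALID convolution with stride [1, 1]. The kernel is as wide as the input (3),
# so there is exactly one valid position along the width axis: the width shrinks to 1.
conv_out = np.zeros((B, n, 1, C))
for b in range(B):
    for i in range(n):
        window = image[b, i:i + 1, 0:3, :]  # (1, 3, 1): one point's x, y, z
        conv_out[b, i, 0, :] = np.tensordot(window, kernel, axes=([0, 1, 2], [0, 1, 2]))

# Same result as multiplying every point by one shared 3 x C matrix
dense_out = points @ kernel.reshape(3, C)  # (B, n, C)

# A T-Net style (B, 3, 3) matrix (identity plus noise here) keeps the (B, n, 3) shape
transform = np.eye(3) + 0.1 * rng.standard_normal((B, 3, 3))
transformed = np.matmul(points, transform)  # batched, like tf.matmul

print(image.shape, conv_out.shape, transformed.shape)  # (2, 5, 3, 1) (2, 5, 1, 64) (2, 5, 3)
print(np.allclose(conv_out[:, :, 0, :], dense_out))   # True
```

The same reasoning explains why every later [1,1] convolution behaves like a fully connected layer whose weights are shared by all points.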

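Before reading the code below, please read the following post carefully and make sure you understand it:
A detailed analysis of the PointNet network architecture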

T-Net

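The corresponding file is “pointnet-master\models\transform_nets.py”.
From the architecture diagram, the input is B×n×3. input_transform goes through the following steps:
Convolutions: 64–128–1024
Max pooling over the n points, giving a 1024-dimensional feature
Fully connected: 1024–512–256–3*K (the code uses K=3)
Finally, a reshape yields the transformation matrix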
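A shape-only NumPy sketch of input_transform_net, under simplifying assumptions: each convolution and fully connected layer is replaced by a random matrix multiply followed by ReLU (batch normalization omitted), the first per-point layer stands in for the [1,3] convolution, and the last layer uses small random weights instead of zeros so the output is not trivially the identity. The helper names dense_relu and input_transform_sketch are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense_relu(x, out_dim):
    # Random weights + ReLU on the last axis; stands in for conv2d or fully_connected (BN omitted)
    w = rng.standard_normal((x.shape[-1], out_dim)) * 0.05
    return np.maximum(x @ w, 0.0)

def input_transform_sketch(points, K=3):
    assert K == 3
    B = points.shape[0]
    shapes = []
    net = points                           # (B, n, 3)
    for width in (64, 128, 1024):          # tconv1..tconv3, shared by all points
        net = dense_relu(net, width)
        shapes.append(net.shape)
    net = net.max(axis=1)                  # tmaxpool over the n points -> (B, 1024)
    shapes.append(net.shape)
    for width in (512, 256):               # tfc1, tfc2
        net = dense_relu(net, width)
        shapes.append(net.shape)
    w_last = rng.standard_normal((256, 3 * K)) * 0.05  # random here; zero-initialized in the real code
    transform = net @ w_last + np.eye(3).flatten()     # (B, 3*K), no activation
    return transform.reshape(B, 3, K), shapes

T, shapes = input_transform_sketch(rng.standard_normal((4, 128, 3)))
print(shapes)   # [(4, 128, 64), (4, 128, 128), (4, 128, 1024), (4, 1024), (4, 512), (4, 256)]
print(T.shape)  # (4, 3, 3)
```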

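import numpy as np
import tensorflow as tf  # TensorFlow 1.x API (tf.variable_scope, get_shape()[i].value)
import tf_util  # helper module from pointnet-master/utils (must be on sys.path)

def input_transform_net(point_cloud, is_training, bn_decay=None, K=3):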
    """ Input (XYZ) Transform Net, input is BxNx3 gray image
        Return:
            Transformation matrix of size 3xK """
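    # When reading, it helps to ignore the batch_size dimension and treat the tensor as a 3D array [n, W, C],
    # where n is the number of points (the Height), W is the width, and C is the number of feature channels
    batch_size = point_cloud.get_shape()[0].value  # batch size
    num_point = point_cloud.get_shape()[1].value  # number of points

    input_image = tf.expand_dims(point_cloud, -1)  # expand to a 4D tensor; -1 appends a new last axis, which holds the feature channels
    # input_image: [batch_size, num_point, 3, 1]
    # First convolution: 64 kernels of size [1,3]
    # Afterwards net is [batch_size, num_point, 1, 64]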
    net = tf_util.conv2d(input_image, 64, [1,3],
                         padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training,
                         scope='tconv1', bn_decay=bn_decay)
    # Second convolution: 128 kernels of size [1,1]
    # Afterwards net is [batch_size, num_point, 1, 128]
    net = tf_util.conv2d(net, 128, [1,1],
                         padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training,
                         scope='tconv2', bn_decay=bn_decay)
    # Third convolution: 1024 kernels of size [1,1]
    # Afterwards net is [batch_size, num_point, 1, 1024]
    net = tf_util.conv2d(net, 1024, [1,1],
                         padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training,
                         scope='tconv3', bn_decay=bn_decay)
    # Max pooling with a [num_point, 1] window: each feature channel keeps a single value (its maximum over all points)
    # After pooling net is [batch_size, 1, 1, 1024]
    net = tf_util.max_pool2d(net, [num_point,1],
                             padding='VALID', scope='tmaxpool')
    # Reshape (flatten) the tensor to [batch_size, 1024]
    net = tf.reshape(net, [batch_size, -1])
    net = tf_util.fully_connected(net, 512, bn=True, is_training=is_training,
                                  scope='tfc1', bn_decay=bn_decay)
    net = tf_util.fully_connected(net, 256, bn=True, is_training=is_training,
                                  scope='tfc2', bn_decay=bn_decay)

    # net is now [batch_size, 256], i.e. 256 features per sample
    # The fully connected layer below turns it into a transform of size [batch_size, 3*K]
    with tf.variable_scope('transform_XYZ') as sc:
        assert(K==3)
        weights = tf.get_variable('weights', [256, 3*K],
                                  initializer=tf.constant_initializer(0.0),
                                  dtype=tf.float32)
        biases = tf.get_variable('biases', [3*K],
                                 initializer=tf.constant_initializer(0.0),
                                 dtype=tf.float32)
        biases += tf.constant([1,0,0,0,1,0,0,0,1], dtype=tf.float32)
        transform = tf.matmul(net, weights)
        transform = tf.nn.bias_add(transform, biases)

    # Reshape the transform from [batch_size, 3*K] to [batch_size, 3, K]
    transform = tf.reshape(transform, [batch_size, 3, K])
    return transform
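Note the initialization in transform_XYZ: the weights start at 0 and the biases at the flattened 3×3 identity [1,0,0,0,1,0,0,0,1]. Whatever the 256 input features are, the first forward pass therefore outputs exactly the identity matrix, so the T-Net starts out leaving the point cloud unchanged and training moves it away from the identity as needed. The NumPy sketch below (random stand-in features, no TensorFlow) reproduces only this last layer to check that reading.

```python
import numpy as np

rng = np.random.default_rng(1)
B, n, K = 2, 6, 3

net = rng.standard_normal((B, 256))  # any tfc2 output
weights = np.zeros((256, 3 * K))     # constant_initializer(0.0)
biases = np.zeros(3 * K) + np.array([1, 0, 0, 0, 1, 0, 0, 0, 1], dtype=np.float32)

transform = (net @ weights + biases).reshape(B, 3, K)
point_cloud = rng.standard_normal((B, n, 3))
transformed = np.matmul(point_cloud, transform)

print(np.array_equal(transform[0], np.eye(3)))  # True
print(np.allclose(transformed, point_cloud))    # True
```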

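feature_transform is similar to input_transform. Its steps are:
Convolutions: 64–128–1024
Max pooling over the n points, giving a 1024-dimensional feature
Fully connected: 1024–512–256–K*K (the code uses K=64)
Finally, a reshape yields the K×K transformation matrix
I will not comment this one; you can use the uncommented code below to check whether you understood the previous transform.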

def feature_transform_net(inputs, is_training, bn_decay=None, K=64):
    """ Feature Transform Net, input is BxNx1xK
        Return:
            Transformation matrix of size KxK """
    batch_size = inputs.get_shape()[0].value
    num_point = inputs.get_shape()[1].value

    net = tf_util.conv2d(inputs, 64, [1,1],
                         padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training,
                         scope='tconv1', bn_decay=bn_decay)
    net = tf_util.conv2d(net, 128, [1,1],
                         padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training,
                         scope='tconv2', bn_decay=bn_decay)
    net = tf_util.conv2d(net, 1024, [1,1],
                         padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training,
                         scope='tconv3', bn_decay=bn_decay)
    net = tf_util.max_pool2d(net, [num_point,1],
                             padding='VALID', scope='tmaxpool')

    net = tf.reshape(net, [batch_size, -1])
    net = tf_util.fully_connected(net, 512, bn=True, is_training=is_training,
                                  scope='tfc1', bn_decay=bn_decay)
    net = tf_util.fully_connected(net, 256, bn=True, is_training=is_training,
                                  scope='tfc2', bn_decay=bn_decay)

    with tf.variable_scope('transform_feat') as sc:
        weights = tf.get_variable('weights', [256, K*K],
                                  initializer=tf.constant_initializer(0.0),
                                  dtype=tf.float32)
        biases = tf.get_variable('biases', [K*K],
                                 initializer=tf.constant_initializer(0.0),
                                 dtype=tf.float32)
        biases += tf.constant(np.eye(K).flatten(), dtype=tf.float32)
        transform = tf.matmul(net, weights)
        transform = tf.nn.bias_add(transform, biases)

    transform = tf.reshape(transform, [batch_size, K, K])
    return transform
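The main difference in transform_feat is the bias initialization: instead of a hard-coded list it uses np.eye(K).flatten(), which for K=3 yields the same constant as input_transform_net and for K=64 makes the initial 64×64 transform the identity. The NumPy snippet below checks this and also computes the size of the [256, K*K] weight matrix, which for K=64 is larger than any other weight matrix in this T-Net.

```python
import numpy as np

# For K = 3, np.eye(K).flatten() equals the constant hard-coded in input_transform_net
print(np.eye(3).flatten().tolist())  # [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0]

K = 64
bias_init = np.eye(K).flatten()      # length K*K = 4096
transform = bias_init.reshape(K, K)  # per-sample result of tf.reshape(transform, [batch_size, K, K]) at init
print(bias_init.shape, np.array_equal(transform, np.eye(K)))  # (4096,) True

# Number of entries in the last layer's weight matrix [256, K*K]
print(256 * K * K)  # 1048576
```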

Classification network

The classification network is the main block at the top of the PointNet architecture diagram, the Classification Network. The corresponding file is “pointnet-master\models\pointnet_cls.py”.

def get_model(point_cloud, is_training, bn_decay=None):
    """ Classification PointNet, input is BxNx3, output Bx40 """
    batch_size = point_cloud.get_shape()[0].value  # batch size
    num_point = point_cloud.get_shape()[1].value  # number of points
    end_points = {}  # dict for auxiliary outputs

    with tf.variable_scope('transform_net1') as sc:
        transform = input_transform_net(point_cloud, is_training, bn_decay, K=3)  # the first T-Net gives input_transform
    point_cloud_transformed = tf.matmul(point_cloud, transform)  # multiply the points by input_transform
    input_image = tf.expand_dims(point_cloud_transformed, -1)  # expand to a 4D tensor [batch_size, num_point, 3, 1]

    # 64 kernels of size [1,3] give 64 feature channels
    net = tf_util.conv2d(input_image, 64, [1,3],
                         padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training,
                         scope='conv1', bn_decay=bn_decay)
    # 64 kernels of size [1,1] give 64 feature channels
    net = tf_util.conv2d(net, 64, [1,1],
                         padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training,
                         scope='conv2', bn_decay=bn_decay)
    # net is now [batch_size, num_point, 1, 64]
    # Second T-Net transform
    with tf.variable_scope('transform_net2') as sc:
        transform = feature_transform_net(net, is_training, bn_decay, K=64)
    end_points['transform'] = transform  # keep the 64x64 feature transform matrix; get_loss in this file uses it for a regularization term
    # squeeze removes a dimension of size 1
    # net is [batch_size, num_point, 1, 64]; axis=2 (zero-based, i.e. the third dimension) has size 1,
    # so after squeeze it is [batch_size, num_point, 64]
    # The matrix product then applies the feature transform
    net_transformed = tf.matmul(tf.squeeze(net, axis=[2]), transform)
    # Re-insert axis 2 to get back the 4D tensor [batch_size, num_point, 1, 64]
    net_transformed = tf.expand_dims(net_transformed, [2])

    # Three convolutions, ending with [batch_size, num_point, 1, 1024]
    net = tf_util.conv2d(net_transformed, 64, [1,1],
                         padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training,
                         scope='conv3', bn_decay=bn_decay)
    net = tf_util.conv2d(net, 128, [1,1],
                         padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training,
                         scope='conv4', bn_decay=bn_decay)
    net = tf_util.conv2d(net, 1024, [1,1],
                         padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training,
                         scope='conv5', bn_decay=bn_decay)

    # Symmetric function: max pooling
    # Max pooling with a [num_point, 1] window: each feature channel keeps a single value (its maximum over all points)
    # After pooling net is [batch_size, 1, 1, 1024]
    net = tf_util.max_pool2d(net, [num_point,1],
                             padding='VALID', scope='maxpool')

    # Reshape to [batch_size, 1024]
    net = tf.reshape(net, [batch_size, -1])
    # Three fully connected layers with two dropout layers
    # The output is 40 class scores (logits)
    net = tf_util.fully_connected(net, 512, bn=True, is_training=is_training,
                                  scope='fc1', bn_decay=bn_decay)
    net = tf_util.dropout(net, keep_prob=0.7, is_training=is_training,
                          scope='dp1')
    net = tf_util.fully_connected(net, 256, bn=True, is_training=is_training,
                                  scope='fc2', bn_decay=bn_decay)
    net = tf_util.dropout(net, keep_prob=0.7, is_training=is_training,
                          scope='dp2')
    net = tf_util.fully_connected(net, 40, activation_fn=None, scope='fc3')

    # Return the class logits and end_points, which holds the 64x64 feature transform matrix
    return net, end_points
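A simplified, inference-only NumPy sketch of the classification path, assuming random weights and leaving out both T-Nets, batch normalization and dropout (the names make_weights and classify_sketch are hypothetical). It follows the layer widths of get_model and checks two things: the output is (B, 40), and shuffling the order of the points does not change the logits.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_weights(in_dim, out_dim):
    # He-style scaling keeps activations in a reasonable range (an arbitrary choice for this sketch)
    return rng.standard_normal((in_dim, out_dim)) * np.sqrt(2.0 / in_dim)

# Widths follow get_model: conv1..conv5 (per point), then fc1, fc2, fc3
point_layers = [make_weights(a, b) for a, b in [(3, 64), (64, 64), (64, 64), (64, 128), (128, 1024)]]
fc_layers = [make_weights(1024, 512), make_weights(512, 256)]
fc3 = make_weights(256, 40)

def classify_sketch(points):
    net = points                         # (B, n, 3)
    for w in point_layers:               # shared per-point layers (the [1,3] / [1,1] convs)
        net = np.maximum(net @ w, 0.0)
    net = net.max(axis=1)                # symmetric function: max over points -> (B, 1024)
    for w in fc_layers:                  # fc1, fc2 (dropout is inactive at inference)
        net = np.maximum(net @ w, 0.0)
    return net @ fc3                     # fc3 has no activation: (B, 40) logits

points = rng.standard_normal((2, 256, 3))
perm = rng.permutation(256)              # shuffle the order of the points
logits = classify_sketch(points)
logits_shuffled = classify_sketch(points[:, perm, :])

print(logits.shape)                          # (2, 40)
print(np.allclose(logits, logits_shuffled))  # True
```

Because the maximum is taken over the point axis, the result does not depend on point order; that is the role of the "Symmetric function: max pooling" step.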

Segmentation network

This is the branch going down from global_feature in the original PointNet architecture diagram, the Segmentation Network. The corresponding file is “pointnet-master\models\pointnet_seg.py”.

Since the structure is similar to the classification network, I only comment it briefly here.

def get_model(point_cloud, is_training, bn_decay=None):
    """ Classification PointNet, input is BxNx3, output BxNx50 """
    batch_size = point_cloud.get_shape()[0].value
    num_point = point_cloud.get_shape()[1].value
    end_points = {}

    # First T-Net transform; the result is then expanded to a 4D tensor
    with tf.variable_scope('transform_net1') as sc:
        transform = input_transform_net(point_cloud, is_training, bn_decay, K=3)
    point_cloud_transformed = tf.matmul(point_cloud, transform)
    input_image = tf.expand_dims(point_cloud_transformed, -1)

    # Two convolutions, giving [batch_size, num_point, 1, 64]
    net = tf_util.conv2d(input_image, 64, [1,3],
                         padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training,
                         scope='conv1', bn_decay=bn_decay)
    net = tf_util.conv2d(net, 64, [1,1],
                         padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training,
                         scope='conv2', bn_decay=bn_decay)

    # Second T-Net transform; watch the squeeze / expand_dims dimension changes
    with tf.variable_scope('transform_net2') as sc:
        transform = feature_transform_net(net, is_training, bn_decay, K=64)
    end_points['transform'] = transform
    net_transformed = tf.matmul(tf.squeeze(net, axis=[2]), transform)
    point_feat = tf.expand_dims(net_transformed, [2])
    print(point_feat)

    # Three convolutions and one max pooling
    # The global feature ends up as [batch_size, 1, 1, 1024]
    net = tf_util.conv2d(point_feat, 64, [1,1],
                         padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training,
                         scope='conv3', bn_decay=bn_decay)
    net = tf_util.conv2d(net, 128, [1,1],
                         padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training,
                         scope='conv4', bn_decay=bn_decay)
    net = tf_util.conv2d(net, 1024, [1,1],
                         padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training,
                         scope='conv5', bn_decay=bn_decay)
    global_feat = tf_util.max_pool2d(net, [num_point,1],
                                     padding='VALID', scope='maxpool')
    print(global_feat)
	
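    # Before tile: [batch_size, 1, 1, 1024]
    # tf.tile() replicates a tensor; here axis 1 is repeated num_point times
    # After tile: [batch_size, num_point, 1, 1024]
    global_feat_expand = tf.tile(global_feat, [1, num_point, 1, 1])
    # tf.concat() joins the two tensors along the channel axis, giving the n x 1088 matrix in the diagram (n = num_point, batch_size ignored)
    # In TF >= 1.0 the signature is tf.concat(values, axis); the original tf.concat(3, [...]) uses the pre-1.0 argument order
    concat_feat = tf.concat(axis=3, values=[point_feat, global_feat_expand])
    print(concat_feat)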

    # Four 1x1 convolutions (conv6 to conv9) give 128 feature channels: [batch_size, num_point, 1, 128]
    net = tf_util.conv2d(concat_feat, 512, [1,1],
                         padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training,
                         scope='conv6', bn_decay=bn_decay)
    net = tf_util.conv2d(net, 256, [1,1],
                         padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training,
                         scope='conv7', bn_decay=bn_decay)
    net = tf_util.conv2d(net, 128, [1,1],
                         padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training,
                         scope='conv8', bn_decay=bn_decay)
    net = tf_util.conv2d(net, 128, [1,1],
                         padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training,
                         scope='conv9', bn_decay=bn_decay)
    # conv10: a 1x1 convolution with no activation gives 50 part scores per point
    # [batch_size, num_point, 1, 50]
    net = tf_util.conv2d(net, 50, [1,1],
                         padding='VALID', stride=[1,1], activation_fn=None,
                         scope='conv10')
    # squeeze() removes the size-1 axis 2 (zero-based, i.e. the third dimension)
    net = tf.squeeze(net, [2]) # BxNxC

    return net, end_points
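The key segmentation step is attaching the 1024-dimensional global feature to each point's 64-dimensional local feature. A NumPy sketch with random stand-in tensors (B=2 and n=100 are arbitrary assumptions): np.tile and np.concatenate play the roles of tf.tile and tf.concat, producing the n×1088 per-point features, and np.squeeze mimics the final tf.squeeze.

```python
import numpy as np

rng = np.random.default_rng(0)
B, n = 2, 100  # arbitrary sizes

point_feat = rng.standard_normal((B, n, 1, 64))     # stand-in for point_feat
global_feat = rng.standard_normal((B, 1, 1, 1024))  # stand-in for the max-pooled global_feat

global_feat_expand = np.tile(global_feat, (1, n, 1, 1))                 # like tf.tile(global_feat, [1, num_point, 1, 1])
concat_feat = np.concatenate([point_feat, global_feat_expand], axis=3)  # concatenate along the channel axis

print(global_feat_expand.shape)  # (2, 100, 1, 1024)
print(concat_feat.shape)         # (2, 100, 1, 1088)
# Channels 64..1087 are identical for every point: the same global feature is attached to each one
print(np.array_equal(concat_feat[:, 0, 0, 64:], concat_feat[:, -1, 0, 64:]))  # True

scores = rng.standard_normal((B, n, 1, 50))  # stand-in for the conv10 output
net = np.squeeze(scores, axis=2)             # like tf.squeeze(net, [2]): (B, n, 50)
print(net.shape)                             # (2, 100, 50)
```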

That is all of the code walkthrough. If you find any mistakes, please let me know in the comments.
