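Welcome to my personal blog: zengzeyu.com
Preface
For the original paper, see the reference: CNN for Very Fast Ground Segmentation in Velodyne LiDAR Data.PDF
The paper proposes a new method for removing ground points from a point cloud: it encodes the 3D point cloud data into a form a CNN can be trained on, and then uses the trained network to segment the ground points.
Ground Point Segmentation Method
Training Data
First, a note on the data. The raw KITTI point clouds recorded by the Velodyne HDL-64E have roughly 64x4500 points per frame, while the paper uses 64x360 per frame, so the raw data must be downsampled. Within each frame, the points that one laser collects over a full revolution are grouped into 360 bins of 1° each. For each bin, either a single point or the average of the points is taken as the representative, and its features and label fill the corresponding cell to form the CNN training data. Each point gets a binary label: ground or non-ground. The point features are P = [Px, Py, Pz, Pi, Pr] ([x coordinate, y coordinate, z coordinate, reflection intensity, range]).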
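To make the 1° binning concrete, here is a minimal sketch in plain Python. It assumes that Y is the height axis (the convention this post attributes to the paper), so the azimuth is measured in the x-z plane. The function names `azimuth_bin` and `downsample_ring` are hypothetical and are not taken from the paper or from the Caffe code later in this post.

```python
import math


def azimuth_bin(x, z, num_bins=360):
    """Return the azimuth bin index of a point, assuming Y is the height axis."""
    # atan2 returns an angle in (-180, 180]; the modulo maps it into [0, 360)
    angle = math.degrees(math.atan2(z, x)) % 360.0
    # the final modulo guards against a rounded value of exactly 360.0
    return int(angle * num_bins / 360.0) % num_bins


def downsample_ring(points, num_bins=360):
    """Group the (x, y, z, intensity) points of one laser ring into azimuth bins."""
    bins = [[] for _ in range(num_bins)]
    for p in points:
        bins[azimuth_bin(p[0], p[2], num_bins)].append(p)
    return bins
```

Each of the 64 lasers would produce one such list of 360 bins, and every bin is then reduced to a single representative point.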
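A. Data Preparation (Encoding Sparse 3D Data Into a Dense 2D Matrix)
To feed sparse 3D point cloud data to a 2D CNN, the paper encodes it as multi-channel 2D data stored in a matrix M, as shown in the figure below.
The matrix M has size 64x360. During downsampling, the points that fall into one cell are averaged to give the cell's value. To simplify the data further, the range is computed from [x, z] alone: the paper takes the Y axis as the height direction, so x and z together give the horizontal distance and can be collapsed into a single value. Empty cells are filled by linear interpolation from neighboring cells.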
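The cell averaging and the interpolation of empty cells can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: the depth is taken as the Euclidean norm of the averaged (x, z), a ring is treated as circular so that column 0 and the last column are neighbors, and the helper names `encode_cell` and `fill_empty_cells` are hypothetical.

```python
import math


def encode_cell(points):
    """Average the (x, y, z, intensity) points of one cell into (height, depth, intensity)."""
    if not points:
        return None  # empty cell, to be filled by interpolation
    n = len(points)
    x = sum(p[0] for p in points) / n
    y = sum(p[1] for p in points) / n
    z = sum(p[2] for p in points) / n
    intensity = sum(p[3] for p in points) / n
    # Y is the height axis, so the horizontal distance comes from x and z only
    return (y, math.hypot(x, z), intensity)


def fill_empty_cells(row):
    """Linearly interpolate the None entries of one ring, treating the row as circular."""
    n = len(row)
    if all(cell is None for cell in row):
        return list(row)  # nothing to interpolate from
    filled = list(row)
    for c in range(n):
        if row[c] is not None:
            continue
        # nearest valid cell on each side, wrapping around the ends of the row
        left = next((c - k) % n for k in range(1, n + 1) if row[(c - k) % n] is not None)
        right = next((c + k) % n for k in range(1, n + 1) if row[(c + k) % n] is not None)
        d_left = (c - left) % n
        d_right = (right - c) % n
        t = d_left / (d_left + d_right)
        filled[c] = tuple(a + t * (b - a) for a, b in zip(row[left], row[right]))
    return filled
```

With one valid cell on each side of a single empty cell, the filled value is the midpoint of the two neighbors.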
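B. Training Dataset
The importance of the training dataset needs no elaboration. The authors built a semi-automatic tool for ground annotation based on manually selected seed points. It works like region growing in image processing, except that the distance between points replaces the gray value as the growing criterion; the segmentation worked best with lower and upper bounds of [0.03, 0.07] m. A total of 252 frames from different KITTI scenes were annotated this way, and the annotated data was split 7:3 into [training set, evaluation set].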
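The seed-based region growing described above can be sketched roughly as below. The post does not say how the two bounds are applied, so this sketch is an assumption: it uses a single distance threshold (`max_dist`, set to the upper bound of 0.07 m) and a brute-force neighbor search. The function name `grow_ground_region` is hypothetical and this is not the authors' annotation tool.

```python
import math
from collections import deque


def grow_ground_region(points, seeds, max_dist=0.07):
    """Return the indices of all points reachable from the seeds through
    hops no longer than max_dist (in metres)."""
    labeled = set(seeds)
    queue = deque(seeds)
    while queue:
        current = queue.popleft()
        for idx, candidate in enumerate(points):
            if idx in labeled:
                continue
            # the point-to-point distance replaces the gray-value difference
            # used by region growing on images
            if math.dist(points[current], candidate) <= max_dist:
                labeled.add(idx)
                queue.append(idx)
    return labeled
```

A real tool would use a spatial index (for example a k-d tree) instead of scanning every point for each hop; the loop above only illustrates the growing criterion.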
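Because this yields too little data, the authors also generated training labels for the remaining 19k frames with other methods, based on these point cloud features: minimum height, height variation, and the distance and height difference between the points of two laser rings. They also tried artificially generated data (artificial 3D LiDAR data), but the results were poor.
C. Topology and Training of the Proposed Networks
Because the training data is limited, only shallow CNN architectures are used, and they are fully convolutional. The convolutional and deconvolutional layers all use ReLU non-linearities, and the networks are trained with gradient descent. The network architectures are shown in the figure below: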
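The matrix M from section A is the network input. Because classification is done per point (pixel), the network output has the same size as the input. Ground is labeled 1, and for every point the network outputs a probability obtained with the softmax function. Deconvolutional layers, which are widely used in semantic segmentation, appear in 3 of the 4 network architectures proposed in the paper, including the best-performing one, L05+deconv (the first one in the figure above).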
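As an illustration of the per-point output, the sketch below turns two score maps into a ground mask with a two-class softmax. It assumes the network emits one score per class for every cell and that class 1 is ground, as in the labeling above; the 0.5 threshold and the function names are hypothetical.

```python
import math


def ground_probability(score_other, score_ground):
    """Two-class softmax: probability that a cell is ground."""
    m = max(score_other, score_ground)  # subtract the max for numerical stability
    e_other = math.exp(score_other - m)
    e_ground = math.exp(score_ground - m)
    return e_ground / (e_other + e_ground)


def ground_mask(scores_other, scores_ground, threshold=0.5):
    """Per-cell binary mask (1 = ground) with the same size as the score maps."""
    return [
        [1 if ground_probability(o, g) > threshold else 0 for o, g in zip(row_o, row_g)]
        for row_o, row_g in zip(scores_other, scores_ground)
    ]
```

The mask has exactly one entry per input cell, which is why the output must match the 64x360 input size.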
The CNN input is first normalized and rescaled. For height, points more than 3 m high are filtered out of the KITTI data; the depth channel d is normalized with a logarithm.
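A minimal sketch of such a normalization is shown below. The exact formulas are not given in this post, so everything here is an assumption: heights are capped at 3 m and divided by 3, and the depth is mapped through a logarithm scaled by a hypothetical maximum range `max_depth` of 120 m. The function name `normalize_cell` is also hypothetical.

```python
import math


def normalize_cell(height, depth, max_height=3.0, max_depth=120.0):
    """Rescale one cell's height and depth (assumed formulas, see the text above)."""
    # cap the height at max_height, then scale so that the cap maps to 1.0
    h = min(height, max_height) / max_height
    # log-scale the depth; depths below 1 m are clamped so the log stays >= 0
    d = math.log(max(depth, 1.0)) / math.log(max_depth)
    return h, d
```

The logarithm compresses large depths, so nearby cells, where most ground points lie, keep more resolution than distant ones.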
Experimental Results
————————————————————————————————
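Caffe Reimplementation
This post sets out to reproduce training and prediction with the L05+deconv network.
The Caffe code consists of 5 files: 3 Python files and 2 automatically generated prototxt files.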
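Python:
- pcl_data_layer.py: the data layer class that reads the data
- net.py: CNN architecture definition
- solve.py: solver configuration
prototxt:
- pcl_train.prototxt: network definition file, generated automatically by net.py
- solve.prototxt: solver configuration file, generated automatically by solve.py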
pcl_data_layer.py
import ast
import os
import sys
from enum import Enum

import caffe
import numpy as np


class pointInfo(Enum):
    # channel order of each point in the stored .npy frames
    row = 0
    col = 1
    height = 2
    range = 3
    mark = 4


class PCLSegDataLayer(caffe.Layer):

    def setup(self, bottom, top):
        # param_str holds a Python dict literal such as "{'pcl_dir': '...'}";
        # literal_eval parses it without executing arbitrary code
        params = ast.literal_eval(self.param_str)
        self.npy_dir = params["pcl_dir"]
        self.list_name = list()
        # two tops: data and label
        if len(top) != 2:
            raise Exception("Need to define two tops: data and label.")
        # data layers have no bottoms
        if len(bottom) != 0:
            raise Exception("Do not define a bottom.")
        self.load_file_name(self.npy_dir, self.list_name)
        self.idx = 0

    def reshape(self, bottom, top):
        self.data, self.label = self.load_file(self.idx)
        # reshape tops to fit (leading 1 is for batch dimension)
        top[0].reshape(1, *self.data.shape)
        top[1].reshape(1, *self.label.shape)

    def forward(self, bottom, top):
        # assign output
        top[0].data[...] = self.data
        top[1].data[...] = self.label
        # pick next input
        self.idx += 1
        if self.idx == len(self.list_name):
            self.idx = 0

    def backward(self, top, propagate_down, bottom):
        # a data layer has nothing to back-propagate
        pass
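
    def load_file(self, idx):
        # try each file at most once, starting at idx, until a usable frame is found
        for _ in range(len(self.list_name)):
            print("idx", idx)
            in_file = np.load(self.list_name[idx])  # [row, col, height, range, mark]
            in_file = self.rescale_data(in_file)
            if self.is_data_correct(in_file):
                break
            print("skip one frame.")
            idx = (idx + 1) % len(self.list_name)
        else:
            raise RuntimeError("No usable frame found in " + self.npy_dir)
        self.idx = idx
        in_file = self.fix_nan_point(in_file)
        # 0:-2 keeps the channels [row, col, height]; the range channel
        # (index 3) is not included, and the last channel (mark) is the label
        in_data = in_file[:, :, 0:-2]
        in_label = in_file[:, :, -1]
        return in_data, in_label

    def load_file_name(self, path, list_name):
        # collect all file paths under path, descending into sub-directories
        for file in os.listdir(path):
            file_path = os.path.join(path, file)
            if os.path.isdir(file_path):
                self.load_file_name(file_path, list_name)
            else:
                list_name.append(file_path)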

    def rescale_data(self, in_file_data):
        # downsample each ring from 4500 columns (180 blocks of 25) to 360 cells;
        # every block yields two cells, and its middle column (index 12) is not used
        rescaled_cloud = np.zeros(shape=(64, 360, 5))
        for i in range(64):
            for j in range(1, 181):
                kernel_data_1 = in_file_data[i, (j - 1) * 25:(j - 1) * 25 + 12, :]
                kernel_data_2 = in_file_data[i, (j - 1) * 25 + 13:j * 25, :]
                rescaled_cloud[i, (j - 1) * 2] = self.find_point(kernel_data_1)
                rescaled_cloud[i, (j - 1) * 2 + 1] = self.find_point(kernel_data_2)
        return rescaled_cloud

    def find_point(self, kernel_store):
        # mean range over the points whose range value is non-zero
        tmp_range = 0
        tmp_size = 0
        for k in range(kernel_store.shape[0]):
            if kernel_store[k, -2] != 0:
                tmp_range += kernel_store[k, -2]
                tmp_size += 1
        if tmp_size != 0:
            tmp_range = tmp_range / tmp_size
        # pick the point whose range is closest to that mean
        global_min_diff = sys.float_info.max
        point_num = -1
        for k in range(kernel_store.shape[0]):
            tmp_diff = abs(tmp_range - kernel_store[k, -2])
            if tmp_diff < global_min_diff:
                global_min_diff = tmp_diff
                point_num = k
        if point_num == -1:
            # no point qualified (e.g. the ranges are NaN); the caller stores -1
            # in every channel, which marks the cell as empty
            return point_num
        else:
            return kernel_store[point_num]

    def fix_nan_point(self, in_cloud):
        # fix the edge columns first, so every interior gap has a valid cell on both sides
        in_cloud = self.fix_left_edge_nan_point(in_cloud)
        in_cloud = self.fix_right_edge_nan_point(in_cloud)
        # fix the interior empty cells (mark == -1)
        for i in range(in_cloud.shape[0]):
            for j in range(1, in_cloud.shape[1]):
                if in_cloud[i, j, -1] == -1:
                    left = j - 1
                    right = j + 1
                    while in_cloud[i, left, -1] == -1:
                        left -= 1
                    while in_cloud[i, right, -1] == -1:
                        right += 1
                    # linear interpolation: the per-cell step divides by the full
                    # distance between the two valid cells
                    gap = right - left
                    height_diff_cell = (in_cloud[i, right, 2] - in_cloud[i, left, 2]) / gap
                    range_diff_cell = (in_cloud[i, right, 3] - in_cloud[i, left, 3]) / gap
                    in_cloud[i, j, 2] = in_cloud[i, left, 2] + (j - left) * height_diff_cell
                    in_cloud[i, j, 3] = in_cloud[i, left, 3] + (j - left) * range_diff_cell
                    # the label comes from the nearer valid neighbor
                    if abs(j - left) < abs(right - j):
                        in_cloud[i, j, -1] = in_cloud[i, left, -1]
                    else:
                        in_cloud[i, j, -1] = in_cloud[i, right, -1]
        return in_cloud

    def fix_left_edge_nan_point(self, in_cloud):
        width = in_cloud.shape[1]
        for i in range(in_cloud.shape[0]):
            if in_cloud[i, 0, -1] == -1:
                # the row is circular: search backwards from the last column
                # and forwards from column 1 for the nearest valid cells
                left = width - 1
                right = 1
                while in_cloud[i, left, -1] == -1:
                    left -= 1
                while in_cloud[i, right, -1] == -1:
                    right += 1
                offset = width - left   # circular distance from left to column 0
                gap = offset + right    # circular distance from left to right
                height_diff_cell = (in_cloud[i, right, 2] - in_cloud[i, left, 2]) / gap
                range_diff_cell = (in_cloud[i, right, 3] - in_cloud[i, left, 3]) / gap
                in_cloud[i, 0, 2] = in_cloud[i, left, 2] + offset * height_diff_cell
                in_cloud[i, 0, 3] = in_cloud[i, left, 3] + offset * range_diff_cell
                # the label comes from the nearer valid neighbor
                if offset < right:
                    in_cloud[i, 0, -1] = in_cloud[i, left, -1]
                else:
                    in_cloud[i, 0, -1] = in_cloud[i, right, -1]
        return in_cloud

    def fix_right_edge_nan_point(self, in_cloud):
        width = in_cloud.shape[1]
        last = width - 1
        for i in range(in_cloud.shape[0]):
            if in_cloud[i, last, -1] == -1:
                # search backwards from the second-to-last column and forwards
                # from column 0, which wraps around the end of the row
                left = last - 1
                right = 0
                while in_cloud[i, left, -1] == -1:
                    left -= 1
                while in_cloud[i, right, -1] == -1:
                    right += 1
                offset = last - left            # distance from left to the last column
                gap = (width - left) + right    # circular distance from left to right
                height_diff_cell = (in_cloud[i, right, 2] - in_cloud[i, left, 2]) / gap
                range_diff_cell = (in_cloud[i, right, 3] - in_cloud[i, left, 3]) / gap
                in_cloud[i, last, 2] = in_cloud[i, left, 2] + offset * height_diff_cell
                in_cloud[i, last, 3] = in_cloud[i, left, 3] + offset * range_diff_cell
                # the label comes from the nearer valid neighbor
                if offset < right + 1:
                    in_cloud[i, last, -1] = in_cloud[i, left, -1]
                else:
                    in_cloud[i, last, -1] = in_cloud[i, right, -1]
        return in_cloud

    def is_data_correct(self, in_cloud):
        # a frame is rejected when any ring consists only of empty cells (mark == -1)
        for i in range(in_cloud.shape[0]):
            tmp_size = 0
            for j in range(in_cloud.shape[1]):
                if in_cloud[i, j, -1] == -1:
                    tmp_size += 1
            if tmp_size == in_cloud.shape[1]:
                print("tmp_size", tmp_size)
                return False
        return True
Following the format of the generated data, each frame is split into the feature data that feeds the data layer and the label (the ground truth) used to compute the loss.
net.py
The CNN definition follows the style of the FCN source code:
import caffe
from caffe import layers as L, params as P


def conv_relu(bottom, nout, ks=3, stride=1, pad=1):
    # convolution followed by an in-place ReLU; no weight_filler is set, so
    # Caffe's default filler (constant 0) applies unless weights are loaded
    conv = L.Convolution(bottom, kernel_size=ks, stride=stride,
                         num_output=nout, pad=pad,
                         param=[dict(lr_mult=1, decay_mult=1), dict(lr_mult=2, decay_mult=0)])
    return conv, L.ReLU(conv, in_place=True)


def deconv_relu(bottom, nout, ks=3, stride=1):
    # deconvolution followed by an in-place ReLU; lr_mult=0 keeps the
    # deconvolution weights fixed during training
    deconv = L.Deconvolution(bottom,
                             convolution_param=dict(num_output=nout, kernel_size=ks, stride=stride,
                                                    bias_term=False),
                             param=[dict(lr_mult=0)])
    return deconv, L.ReLU(deconv, in_place=True)


def cnn():
    n = caffe.NetSpec()
    pydata_params = dict()
    pydata_params['pcl_dir'] = '../velodyne/npy/npy_0.5_grid/'
    pylayer = 'PCLSegDataLayer'
    n.data, n.label = L.Python(module='pcl_data_layer', layer=pylayer,
                               ntop=2, param_str=str(pydata_params))
    # base net
    n.conv1_1, n.relu1_1 = conv_relu(n.data, nout=24, ks=11, pad=10)
    n.conv2_1, n.relu2_1 = conv_relu(n.relu1_1, nout=48, ks=5, stride=2, pad=2)
    n.conv3_1, n.relu3_1 = conv_relu(n.relu2_1, nout=48)
    n.deconv4_1, n.relu4_1 = deconv_relu(n.relu3_1, nout=24, ks=5, stride=2)
    n.conv5_1, n.relu5_1 = conv_relu(n.relu4_1, nout=64)
    n.conv6_1, n.relu6_1 = conv_relu(n.relu5_1, nout=2, ks=4)
    n.softmax = L.SoftmaxWithLoss(n.conv6_1, n.label)
    return n.to_proto()


def make_net():
    # write the network definition that the solver will load
    with open('pcl_train.prototxt', 'w') as f:
        f.write(str(cnn()))


if __name__ == '__main__':
    make_net()
Running net.py generates the file pcl_train.prototxt:
layer {
  name: "data"
  type: "Python"
  top: "data"
  top: "label"
  python_param {
    module: "pcl_data_layer"
    layer: "PCLSegDataLayer"
    param_str: "{\'pcl_dir\': \'../velodyne/npy/npy_0.5_grid/\'}"
  }
}
layer {
  name: "conv1_1"
  type: "Convolution"
  bottom: "data"
  top: "conv1_1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 24
    pad: 10
    kernel_size: 11
    stride: 1
  }
}
layer {
  name: "relu1_1"
  type: "ReLU"
  bottom: "conv1_1"
  top: "conv1_1"
}
layer {
  name: "conv2_1"
  type: "Convolution"
  bottom: "conv1_1"
  top: "conv2_1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 48
    pad: 2
    kernel_size: 5
    stride: 2
  }
}
layer {
  name: "relu2_1"
  type: "ReLU"
  bottom: "conv2_1"
  top: "conv2_1"
}
layer {
  name: "conv3_1"
  type: "Convolution"
  bottom: "conv2_1"
  top: "conv3_1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 48
    pad: 1
    kernel_size: 3
    stride: 1
  }
}
layer {
  name: "relu3_1"
  type: "ReLU"
  bottom: "conv3_1"
  top: "conv3_1"
}
layer {
  name: "deconv4_1"
  type: "Deconvolution"
  bottom: "conv3_1"
  top: "deconv4_1"
  param {
    lr_mult: 0
  }
  convolution_param {
    num_output: 24
    bias_term: false
    kernel_size: 5
    stride: 2
  }
}
layer {
  name: "relu4_1"
  type: "ReLU"
  bottom: "deconv4_1"
  top: "deconv4_1"
}
layer {
  name: "conv5_1"
  type: "Convolution"
  bottom: "deconv4_1"
  top: "conv5_1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 64
    pad: 1
    kernel_size: 3
    stride: 1
  }
}
layer {
  name: "relu5_1"
  type: "ReLU"
  bottom: "conv5_1"
  top: "conv5_1"
}
layer {
  name: "conv6_1"
  type: "Convolution"
  bottom: "conv5_1"
  top: "conv6_1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 2
    pad: 1
    kernel_size: 4
    stride: 1
  }
}
layer {
  name: "relu6_1"
  type: "ReLU"
  bottom: "conv6_1"
  top: "conv6_1"
}
layer {
  name: "softmax"
  type: "SoftmaxWithLoss"
  bottom: "conv6_1"
  bottom: "label"
  top: "softmax"
}
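As a sanity check on the layer parameters listed in pcl_train.prototxt, the sketch below traces the spatial size of the feature map through the network. It uses the standard output-size formulas for convolution and deconvolution (without dilation) and assumes the input is laid out with height 64 and width 360; the helper names are hypothetical.

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Output size of a convolution along one spatial dimension."""
    return (size + 2 * pad - kernel) // stride + 1


def deconv_out(size, kernel, stride=1, pad=0):
    """Output size of a deconvolution (transposed convolution) along one dimension."""
    return stride * (size - 1) + kernel - 2 * pad


def trace_shape(h, w):
    """Return [(layer name, height, width), ...] for the layers listed above."""
    layers = [
        ("conv1_1", conv_out, dict(kernel=11, stride=1, pad=10)),
        ("conv2_1", conv_out, dict(kernel=5, stride=2, pad=2)),
        ("conv3_1", conv_out, dict(kernel=3, stride=1, pad=1)),
        ("deconv4_1", deconv_out, dict(kernel=5, stride=2, pad=0)),
        ("conv5_1", conv_out, dict(kernel=3, stride=1, pad=1)),
        ("conv6_1", conv_out, dict(kernel=4, stride=1, pad=1)),
    ]
    shapes = []
    for name, fn, kwargs in layers:
        h, w = fn(h, **kwargs), fn(w, **kwargs)
        shapes.append((name, h, w))
    return shapes


for name, h, w in trace_shape(64, 360):
    print(name, h, w)
```

Under these assumptions a 64x360 input ends as a 76x372 score map at conv6_1, which is larger than the 64x360 label. A crop or different padding would therefore be needed before SoftmaxWithLoss can compare the two.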
solve.py
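The solver workflow:
- Define the object to optimize, together with the training network used for learning and the test network used for evaluation (by referencing another prototxt configuration file)
- Optimize iteratively with forward and backward passes to update the parameters
- Evaluate the test network periodically (a test runs after a set number of training iterations)
- Display the state of the model and the solver during optimization
In a single iteration, the solver does the following:
- Calls forward to compute the final output and the corresponding loss
- Calls backward to compute the gradients of every layer
- Updates the parameters from the gradients according to the chosen solver method
- Records and saves the learning rate, snapshots, and solver state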
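import caffe
import numpy as np
import os

# init
# uncomment the next two lines to train on GPU 0
# caffe.set_device(0)
# caffe.set_mode_gpu()

# the solver definition file; the file list above names it solve.prototxt,
# so adjust the name here to match the actual file
solver = caffe.SGDSolver('solver.prototxt')

# 25 rounds of 4000 iterations each, 100000 iterations in total
for _ in range(25):
    solver.step(4000)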
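The single-iteration steps listed above can be sketched without Caffe on a toy one-parameter model. This is only an illustration of the forward, backward, update, and snapshot cycle with the usual SGD-with-momentum rule; the model, the data, and all hyperparameter values are made up for the example.

```python
def forward(w, x, y):
    """Forward pass: squared-error loss of the model w * x against the target y."""
    return (w * x - y) ** 2


def backward(w, x, y):
    """Backward pass: gradient of the loss with respect to w."""
    return 2.0 * (w * x - y) * x


def solve(w, data, lr=0.05, momentum=0.9, max_iter=200, snapshot_every=50):
    """Run SGD with momentum and return the final weight and the snapshots."""
    velocity = 0.0
    snapshots = []
    for it in range(1, max_iter + 1):
        x, y = data[(it - 1) % len(data)]  # cycle through the samples
        loss = forward(w, x, y)
        grad = backward(w, x, y)
        # momentum update: V <- momentum * V - lr * grad, then W <- W + V
        velocity = momentum * velocity - lr * grad
        w += velocity
        if it % snapshot_every == 0:
            snapshots.append((it, w, loss))  # periodic snapshot of the state
    return w, snapshots


final_w, saved = solve(0.0, [(1.0, 2.0), (2.0, 4.0)])
print(final_w, [it for it, _, _ in saved])
```

In Caffe, one call to `solver.step(n)` runs n such iterations, with the learning rate, momentum, and snapshot interval read from the solver prototxt.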
That's all.
Reference: CNN for Very Fast Ground Segmentation in Velodyne LiDAR Data.PDF