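In autonomous driving projects, traffic sign recognition is an important task. This article trains on the German traffic sign dataset and uses the LeNet deep neural network architecture to process the images and recognize traffic signs. The process includes:
Exploring and visualizing the dataset
Preprocessing the data
Building, training, and testing the model architecture
Using the model to make predictions on new images
Analyzing the softmax probabilities of the new images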
Download the data: "traffic-signs-data.zip"
Load the files "train.p" and "test.p":
import pickle
training_file = '../traffic-signs-data/train.p'
testing_file = '../traffic-signs-data/test.p'
with open(training_file, mode='rb') as f:
    train = pickle.load(f)
with open(testing_file, mode='rb') as f:
    test = pickle.load(f)
X_train, y_train = train['features'], train['labels']
X_test, y_test = test['features'], test['labels']
Examine the number of training examples, the number of test examples, the image shape, and the number of classes in the dataset.
# Number of training examples.
n_train = X_train.shape[0]
# Number of testing examples.
n_test = X_test.shape[0]
# Shape of a single traffic sign image.
image_shape = X_train.shape[1:]
# Number of unique classes/labels in the dataset.
n_classes = len(set(y_train))
print("Number of training examples =", n_train)
print("Number of testing examples =", n_test)
print()
print("Image data shape =", image_shape)
print("Number of classes =", n_classes)
The output is:
Number of training examples = 34799
Number of testing examples = 12630
Image data shape = (32, 32, 3)
Number of classes = 43
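Beyond the totals, exploring the dataset usually also means looking at how the examples are spread over the 43 classes. A minimal sketch of such a count, using a randomly generated `labels` array as a hypothetical stand-in for `y_train` (the numbers it prints say nothing about the real dataset):

```python
import numpy as np

# Hypothetical stand-in for y_train: integer class ids in [0, 43).
rng = np.random.default_rng(0)
labels = rng.integers(0, 43, size=1000)

# Count how many examples fall into each class.
class_ids, counts = np.unique(labels, return_counts=True)

print("classes present:", len(class_ids))
print("smallest class :", class_ids[np.argmin(counts)], counts.min())
print("largest class  :", class_ids[np.argmax(counts)], counts.max())
```

With the real labels, a large gap between the smallest and largest class would suggest checking per-class accuracy later on.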
Display a random image from the dataset:
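One way this could be done is sketched here. The helper names `pick_random_sample` and `show_sample` and the demo arrays are hypothetical, and the plotting part assumes matplotlib is installed; with the real data, `X_train` and `y_train` would be passed instead of the demo arrays.

```python
import numpy as np

def pick_random_sample(images, labels, rng):
    """Return (index, image, label) for one randomly chosen example."""
    index = int(rng.integers(0, len(images)))
    return index, images[index], labels[index]

def show_sample(image, label):
    """Draw one image with its label as the title (needs matplotlib)."""
    import matplotlib.pyplot as plt  # imported lazily; assumed to be installed
    plt.figure(figsize=(2, 2))
    plt.imshow(image.squeeze(), cmap="gray" if image.shape[-1] == 1 else None)
    plt.title("label = {}".format(label))
    plt.axis("off")
    plt.show()

# Hypothetical stand-in data with the same layout as X_train / y_train.
rng = np.random.default_rng(0)
demo_images = rng.integers(0, 256, size=(10, 32, 32, 3), dtype=np.uint8)
demo_labels = rng.integers(0, 43, size=10)

index, image, label = pick_random_sample(demo_images, demo_labels, rng)
print(index, image.shape, label)
# show_sample(image, label)  # uncomment to open a plot window
```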
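1. Grayscale conversion
For traffic sign recognition, color is not the main feature of the image: the signs remain recognizable after conversion to grayscale, and single-channel images are faster to train on. The grayscale conversion code is as follows: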
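import numpy as np
X_train_rgb = X_train
# Average the three color channels; keepdims keeps a trailing channel axis of size 1.
X_train_gry = np.sum(X_train/3, axis=3, keepdims=True)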
X_test_rgb = X_test
X_test_gry = np.sum(X_test/3, axis=3, keepdims=True)
print('RGB shape:', X_train_rgb.shape)
print('Grayscale shape:', X_train_gry.shape)
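The conversion above is a plain average of the three channels. The sketch below reproduces it on a small random batch (a hypothetical stand-in for `X_train`) and, for comparison only, also computes a luminance-weighted grayscale using the ITU-R BT.601 weights, which is a common alternative and not what the article uses.

```python
import numpy as np

# Hypothetical batch of 4 RGB images, shaped like X_train.
rng = np.random.default_rng(0)
rgb = rng.integers(0, 256, size=(4, 32, 32, 3), dtype=np.uint8)

# Channel average, as in the article's code.
gray_mean = np.sum(rgb / 3, axis=3, keepdims=True)

# Alternative (not used in the article): ITU-R BT.601 luma weights.
weights = np.array([0.299, 0.587, 0.114])
gray_luma = np.sum(rgb * weights, axis=3, keepdims=True)

print(gray_mean.shape, gray_luma.shape)  # (4, 32, 32, 1) (4, 32, 32, 1)
```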
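Normalizing the input images brings the input features into a similar range, which makes the cost function easier and faster to optimize. In this project the training and test sets are normalized to approximately the range (-1, 1). The code is as follows: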
# Normalize the grayscale images (pixel values 0..255), not the original RGB arrays.
X_train_normalized = (X_train_gry - 128.)/128.
X_test_normalized = (X_test_gry - 128.)/128.
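The exact range this formula produces for 8-bit pixel values can be checked directly. A small sketch, independent of the dataset:

```python
import numpy as np

# Every possible 8-bit pixel value.
pixels = np.arange(256, dtype=np.uint8)

normalized = (pixels - 128.) / 128.

print(normalized.min(), normalized.max())  # -1.0 0.9921875
```

So the lower bound -1 is reached, while the upper bound is 127/128, slightly below 1.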
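When the model was trained on the original data alone, training accuracy was high but validation accuracy was noticeably lower, which indicates overfitting. High training accuracy shows that the network has fit the features of the original data; low validation accuracy shows that the original data does not cover enough variation for the network to generalize to unseen samples. The remedy used here is data augmentation during preprocessing, which enlarges the training set: each image is kept, and four randomly rotated and shifted copies of it are added. The code is as follows: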
import numpy as np
from scipy import ndimage
def expend_training_data(X_train, y_train):
    """
    Augment training data: keep each image and add 4 randomly rotated and shifted copies
    """
    expanded_images = np.zeros([X_train.shape[0] * 5, X_train.shape[1], X_train.shape[2]])
    expanded_labels = np.zeros([X_train.shape[0] * 5])
    counter = 0
    for x, y in zip(X_train, y_train):
        # register original data
        expanded_images[counter, :, :] = x
        expanded_labels[counter] = y
        counter = counter + 1
        # get a value for the background
        # zero is the expected value, but median() is used to estimate background's value
        bg_value = np.median(x)  # this is regarded as background's value
        for i in range(4):
            # rotate the image by a random integer angle in [-15, 15) degrees (scalar, not a 1-element array)
            angle = np.random.randint(-15, 15)
            new_img = ndimage.rotate(x, angle, reshape=False, cval=bg_value)
            # shift the image by a random integer offset in [-2, 2) along each axis
            shift = np.random.randint(-2, 2, 2)
            new_img_ = ndimage.shift(new_img, shift, cval=bg_value)
            # register new training data
            expanded_images[counter, :, :] = new_img_
            expanded_labels[counter] = y
            counter = counter + 1
    return expanded_images, expanded_labels
X_train_normalized = np.reshape(X_train_normalized,(-1, 32, 32))
agument_x, agument_y = expend_training_data(X_train_normalized[:], y_train[:])
agument_x = np.reshape(agument_x, (-1, 32, 32, 1))
print(agument_y.shape)
print(agument_x.shape)
print('agument_y mean:', np.mean(agument_y))
print('agument_x mean:',np.mean(agument_x))
#print(y_train.shape)
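Because the label array inside expend_training_data is created with np.zeros, the augmented labels come back as float64 rather than integers. Whether that matters depends on the downstream code; a minimal sketch of casting them back, using a hypothetical `float_labels` array in place of `agument_y`:

```python
import numpy as np

# Hypothetical stand-in for agument_y: class ids stored as float64,
# each original label repeated 5 times (1 original + 4 augmented copies).
original_labels = np.array([3, 14, 25])
float_labels = np.repeat(original_labels, 5).astype(np.float64)

# Cast back to integers so the labels can be used as class indices.
int_labels = float_labels.astype(np.int64)

print(float_labels.dtype, "->", int_labels.dtype)
print(np.bincount(int_labels)[[3, 14, 25]])  # [5 5 5]
```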
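The augmented data is then split again into a training set and a validation set. The split ratio (90% training, 10% validation) and the code are as follows: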
from sklearn.model_selection import train_test_split
X_train, X_validation, y_train, y_validation = train_test_split(agument_x, agument_y,test_size=0.10,random_state=42)
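The idea behind the split can also be sketched with NumPy alone. The helper name `shuffle_split` and the demo arrays are hypothetical, and the function does not reproduce the exact partition that train_test_split produces for random_state=42; it only illustrates shuffling once and holding out 10%.

```python
import numpy as np

def shuffle_split(features, labels, test_size=0.10, seed=42):
    """Shuffle the data once and hold out a fraction as validation."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(features))
    n_val = int(round(len(features) * test_size))
    val_idx, train_idx = order[:n_val], order[n_val:]
    return features[train_idx], features[val_idx], labels[train_idx], labels[val_idx]

demo_x = np.zeros((50, 32, 32, 1))
demo_y = np.arange(50)
x_tr, x_val, y_tr, y_val = shuffle_split(demo_x, demo_y)
print(x_tr.shape, x_val.shape)  # (45, 32, 32, 1) (5, 32, 32, 1)
```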
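Traffic signs have distinctive visual features, the recognition task is relatively simple, and the number of output classes is fixed, so a fairly simple architecture such as LeNet is sufficient for this kind of task. The project therefore uses the LeNet architecture, shown in the figure below:
The modified architecture is summarized in the following table: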
Layer | Description |
---|---|
Input | 32x32x1 grayscale image |
Convolution 5x5 | 1x1 stride, VALID padding, outputs 28x28x6 |
RELU | Activation |
Max pooling | 2x2 stride, outputs 14x14x6 |
Convolution 5x5 | 1x1 stride, VALID padding, outputs 10x10x16 |
RELU | Activation |
Max pooling | 2x2 stride, outputs 5x5x16 |
Flatten | Outputs 400 |
Fully connected | Outputs 120 |
RELU | Activation |
Dropout | Keep_prob=0.5 |
Fully connected | Outputs 84 |
RELU | Activation |
Dropout | Keep_prob = 0.5 |
Fully connected | Outputs = 43 |
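The output sizes in the table follow from the usual rule for VALID (unpadded) convolution and pooling: output = floor((input - kernel) / stride) + 1. A short check, where the helper name `valid_output_size` is introduced only for illustration and the 2x2 pooling window is taken from the ksize used in the LeNet function:

```python
def valid_output_size(input_size, kernel_size, stride):
    """Spatial output size of a VALID (no padding) convolution or pooling."""
    return (input_size - kernel_size) // stride + 1

size = 32
size = valid_output_size(size, kernel_size=5, stride=1)  # conv1 -> 28
size = valid_output_size(size, kernel_size=2, stride=2)  # pool1 -> 14
size = valid_output_size(size, kernel_size=5, stride=1)  # conv2 -> 10
size = valid_output_size(size, kernel_size=2, stride=2)  # pool2 -> 5
flattened = size * size * 16                              # 5 * 5 * 16 = 400
print(size, flattened)  # 5 400
```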
import tensorflow as tf  # TensorFlow 1.x API
from tensorflow.contrib.layers import flatten
def LeNet(x):
    # keep_prob is not defined in this listing; it is assumed to be a tf.placeholder created before LeNet is called
    # mu and sigma are passed to tf.truncated_normal, which randomly initializes the weights of each layer
    mu = 0
    sigma = 0.1
    # Layer 1: Convolutional. Input = 32x32x1. Output = 28x28x6.
    conv1_W = tf.Variable(tf.truncated_normal(shape=(5, 5, 1, 6), mean = mu, stddev = sigma))
    conv1_b = tf.Variable(tf.zeros(6))
    conv1 = tf.nn.conv2d(x, conv1_W, strides=[1, 1, 1, 1], padding='VALID') + conv1_b
    # Activation.
    conv1 = tf.nn.relu(conv1)
    # Pooling. Input = 28x28x6. Output = 14x14x6.
    conv1 = tf.nn.max_pool(conv1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='VALID')
    # Layer 2: Convolutional. Output = 10x10x16.
    conv2_W = tf.Variable(tf.truncated_normal(shape=(5, 5, 6, 16), mean = mu, stddev = sigma))
    conv2_b = tf.Variable(tf.zeros(16))
    conv2 = tf.nn.conv2d(conv1, conv2_W, strides=[1, 1, 1, 1], padding='VALID') + conv2_b
    # Activation.
    conv2 = tf.nn.relu(conv2)
    # Pooling. Input = 10x10x16. Output = 5x5x16.
    conv2 = tf.nn.max_pool(conv2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='VALID')
    # Flatten. Input = 5x5x16. Output = 400.
    fc0 = flatten(conv2)
    # Layer 3: Fully Connected. Input = 400. Output = 120.
    fc1_W = tf.Variable(tf.truncated_normal(shape=(400, 120), mean = mu, stddev = sigma))
    fc1_b = tf.Variable(tf.zeros(120))
    fc1 = tf.matmul(fc0, fc1_W) + fc1_b
    # Activation followed by dropout.
    fc1 = tf.nn.dropout(tf.nn.relu(fc1), keep_prob)
    # Layer 4: Fully Connected. Input = 120. Output = 84.
    fc2_W = tf.Variable(tf.truncated_normal(shape=(120, 84), mean = mu, stddev = sigma))
    fc2_b = tf.Variable(tf.zeros(84))
    fc2 = tf.matmul(fc1, fc2_W) + fc2_b
    # Activation followed by dropout.
    fc2 = tf.nn.dropout(tf.nn.relu(fc2), keep_prob)
    # Layer 5: Fully Connected. Input = 84. Output = 43.
    fc3_W = tf.Variable(tf.truncated_normal(shape=(84, 43), mean = mu, stddev = sigma))
    fc3_b = tf.Variable(tf.zeros(43))
    logits = tf.matmul(fc2, fc3_W) + fc3_b
    return logits
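The two dropout layers are what distinguish this network from plain LeNet. TensorFlow's tf.nn.dropout keeps each unit with probability keep_prob and scales the kept units by 1/keep_prob. The NumPy sketch below imitates that behavior; the function name `inverted_dropout` is hypothetical, and feeding keep_prob = 1.0 at evaluation time is the usual convention, assumed here because the article does not show its evaluation code.

```python
import numpy as np

def inverted_dropout(activations, keep_prob, rng):
    """Zero each unit with probability 1 - keep_prob and rescale the rest."""
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
fc1 = np.ones((2, 120))

train_out = inverted_dropout(fc1, keep_prob=0.5, rng=rng)  # training
eval_out = inverted_dropout(fc1, keep_prob=1.0, rng=rng)   # evaluation

print(np.unique(train_out))           # kept units become 2.0, dropped units 0.0
print(np.array_equal(eval_out, fc1))  # True
```

The rescaling keeps the expected activation the same in both modes, so no extra correction is needed when dropout is switched off.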
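The mean cross-entropy over the training examples is used as the overall loss function. Training uses the Adam algorithm to optimize gradient descent and minimize this loss. The hyperparameters are set as follows:
EPOCHS = 10, BATCH_SIZE = 128, sigma = 0.1, learning rate = 0.001
The code is as follows: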
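x = tf.placeholder(tf.float32, (None, 32, 32, 1))  # assumed inputs, not shown in the original listing
y = tf.placeholder(tf.int32, (None,))
keep_prob = tf.placeholder(tf.float32)
one_hot_y = tf.one_hot(y, 43)
rate = 0.001  # learning rate from the hyperparameters above
logits = LeNet(x)
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(labels=one_hot_y,logits=logits)
loss_operation = tf.reduce_mean(cross_entropy)
optimizer = tf.train.AdamOptimizer(learning_rate = rate)
training_operation = optimizer.minimize(loss_operation)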
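What the softmax cross-entropy followed by the mean computes can be sketched in NumPy. This is an illustration of the formula, not TensorFlow's implementation; the function name and the demo inputs are hypothetical.

```python
import numpy as np

def softmax_cross_entropy(logits, one_hot_labels):
    """Per-example cross-entropy between softmax(logits) and one-hot labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(one_hot_labels * log_probs).sum(axis=1)

n_classes = 43
logits = np.zeros((2, n_classes))    # an uninformative, uniform prediction
labels = np.eye(n_classes)[[0, 7]]   # one-hot rows for classes 0 and 7

loss = softmax_cross_entropy(logits, labels).mean()
print(loss, np.log(n_classes))  # both equal ln(43), about 3.76
```

A loss near ln(43) at the start of training therefore means the network is still guessing uniformly over the 43 classes.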
The final results are:
training set accuracy : 0.982
validation set accuracy : 0.976
test set accuracy : 0.945
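The article does not show how these accuracies are computed. A common definition, assumed here, is the fraction of examples whose highest-scoring class matches the label; the function name and demo values below are made up for illustration.

```python
import numpy as np

def accuracy(logits, labels):
    """Fraction of examples whose highest-scoring class equals the label."""
    return float(np.mean(np.argmax(logits, axis=1) == labels))

demo_logits = np.array([[2.0, 0.1, 0.3],
                        [0.2, 1.5, 0.1],
                        [0.9, 0.8, 0.1],
                        [0.1, 0.2, 3.0]])
demo_labels = np.array([0, 1, 1, 2])
print(accuracy(demo_logits, demo_labels))  # 0.75
```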
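Eight German traffic sign images were downloaded, as shown in the figure:
The images are processed in the same way as the training set, including grayscale conversion and normalization.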
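The preprocessing and the later softmax analysis of the new images can be sketched as follows. The helper names `preprocess` and `top_k_softmax` are hypothetical, the images are random placeholder pixels, and `fake_logits` merely stands in for the model output, so the printed classes carry no meaning. The new images are assumed to be already resized to 32x32x3.

```python
import numpy as np

def preprocess(rgb_images):
    """Grayscale by channel averaging, then scale to roughly [-1, 1)."""
    gray = np.sum(rgb_images / 3, axis=3, keepdims=True)
    return (gray - 128.) / 128.

def top_k_softmax(logits, k=5):
    """Return the k most probable class ids and their probabilities per image."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
    top_ids = np.argsort(probs, axis=1)[:, ::-1][:, :k]
    top_probs = np.take_along_axis(probs, top_ids, axis=1)
    return top_ids, top_probs

rng = np.random.default_rng(0)
new_images = rng.integers(0, 256, size=(8, 32, 32, 3), dtype=np.uint8)  # placeholder pixels
batch = preprocess(new_images)

fake_logits = rng.normal(size=(8, 43))  # stands in for the model output
ids, probs = top_k_softmax(fake_logits)
print(batch.shape, ids.shape, probs.shape)  # (8, 32, 32, 1) (8, 5) (8, 5)
```

Looking at the top five probabilities per image shows not only which class was predicted but also how confident the model was and which classes it confused it with.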