[Deep Learning with Python] (4) Deep Learning for Computer Vision

These are study notes for Deep Learning with Python.

Part 2: Deep Learning in Practice

Chapters 1-4 introduced deep learning and how it works; chapters 5-9 develop, through hands-on practice, the skills to solve real-world problems with deep learning.

Chapter 5: Deep Learning for Computer Vision

Understanding convolutional neural networks
Using data augmentation to mitigate overfitting
Using a pretrained convnet for feature extraction
Fine-tuning a pretrained convnet
Visualizing what convnets learn and how they make classification decisions

5.1 Introduction to convnets

This section takes an in-depth look at how convnets work and why they are so successful at computer-vision tasks. Below is a stack of Conv2D and MaxPooling2D layers.

# 5-1 Instantiating a small convnet
from keras import layers
from keras import models
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation = 'relu', input_shape = (28, 28, 1)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation = 'relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation = 'relu'))
model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_2 (Conv2D)            (None, 26, 26, 32)        320       
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 13, 13, 32)        0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 11, 11, 64)        18496     
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 5, 5, 64)          0         
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 3, 3, 64)          36928     
=================================================================
Total params: 55,744
Trainable params: 55,744
Non-trainable params: 0
_________________________________________________________________

As you can see, the output of every Conv2D and MaxPooling2D layer is a 3D tensor of shape (height, width, channels).
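The parameter counts in the summary above can be checked by hand: a Conv2D layer holds window_height × window_width × input_channels weights per filter, plus one bias per filter. A small sketch (the helper name is ours, not a Keras API):

```python
# Conv2D parameter count: (window height * window width * input channels)
# weights per filter, plus one bias per filter.
def conv2d_params(window, in_channels, filters):
    return window * window * in_channels * filters + filters

print(conv2d_params(3, 1, 32))   # 320    -> conv2d_2 above
print(conv2d_params(3, 32, 64))  # 18496  -> conv2d_3 above
print(conv2d_params(3, 64, 64))  # 36928  -> conv2d_4 above
```

The MaxPooling2D layers show 0 parameters because pooling has nothing to learn.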

# 5-2 Adding a classifier on top of the convnet
model.add(layers.Flatten())
model.add(layers.Dense(64, activation = 'relu'))
model.add(layers.Dense(10, activation = 'softmax'))
model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_2 (Conv2D)            (None, 26, 26, 32)        320       
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 13, 13, 32)        0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 11, 11, 64)        18496     
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 5, 5, 64)          0         
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 3, 3, 64)          36928     
_________________________________________________________________
flatten_1 (Flatten)          (None, 576)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 64)                36928     
_________________________________________________________________
dense_2 (Dense)              (None, 10)                650       
=================================================================
Total params: 93,322
Trainable params: 93,322
Non-trainable params: 0
_________________________________________________________________
# 5-3 Training the convnet on MNIST images
from keras.datasets import mnist
from keras.utils import to_categorical

(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

train_images = train_images.reshape((60000, 28, 28, 1))
train_images = train_images.astype('float32') / 255

test_images = test_images.reshape((10000, 28, 28, 1))
test_images = test_images.astype('float32') / 255

train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)

model.compile(optimizer = 'rmsprop',
             loss = 'categorical_crossentropy',
             metrics = ['accuracy'])

model.fit(train_images, train_labels, epochs = 5, batch_size = 64)

# Evaluate on the test set
test_loss, test_acc = model.evaluate(test_images, test_labels)
print(test_acc)

5.1.1 The convolution operation

Densely connected layers learn global patterns in their input feature space.
Convolution layers learn local patterns.

  • The patterns a convnet learns are translation invariant: after learning a certain pattern in the lower-right corner of an image, it can recognize that pattern anywhere, whereas a densely connected network would have to learn it anew. (And the visual world is fundamentally translation invariant.)
  • A convnet can learn spatial hierarchies of patterns: a first convolution layer learns small local patterns (such as edges), and a second layer combines the first layer's features into larger patterns, letting the convnet learn increasingly complex concepts. (And the visual world is fundamentally spatially hierarchical.)

Convolutions operate over 3D tensors with two spatial axes (height and width) and a depth axis (channels); such a tensor is called a feature map.
 

  1. padding: "same" means the output has the same spatial size as the input; "valid" means no padding (the default)
  2. stride: the distance between two successive windows (the default is 1)
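These two settings fully determine the output sizes seen in the summaries above. A minimal sketch of the standard size formula (the helper is illustrative, not part of Keras):

```python
# Spatial output size of a convolution or pooling window: with input size n,
# window f, padding p, and stride s, the output is floor((n + 2p - f) / s) + 1.
def output_size(n, f, s=1, p=0):
    return (n + 2 * p - f) // s + 1

print(output_size(28, 3))        # 26 -> 'valid' padding, stride 1
print(output_size(28, 3, p=1))   # 28 -> 'same' padding preserves the size
print(output_size(26, 2, s=2))   # 13 -> a 2x2 max pooling with stride 2
```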

5.1.2 The max-pooling operation

After each MaxPooling2D layer, the size of the feature map is halved. Max pooling typically uses 2x2 windows with stride 2, whereas convolution typically uses 3x3 windows with stride 1.

model_no_max_pool = models.Sequential()
model_no_max_pool.add(layers.Conv2D(32, (3, 3), activation = 'relu', input_shape = (28, 28, 1)))
model_no_max_pool.add(layers.Conv2D(64, (3, 3), activation = 'relu'))
model_no_max_pool.add(layers.Conv2D(64, (3, 3), activation = 'relu'))
model_no_max_pool.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_11 (Conv2D)           (None, 26, 26, 32)        320       
_________________________________________________________________
conv2d_12 (Conv2D)           (None, 24, 24, 64)        18496     
_________________________________________________________________
conv2d_13 (Conv2D)           (None, 22, 22, 64)        36928     
=================================================================
Total params: 55,744
Trainable params: 55,744
Non-trainable params: 0
_________________________________________________________________

  • It isn't conducive to learning a spatial hierarchy of features: the 3x3 windows in the third layer still only contain information from 7x7 windows of the initial input, so the high-level patterns the convnet learns remain very small relative to the initial input.
  • The final feature map has 22 x 22 x 64 = 30,976 elements per sample; flattening this and adding a Dense layer on top would lead to severe overfitting.

Note that max pooling isn't the only way to achieve this kind of downsampling. As you already know, you can also use strides in the prior convolution layer, and you can use average pooling instead of max pooling, where each local input patch is transformed by taking the average value of each channel over the patch, rather than the max. But max pooling tends to work better than these alternatives. In a nutshell, the reason is that features tend to encode whether some pattern or concept is present at different locations of the feature map (hence the term feature map), and looking at the maximal value of different features is more informative than looking at their average. The most reasonable subsampling strategy is therefore to first produce dense feature maps (via unstrided convolutions) and then look at the maximal activation of the features over small patches, rather than looking at sparser windows of the input (via strided convolutions) or averaging input patches, both of which could cause you to miss or dilute feature-presence information.
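To make the max-versus-average contrast concrete, here is a hand-rolled sketch of both pooling operations on a tiny single-channel feature map (the `pool` helper is illustrative; it is not the Keras layers):

```python
import numpy as np

# A 4x4 single-channel feature map.
fmap = np.array([[1, 3, 2, 0],
                 [4, 6, 1, 1],
                 [0, 2, 9, 8],
                 [1, 1, 7, 5]], dtype='float32')

# Apply a reduction `fn` over non-overlapping size x size patches (stride = size).
def pool(x, fn, size=2):
    h, w = x.shape[0] // size, x.shape[1] // size
    out = np.empty((h, w), dtype=x.dtype)
    for i in range(h):
        for j in range(w):
            out[i, j] = fn(x[i*size:(i+1)*size, j*size:(j+1)*size])
    return out

print(pool(fmap, np.max))   # strongest activation per patch: [[6, 2], [2, 9]]
print(pool(fmap, np.mean))  # averaging dilutes presence: [[3.5, 1], [1, 7.25]]
```

The max-pooled output keeps the 9 in the lower-right patch at full strength, while averaging washes it down to 7.25.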

 

5.2 Training a convnet from scratch on a small dataset

5.2.1 The relevance of deep learning for small-data problems

Deep learning generally requires lots of data, but if the model is small and well regularized and the task is simple, a few hundred samples can suffice.
What's more, deep-learning models are highly repurposable; in computer vision especially, many pretrained models are publicly available for download.

5.2.2 Downloading the data

The data for this chapter is on Kaggle: https://www.kaggle.com/c/dogs-vs-cats/data . The dataset contains 25,000 images of cats and dogs (543 MB). We create three small subsets: a training set with 1,000 samples of each class, a validation set with 500 samples of each class, and a test set with 500 samples of each class.

# 5-4 Copying images to training, validation, and test directories
import os, shutil

original_dataset_dir = "D:/kaggle_data/kaggle_original_data/train"

# Directory where the smaller dataset will be stored
base_dir = 'D:/kaggle_data/cats_and_dogs_small'
os.mkdir(base_dir)
# Directories for the training, validation, and test splits
train_dir = os.path.join(base_dir, 'train')
os.mkdir(train_dir)
validation_dir = os.path.join(base_dir, 'validation')
os.mkdir(validation_dir)
test_dir = os.path.join(base_dir, 'test')
os.mkdir(test_dir)
# Cat and dog subdirectories for each split
train_cats_dir = os.path.join(train_dir, 'cats')
os.mkdir(train_cats_dir)

train_dogs_dir = os.path.join(train_dir, 'dogs')
os.mkdir(train_dogs_dir)

validation_cats_dir = os.path.join(validation_dir , 'cats')
os.mkdir(validation_cats_dir)

validation_dogs_dir = os.path.join(validation_dir, 'dogs')
os.mkdir(validation_dogs_dir)

test_cats_dir = os.path.join(test_dir, 'cats')
os.mkdir(test_cats_dir) 

test_dogs_dir = os.path.join(test_dir, 'dogs')
os.mkdir(test_dogs_dir)
# Copy the images into the directories
fnames = ['cat.{}.jpg'.format(i) for i in range(1000)]
for fname in fnames:
    src = os.path.join(original_dataset_dir, fname)
    dst = os.path.join(train_cats_dir, fname)
    shutil.copyfile(src, dst)
    
fnames = ['cat.{}.jpg'.format(i) for i in range(1000, 1500)]
for fname in fnames:
    src = os.path.join(original_dataset_dir, fname)
    dst = os.path.join(validation_cats_dir, fname)
    shutil.copyfile(src, dst)
    
fnames = ['cat.{}.jpg'.format(i) for i in range(1500, 2000)]
for fname in fnames:
    src = os.path.join(original_dataset_dir, fname)
    dst = os.path.join(test_cats_dir, fname)
    shutil.copyfile(src, dst)

fnames = ['dog.{}.jpg'.format(i) for i in range(1000)]
for fname in fnames:
    src = os.path.join(original_dataset_dir, fname)
    dst = os.path.join(train_dogs_dir, fname)
    shutil.copyfile(src, dst)
    
fnames = ['dog.{}.jpg'.format(i) for i in range(1000, 1500)]
for fname in fnames:
    src = os.path.join(original_dataset_dir, fname)
    dst = os.path.join(validation_dogs_dir, fname)
    shutil.copyfile(src, dst)
    
fnames = ['dog.{}.jpg'.format(i) for i in range(1500, 2000)]
for fname in fnames:
    src = os.path.join(original_dataset_dir, fname)
    dst = os.path.join(test_dogs_dir, fname)
    shutil.copyfile(src, dst)
print('total training cat images:',len(os.listdir(train_cats_dir)))
print('total training dog images:',len(os.listdir(train_dogs_dir)))
print('total validation cat images:',len(os.listdir(validation_cats_dir)))
print('total validation dog images:',len(os.listdir(validation_dogs_dir)))
print('total test cat images:',len(os.listdir(test_cats_dir)))
print('total test dog images:',len(os.listdir(test_dogs_dir)))
total training cat images: 1000
total training dog images: 1000
total validation cat images: 500
total validation dog images: 500
total test cat images: 500
total test dog images: 500

5.2.3 Building the network

# 5-5 Instantiating a small convnet for dogs-vs-cats classification
from keras import layers
from keras import models

model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation = 'relu', input_shape=(150, 150, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), activation = 'relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), activation = 'relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), activation = 'relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Flatten())
model.add(layers.Dense(512, activation = 'relu'))
model.add(layers.Dense(1, activation = 'sigmoid'))
model.summary()
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_7 (Conv2D)            (None, 148, 148, 32)      896       
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 74, 74, 32)        0         
_________________________________________________________________
conv2d_8 (Conv2D)            (None, 72, 72, 128)       36992     
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 36, 36, 128)       0         
_________________________________________________________________
conv2d_9 (Conv2D)            (None, 34, 34, 128)       147584    
_________________________________________________________________
max_pooling2d_5 (MaxPooling2 (None, 17, 17, 128)       0         
_________________________________________________________________
conv2d_10 (Conv2D)           (None, 15, 15, 128)       147584    
_________________________________________________________________
max_pooling2d_6 (MaxPooling2 (None, 7, 7, 128)         0         
_________________________________________________________________
flatten_2 (Flatten)          (None, 6272)              0         
_________________________________________________________________
dense_3 (Dense)              (None, 512)               3211776   
_________________________________________________________________
dense_4 (Dense)              (None, 1)                 513       
=================================================================
Total params: 3,545,345
Trainable params: 3,545,345
Non-trainable params: 0
_________________________________________________________________

For the compilation step, use the RMSprop optimizer. Because the network ends with a single sigmoid unit, use binary crossentropy as the loss function.

# 5-6 Configuring the model for training
from keras import optimizers

model.compile(loss = 'binary_crossentropy',
             optimizer = optimizers.RMSprop(lr = 1e-4),
             metrics = ['acc'])

5.2.4 Data preprocessing

(1) Read the image files
(2) Decode the JPEG content into RGB grids of pixels
(3) Convert these into floating-point tensors
(4) Rescale the pixel values into the [0, 1] interval
The keras.preprocessing.image module has tools to quickly set up Python generators that do this.
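For a single image, the four steps amount to something like the following sketch, where `raw` is an assumed stand-in for one decoded JPEG (a uint8 RGB grid of pixels):

```python
import numpy as np

# Stand-in for steps (1)+(2): a decoded 150x150 RGB image as a uint8 pixel grid
# (in practice this comes from reading and decoding an actual JPEG file).
raw = np.random.randint(0, 256, size=(150, 150, 3), dtype='uint8')

x = raw.astype('float32') / 255.0  # steps (3)+(4): float tensor scaled to [0, 1]
x = x[np.newaxis, ...]             # add a batch axis -> (1, 150, 150, 3)
print(x.shape, x.dtype)
```

ImageDataGenerator, used below, performs these conversions batch by batch.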

# 5-7 Using ImageDataGenerator to read images from directories
from keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(rescale = 1./255)
test_datagen = ImageDataGenerator(rescale = 1./255)

train_generator = train_datagen.flow_from_directory(
        train_dir,
        target_size = (150, 150),
        batch_size = 20,
        class_mode = 'binary')

validation_generator = test_datagen.flow_from_directory(
        validation_dir,
        target_size = (150, 150),
        batch_size = 20,
        class_mode = 'binary')
for data_batch, labels_batch in train_generator:
    print('data batch shape:', data_batch.shape)
    print('labels batch shape:', labels_batch.shape)
    break

# data batch shape: (20, 150, 150, 3)
# labels batch shape: (20,)

train_generator behaves like an iterator, so here we fit the model with fit_generator.
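The step counts passed to fit_generator follow from the batch size of 20 set in the generators above: each epoch must draw from the generator enough times to cover each split once.

```python
# With batch_size = 20, covering the 2000 training images takes 100 draws,
# and covering the 1000 validation images takes 50 draws.
batch_size = 20
steps_per_epoch = 2000 // batch_size
validation_steps = 1000 // batch_size
print(steps_per_epoch, validation_steps)  # 100 50
```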

# 5-8 Fitting the model using a batch generator
history = model.fit_generator(
        train_generator,
        steps_per_epoch = 100,
        epochs = 30,
        validation_data = validation_generator,
        validation_steps = 50)
# 5-9 Saving the model
model.save('cats_and_dogs_small_1.h5')

 

# 5-10 Plotting the loss and accuracy curves during training
import matplotlib.pyplot as plt

acc = history.history['acc']
val_acc = history.history['val_acc']
loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(1, len(acc) + 1)

plt.plot(epochs, acc, 'bo', label = 'Training acc')
plt.plot(epochs, val_acc, 'b', label = 'Validation acc')
plt.title('Training and validation accuracy')
plt.legend()

plt.figure()

plt.plot(epochs, loss, 'bo', label = 'Training loss')
plt.plot(epochs, val_loss, 'b', label = 'Validation loss')
plt.title('Training and validation loss')
plt.legend()

plt.show()

The training accuracy increases linearly over time until it approaches 100%, while the validation accuracy stalls around 70%: the model is clearly overfitting. Here we use a technique specific to computer vision: data augmentation.

5.2.5 Using data augmentation

Overfitting is caused by having too few samples to learn from; here we generate believable-looking new images from the existing ones.
rotation_range is a value in degrees (0-180), a range within which to randomly rotate pictures
width_shift_range and height_shift_range are ranges within which to randomly translate pictures horizontally or vertically
shear_range is for randomly applying shearing transformations
zoom_range is for randomly zooming inside pictures
horizontal_flip randomly flips half the images horizontally
fill_mode is the strategy for filling in newly created pixels

# 5-11 Setting up a data-augmentation configuration via ImageDataGenerator
datagen = ImageDataGenerator(
    rotation_range = 40,
    width_shift_range = 0.2,
    height_shift_range = 0.2,
    shear_range = 0.2,
    zoom_range = 0.2,
    horizontal_flip = True,
    fill_mode = 'nearest')
# 5-12 Displaying some randomly augmented training images
from keras.preprocessing import image

fnames = [os.path.join(train_cats_dir, fname) for fname in os.listdir(train_cats_dir)]
img_path = fnames[3]
img = image.load_img(img_path, target_size = (150, 150))

x = image.img_to_array(img)
x = x.reshape((1,) + x.shape)

i = 0
for batch in datagen.flow(x, batch_size = 1):
    plt.figure(i)
    imgplot = plt.imshow(image.array_to_img(batch[0]))
    i += 1
    if i % 4 == 0:
        break

plt.show()

Let's also add a Dropout layer to further fight overfitting.

# 5-13 Defining a new convnet that includes dropout
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(150, 150, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Flatten())
model.add(layers.Dropout(0.5))
model.add(layers.Dense(512, activation='relu'))
model.add(layers.Dense(1, activation = 'sigmoid'))

model.compile(loss = 'binary_crossentropy',
             optimizer = optimizers.RMSprop(lr = 1e-4),
             metrics = ['acc'])
# 5-14 Training the convnet using data-augmentation generators
train_datagen = ImageDataGenerator(
    rescale = 1./255,
    rotation_range = 40,
    width_shift_range = 0.2,
    height_shift_range = 0.2,
    shear_range = 0.2,
    zoom_range = 0.2,
    horizontal_flip = True)

test_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
    train_dir,
    target_size = (150, 150),
    batch_size = 32,
    class_mode = 'binary')

validation_generator = test_datagen.flow_from_directory(
    validation_dir,
    target_size = (150, 150),
    batch_size = 32,
    class_mode = 'binary')

history = model.fit_generator(
    train_generator,
    steps_per_epoch = 100,
    epochs = 100,
    validation_data = validation_generator,
    validation_steps = 50)

# 5-15 Saving the model
model.save('cats_and_dogs_small_2.h5')

 

5.3 Using a pretrained convnet

We'll use the VGG16 architecture in two ways: feature extraction and fine-tuning.

5.3.1 Feature extraction

Feature extraction consists of reusing the convolutional base of a pretrained network; how general (and thus reusable) the representations extracted by a given convolution layer are depends on the depth of that layer in the model.
Some of the models available in keras.applications: Xception, Inception V3, ResNet50, VGG16, VGG19, MobileNet.

# 5-16 Instantiating the VGG16 convolutional base
from keras.applications import VGG16

conv_base = VGG16(weights = 'imagenet', 
                  include_top = False, 
                  input_shape = (150, 150, 3))
# weights specifies the weight checkpoint from which to initialize the model
# include_top specifies whether to include the densely connected classifier on top
# input_shape is the shape of the image tensors fed into the network

The final feature map of this conv base has shape (4, 4, 512); on top of these features we'll add a densely connected classifier. There are two ways to do that:
  • Run the convolutional base over your dataset, record its output to an array, and feed that array as input to a standalone densely connected classifier. This is fast and cheap, but it does not allow you to use data augmentation.
  • Extend the existing model by adding Dense layers on top of conv_base, and run the whole thing end to end on the input data. This allows data augmentation, but is far more expensive.
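For the first (fast) approach, the extracted features end up as an array that is flattened before it goes into the Dense classifier. A shape-only sketch, where `features` is a dummy placeholder standing in for the conv base's output over the 2,000 training images:

```python
import numpy as np

# Placeholder for the conv base's output over the 2000 training images:
# one (4, 4, 512) feature map per sample (contents here are dummies).
features = np.zeros((2000, 4, 4, 512), dtype='float32')

# Flatten each feature map into a vector for the densely connected classifier.
flat = features.reshape(features.shape[0], 4 * 4 * 512)
print(flat.shape)  # (2000, 8192)
```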
