Enhancing Vision with Convolutional Neural Networks

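Reference: Installing TensorFlow on Ubuntu 16, and installing TensorFlow for Jupyter Notebook.

This post is a translation of material from the course Introduction to TensorFlow for Artificial Intelligence, Machine Learning, and Deep Learning.

For study, discussion, and other non-profit use only!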

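1. Convolutions and pooling

In image processing, a filter matrix is often applied to the original image; this operation is called a convolution.
As shown in the figure below:

In the figure above, a filter matrix (the values shown are arbitrary) is applied to a single pixel. For each pixel, the convolution takes the pixel together with its eight neighbors (above, below, left, right, and the four diagonals), multiplies each value by the filter entry at the matching position, and sums the products to obtain the new pixel value. This transformation is called a convolution.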
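The multiply-and-sum step for one pixel can be sketched in NumPy. The pixel values and filter values below are hypothetical, chosen only for illustration; they are not taken from the missing figure.

```python
import numpy as np

# Hypothetical 3x3 neighbourhood; the centre value (192) is the pixel being processed.
neighbourhood = np.array([
    [0, 64, 128],
    [48, 192, 144],
    [142, 226, 168],
])

# Hypothetical filter values, for illustration only.
kernel = np.array([
    [-1, 0, -2],
    [0.5, 4.5, -1.5],
    [1.5, 2, -3],
])

# Multiply each pixel by the filter entry at the same position, then sum the products.
new_pixel = float(np.sum(neighbourhood * kernel))
print(new_pixel)  # 577.0
```

Repeating this for every pixel that has a full set of neighbours produces the convolved image.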

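A convolution can be designed to bring out specific features, as the next two figures show:

Applying this filter emphasizes vertical features (vertical lines stand out);

whereas applying this filter emphasizes horizontal features (horizontal lines stand out);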
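As a rough illustration of this effect, the sketch below applies two common edge filters to a toy image containing a single vertical edge. It is an assumption that these filters resemble the ones in the missing figures; convolve3x3 is a helper written for this example only.

```python
import numpy as np

def convolve3x3(image, kernel):
    """Apply a 3x3 filter to every pixel that has a full set of neighbours."""
    rows, cols = image.shape
    out = np.zeros((rows - 2, cols - 2))
    for r in range(rows - 2):
        for c in range(cols - 2):
            out[r, c] = np.sum(image[r:r + 3, c:c + 3] * kernel)
    return out

# Toy 5x5 image: dark left side, bright right side, i.e. one vertical edge.
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)

# Responds to left-to-right changes in brightness (vertical edges).
vertical_filter = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])
# Responds to top-to-bottom changes in brightness (horizontal edges).
horizontal_filter = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]])

vertical_response = convolve3x3(image, vertical_filter)
horizontal_response = convolve3x3(image, horizontal_filter)

print(vertical_response)    # non-zero where the window straddles the edge
print(horizontal_response)  # all zeros: the image has no horizontal edge
```

The first filter reacts strongly to the edge, while the second produces no response at all, which is the sense in which a filter "brings out" one kind of feature.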

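For more on what image convolution means, this blog post is recommended: 理解圖像卷積操作的意義 (Understanding the Meaning of Image Convolution)

Pooling: put simply, it compresses an image, for example reducing a 128*128-pixel image to 28*28.

A simple example is shown in the figure below:

In the figure, a 4*4-pixel image is reduced to 2*2. The image is processed four pixels (one 2*2 block) at a time, and the largest value in each block is kept. This is only one pooling method (max pooling); you could instead return the average of the four pixels.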
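Both pooling variants can be sketched with a NumPy reshape. The 4*4 pixel values are hypothetical and are not taken from the missing figure.

```python
import numpy as np

# Hypothetical 4x4 image.
image = np.array([
    [0, 64, 128, 128],
    [48, 192, 144, 144],
    [142, 226, 168, 0],
    [255, 0, 0, 64],
])

# Split into 2x2 blocks; axes are (block_row, row_in_block, block_col, col_in_block).
blocks = image.reshape(2, 2, 2, 2)

max_pooled = blocks.max(axis=(1, 3))   # keep the largest value of each block
avg_pooled = blocks.mean(axis=(1, 3))  # alternative: average of each block

print(max_pooled)  # [[192 144], [255 168]]
print(avg_pooled)
```

Either way the output has a quarter as many pixels as the input; max pooling keeps the strongest response in each block.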

2. Implementing convolutions and pooling

The previous code:

model = tf.keras.models.Sequential([ 
    tf.keras.layers.Flatten(), 
    tf.keras.layers.Dense(128, activation=tf.nn.relu),
    tf.keras.layers.Dense(10, activation=tf.nn.softmax)])

How can this code be modified to add convolution and pooling layers?
As shown in the following code:

model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(64, (3,3), activation='relu', input_shape=(28,28,1)),
    tf.keras.layers.MaxPooling2D(2,2),
    tf.keras.layers.Conv2D(64, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2,2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation=tf.nn.relu),
    tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])

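From the code we can see that:

The last three layers (Flatten and the two Dense layers) are the same as before;
The second line defines a convolution layer (Conv2D) with 64 filters, each a 3*3 matrix. The activation is relu, which replaces negative values with zero. In input_shape=(28,28,1), the 28,28 is the image size and the final 1 is the number of color channels, meaning the image is grayscale;
The values in the 64 filters of this convolution layer start out random. As training proceeds they are adjusted in the direction that improves the final predictions;
The third line is a pooling layer. MaxPooling keeps the maximum value, and with a pool size of (2,2) it processes 4 pixels at a time;
The fourth and fifth lines add another convolution layer and another pooling layer;
By the time an image has passed through the two convolution and pooling stages and reaches Flatten, it has been shrunk;
The shrunken image is, in effect, the set of features that are most informative about the target variable.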

The following code displays the model that was built:

model.summary()

So what does the summary of the model above look like? It is shown below:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d (Conv2D)              (None, 26, 26, 64)        640       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 13, 13, 64)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 11, 11, 64)        36928     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 5, 5, 64)          0         
_________________________________________________________________
flatten (Flatten)            (None, 1600)              0         
_________________________________________________________________
dense (Dense)                (None, 128)               204928    
_________________________________________________________________
dense_1 (Dense)              (None, 10)                1290      
=================================================================
Total params: 243,786
Trainable params: 243,786
Non-trainable params: 0

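From the output above we can see:

The output of the first convolution layer is 26*26*64. What does that mean?

First, after a 28*28 image passes through the convolution layer's filters, it becomes 26*26. Because the filter is 3*3, the outermost ring of pixels in the original 28*28 image has no complete set of neighbors and cannot be computed. One pixel is therefore lost on each of the four sides, leaving a 26*26 image.

So what would the output be with a 5*5 filter? (It would be 24*24: two pixels are lost on each side, so each dimension shrinks by 4.)

Second, 64 is the number of filters, so each input image produces 64 new images (feature maps).

The first pooling layer has size 2*2, so it turns every 4 pixels into one and the 26*26 image becomes 13*13.
The second convolution layer behaves the same way: each dimension shrinks by 2, giving 11*11.
The second pooling layer halves each dimension; since 11 is odd, the last row and column are dropped and the result is 5*5.
Why is the output of the Flatten layer 1600? 1600 = 5*5*64: all pixels of the 64 feature maps are flattened into a single vector.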
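The sizes and parameter counts in the summary can be reproduced with a few lines of arithmetic. This is a sketch assuming the Keras defaults used above (no padding, convolution stride 1, pooling stride equal to the pool size); the helper names are made up for this example.

```python
def conv_output_size(size, kernel):
    """Side length after a convolution with no padding and stride 1."""
    return size - kernel + 1

def pool_output_size(size, pool):
    """Side length after pooling; an incomplete last row/column is dropped."""
    return size // pool

def conv_params(kernel, in_channels, filters):
    """kernel*kernel*in_channels weights plus one bias, per filter."""
    return (kernel * kernel * in_channels + 1) * filters

def dense_params(inputs, units):
    """One weight per input plus one bias, per unit."""
    return (inputs + 1) * units

size = 28
size = conv_output_size(size, 3)   # 26
size = pool_output_size(size, 2)   # 13
size = conv_output_size(size, 3)   # 11
size = pool_output_size(size, 2)   # 5
flat = size * size * 64            # 1600

total = (conv_params(3, 1, 64)         # 640
         + conv_params(3, 64, 64)      # 36928
         + dense_params(flat, 128)     # 204928
         + dense_params(128, 10))      # 1290
print(size, flat, total)  # 5 1600 243786
```

The total matches the "Total params: 243,786" line of the summary.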

3. Improving the Fashion MNIST recognition model with convolutions

The earlier training code:

import tensorflow as tf
mnist = tf.keras.datasets.fashion_mnist
(training_images, training_labels), (test_images, test_labels) = mnist.load_data()
training_images=training_images / 255.0
test_images=test_images / 255.0
model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(),
  tf.keras.layers.Dense(128, activation=tf.nn.relu),
  tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(training_images, training_labels, epochs=5)

test_loss = model.evaluate(test_images, test_labels)

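After it runs, we get the loss and accuracy on the training set (one line per epoch) and on the test set (last line), as follows: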

Epoch 1/5
60000/60000 [==============================] - 20s 340us/sample - loss: 0.4971 - acc: 0.8251
Epoch 2/5
60000/60000 [==============================] - 18s 306us/sample - loss: 0.3769 - acc: 0.8643
Epoch 3/5
60000/60000 [==============================] - 18s 292us/sample - loss: 0.3389 - acc: 0.8759
Epoch 4/5
60000/60000 [==============================] - 17s 291us/sample - loss: 0.3154 - acc: 0.8850
Epoch 5/5
60000/60000 [==============================] - 14s 230us/sample - loss: 0.2971 - acc: 0.8921
10000/10000 [==============================] - 2s 151us/sample - loss: 0.3676 - acc: 0.8657

The improved code is as follows:

import tensorflow as tf
print(tf.__version__)
mnist = tf.keras.datasets.fashion_mnist
(training_images, training_labels), (test_images, test_labels) = mnist.load_data()
training_images=training_images.reshape(60000, 28, 28, 1)
training_images=training_images / 255.0
test_images = test_images.reshape(10000, 28, 28, 1)
test_images=test_images/255.0
model = tf.keras.models.Sequential([
  tf.keras.layers.Conv2D(64, (3,3), activation='relu', input_shape=(28, 28, 1)),
  tf.keras.layers.MaxPooling2D(2, 2),
  tf.keras.layers.Conv2D(64, (3,3), activation='relu'),
  tf.keras.layers.MaxPooling2D(2,2),
  tf.keras.layers.Flatten(),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.summary()
model.fit(training_images, training_labels, epochs=5)
test_loss = model.evaluate(test_images, test_labels)

The results are as follows:

Epoch 1/5
60000/60000 [==============================] - 216s 4ms/sample - loss: 0.4520 - acc: 0.8356
Epoch 2/5
60000/60000 [==============================] - 196s 3ms/sample - loss: 0.2985 - acc: 0.8904
Epoch 3/5
60000/60000 [==============================] - 208s 3ms/sample - loss: 0.2525 - acc: 0.9070
Epoch 4/5
60000/60000 [==============================] - 211s 4ms/sample - loss: 0.2236 - acc: 0.9172
Epoch 5/5
60000/60000 [==============================] - 201s 3ms/sample - loss: 0.1953 - acc: 0.9265
10000/10000 [==============================] - 9s 935us/sample - loss: 0.2658 - acc: 0.9051

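From the code above and its output we can conclude:

The improved code runs more slowly than the earlier version (roughly 200 s per epoch instead of 14 to 20 s), because every image passes through two rounds of convolution and pooling, each convolution applying 64 filters;
The improved code performs clearly better on both the training set and the test set: after 5 epochs, training accuracy rises from 0.8921 to 0.9265 and test accuracy from 0.8657 to 0.9051.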

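4. Visualizing the convolution results

This section visualizes the convolution results. Specifically, it takes the model above, picks several images and one convolution (filter), and looks at the output of the model's first four layers (first convolution, first pooling, second convolution, second pooling).

First, look at some of the test labels:

print(test_labels[:100])

The output is: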

[9 2 1 1 6 1 4 6 5 7 4 5 7 3 4 1 2 4 8 0 2 5 7 9 1 4 6 0 9 3 8 8 3 3 8 0 7
 5 7 9 6 1 3 7 6 7 2 1 2 2 4 4 5 8 2 2 8 4 8 0 7 7 8 5 1 1 2 3 9 8 7 0 2 6
 2 3 1 2 8 4 1 8 5 9 5 0 3 2 0 6 5 3 6 7 1 8 0 1 4 2]

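In the output above, the images at indices 0, 23, and 28 all have label 9, which is a shoe class (ankle boot in Fashion MNIST). In general, convolutions can pick up the features shared by images of the same class. Use the following code: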

#%matplotlib
import matplotlib.pyplot as plt
f, axarr = plt.subplots(3,4)
FIRST_IMAGE=0 # index of the first image
SECOND_IMAGE=23 # index of the second image
THIRD_IMAGE=28 # index of the third image
CONVOLUTION_NUMBER = 1 # which filter to display; try 1, 2, and 6 to see different feature maps
from tensorflow.keras import models
layer_outputs = [layer.output for layer in model.layers]
activation_model = tf.keras.models.Model(inputs = model.input, outputs = layer_outputs)
for x in range(0,4):
    f1 = activation_model.predict(test_images[FIRST_IMAGE].reshape(1, 28, 28, 1))[x]
    axarr[0,x].imshow(f1[0, : , :, CONVOLUTION_NUMBER], cmap='inferno')
    axarr[0,x].grid(False)
    f2 = activation_model.predict(test_images[SECOND_IMAGE].reshape(1, 28, 28, 1))[x]
    axarr[1,x].imshow(f2[0, : , :, CONVOLUTION_NUMBER], cmap='inferno')
    axarr[1,x].grid(False)
    f3 = activation_model.predict(test_images[THIRD_IMAGE].reshape(1, 28, 28, 1))[x]
    axarr[2,x].imshow(f3[0, : , :, CONVOLUTION_NUMBER], cmap='inferno')
    axarr[2,x].grid(False)

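Setting CONVOLUTION_NUMBER to 1, 2, and 6 in turn produces the images shown below:

From the images we can see:

The image size goes from the original 28*28 to 26*26 -> 13*13 -> 11*11 -> 5*5;
Of the three filters compared, CONVOLUTION_NUMBER = 2 separates the features best;

With CONVOLUTION_NUMBER set to 2 and the second image index changed to 2, we get the following figure:

The figure shows that the filter with CONVOLUTION_NUMBER = 2 discriminates well between the classes: it clearly separates the shoes from the trousers.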
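The index choices in this section (0, 23, and 28 for shoes, 2 for trousers) can be checked against the label list. The sketch below hard-codes the first 30 labels from the printed output so that it runs without downloading the dataset; CLASS_NAMES is the class list from the Fashion MNIST dataset description, not something defined in the course code.

```python
import numpy as np

# Fashion MNIST class names; the list position is the numeric label.
CLASS_NAMES = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
               "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

# First 30 test labels, copied from the output of print(test_labels[:100]).
labels = np.array([9, 2, 1, 1, 6, 1, 4, 6, 5, 7, 4, 5, 7, 3, 4,
                   1, 2, 4, 8, 0, 2, 5, 7, 9, 1, 4, 6, 0, 9, 3])

boot_indices = np.flatnonzero(labels == 9)
print(boot_indices)            # positions 0, 23 and 28
print(CLASS_NAMES[labels[2]])  # Trouser
```

With the full dataset loaded, replacing labels with test_labels gives every candidate index for a class.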

5. Understanding convolutions and pooling in plain Python

Use the following code to load an image:

#%matplotlib
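import numpy as np
# scipy.misc.ascent() is not available in newer SciPy releases; there, use
# "from scipy import datasets" and "i = datasets.ascent()" instead.
from scipy import misc
i = misc.ascent()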

import matplotlib.pyplot as plt
plt.grid(False)
plt.gray()
plt.axis('off')
plt.imshow(i)
plt.show()

The image looks like this:

Get the image size:

i_transformed = np.copy(i)
size_x = i_transformed.shape[0]
size_y = i_transformed.shape[1]

Define a 3*3 filter:

# This filter detects edges nicely
# It creates a convolution that only passes through sharp edges and straight
# lines.

#Experiment with different values for fun effects.
#filter = [ [0, 1, 0], [1, -4, 1], [0, 1, 0]]

# A couple more filters to try for fun!
filter = [ [-1, -2, -1], [0, 0, 0], [1, 2, 1]]
#filter = [ [-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]

# If all the digits in the filter don't add up to 0 or 1, you 
# should probably do a weight to get it to do so
# so, for example, if your weights are 1,1,1 1,2,1 1,1,1
# They add up to 10, so you would set a weight of .1 if you want to normalize them
weight  = 1
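The normalization rule described in the comments above can be worked through for the example filter they mention. The names blur_filter and blur_weight are made up for this sketch so that it does not overwrite the filter and weight defined above.

```python
# Example filter from the comment: 1,1,1 / 1,2,1 / 1,1,1
blur_filter = [[1, 1, 1], [1, 2, 1], [1, 1, 1]]

# Sum of all entries in the filter.
total = sum(sum(row) for row in blur_filter)  # 10

# If the entries already sum to 0 or 1, no scaling is needed; otherwise use 1/sum.
blur_weight = 1 if total in (0, 1) else 1 / total
print(total, blur_weight)  # 10 0.1
```

The Sobel-style filter selected above sums to 0, which is why its weight stays at 1.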

Apply the convolution:

for x in range(1,size_x-1):
  for y in range(1,size_y-1):
      convolution = 0.0
      convolution = convolution + (i[x - 1, y-1] * filter[0][0])
      convolution = convolution + (i[x, y-1] * filter[0][1])
      convolution = convolution + (i[x + 1, y-1] * filter[0][2])
      convolution = convolution + (i[x-1, y] * filter[1][0])
      convolution = convolution + (i[x, y] * filter[1][1])
      convolution = convolution + (i[x+1, y] * filter[1][2])
      convolution = convolution + (i[x-1, y+1] * filter[2][0])
      convolution = convolution + (i[x, y+1] * filter[2][1])
      convolution = convolution + (i[x+1, y+1] * filter[2][2])
      convolution = convolution * weight
      if(convolution<0):
        convolution=0
      if(convolution>255):
        convolution=255
      i_transformed[x, y] = convolution
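The same computation can be packaged as a function and tried on a small array, which makes it easier to see what the loop does. This is a sketch, not the course's code: convolve_clip is a hypothetical helper that uses NumPy slicing instead of the explicit double loop, and it keeps the loop's index pairing, in which the first array index is matched with the filter's column.

```python
import numpy as np

def convolve_clip(image, kernel, weight=1.0):
    """3x3 convolution over interior pixels, clipped to [0, 255]; borders are left unchanged."""
    image = np.asarray(image, dtype=float)
    size_x, size_y = image.shape
    out = image.copy()
    acc = np.zeros((size_x - 2, size_y - 2))
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            # Same pairing as the loop: image[x + dx, y + dy] * kernel[dy + 1][dx + 1]
            shifted = image[1 + dx:size_x - 1 + dx, 1 + dy:size_y - 1 + dy]
            acc += shifted * kernel[dy + 1][dx + 1]
    out[1:-1, 1:-1] = np.clip(acc * weight, 0, 255)
    return out

# Toy image: dark on the left, bright on the right.
toy = np.array([[0, 0, 100, 100]] * 4)
sobel = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

result = convolve_clip(toy, sobel)
print(result)
```

Every interior pixel of the toy image sits next to the brightness jump, so the raw sum (400) is clipped to 255, while the border pixels keep their original values.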

View the result of the convolution:

# Plot the image. Note the size of the axes -- they are 512 by 512
plt.gray()
plt.grid(False)
plt.imshow(i_transformed)
#plt.axis('off')
plt.show()

The resulting image is shown below:

Apply pooling:

new_x = int(size_x/2)
new_y = int(size_y/2)
newImage = np.zeros((new_x, new_y))
for x in range(0, size_x, 2):
    for y in range(0, size_y, 2):
        pixels = []
        pixels.append(i_transformed[x, y])
        pixels.append(i_transformed[x+1, y])
        pixels.append(i_transformed[x, y+1])
        pixels.append(i_transformed[x+1, y+1])
        pixels.sort(reverse=True)
        newImage[int(x/2),int(y/2)] = pixels[0]

# Plot the image. Note the size of the axes -- now 256 pixels instead of 512
plt.gray()
plt.grid(False)
plt.imshow(newImage)
#plt.axis('off')
plt.show()

View the result:

The result shows that some features of the image have been enhanced.

6. Quiz:

Question 1
What is a Convolution?

a. A technique to filter out unwanted images
b. A technique to make images bigger
c. A technique to make images smaller
d. A technique to isolate features in images

Question 2
What is a Pooling?

a. A technique to isolate features in images
b. A technique to combine pictures
c. A technique to reduce the information in an image while maintaining features
d. A technique to make images sharper

Question 3
How do Convolutions improve image recognition?

a. They make the image smaller
b. They make processing of images faster
c. They make the image clearer
d. They isolate features in images

Question 4
After passing a 3x3 filter over a 28x28 image, how big will the output be?

a. 28x28
b. 25x25
c. 26x26
d. 31x31

Question 5
After max pooling a 26x26 image with a 2x2 filter, how big will the output be?

a. 26x26
b. 28x28
c. 13x13
d. 56x56

Question 6
Applying Convolutions on top of our Deep neural network will make training:

a. It depends on many factors. It might make your training faster or slower, and a poorly designed Convolutional layer may even be less efficient than a plain DNN!
b. Stay the same
c. Faster
d. Slower

My Guess:
    1. d
    2. c
    3. d
    4. c
    5. c
    6. a

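7. Additional exercise:

Improve the model built for the MNIST data so that it meets the following requirements:

Raise the accuracy to 99.8% or higher;
Use only one convolution layer and one pooling layer;
Stop training as soon as accuracy reaches 99.8% (this normally happens within 20 epochs) and print Reached 99.8% accuracy so cancelling training!;

Here is the starter code: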

import tensorflow as tf

# YOUR CODE STARTS HERE

# YOUR CODE ENDS HERE

mnist = tf.keras.datasets.mnist
(training_images, training_labels), (test_images, test_labels) = mnist.load_data()

# YOUR CODE STARTS HERE

# YOUR CODE ENDS HERE

model = tf.keras.models.Sequential([
    # YOUR CODE STARTS HERE

    # YOUR CODE ENDS HERE
])

# YOUR CODE STARTS HERE

# YOUR CODE ENDS HERE

Answer

Code Download Here

