Andrew Ng's Coursera Deep Learning course, deeplearning.ai (4-1): Convolutional Neural Networks -- Programming Assignment

Part 1: Convolutional Neural Networks

This week's assignment implements the convolutional layer (CONV) and the pooling layer (POOL) in numpy, including forward propagation and an optional backward propagation.

Notation

  • Superscript [l] denotes the l-th layer of the network
  • Superscript (i) denotes the i-th example
  • Superscript {i} denotes the i-th mini-batch
  • Subscript i denotes the i-th entry of a vector
  • nH, nW, nC denote the height, width and number of channels of an image
  • nHprev, nWprev, nCprev denote the height, width and number of channels of the previous layer

1 Imports

import numpy as np                # scientific computing
import h5py                       # for reading the data files
import matplotlib.pyplot as plt   # for plotting

%matplotlib inline
plt.rcParams['figure.figsize'] = (5.0, 4.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

%load_ext autoreload
%autoreload 2

np.random.seed(1) # keep the random numbers consistent

2 Assignment Outline

  • Convolutional layer
    • Zero Padding
    • Convolve window
    • Convolution forward
    • Convolution backward (optional)
  • Pooling layer
    • Pooling forward
    • Create mask
    • Distribute value
    • Pooling backward (optional)

In this assignment we implement everything in numpy (backward propagation included); later assignments can use TensorFlow instead.

image

3 Convolutional Layer

Implement the convolutional layer shown below, which transforms an input volume into an output volume of a different size.

image

3.1 Zero-Padding

image

Benefits of padding

  • Without padding the image shrinks after every convolution; with padding you can control the output size, e.g. the "SAME" mode keeps the height and width unchanged (see the sketch below)
  • Padding keeps information at the borders of the image; without it, pixels near the edges contribute to very few outputs, so some information is lost
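For stride 1, the "SAME" behaviour corresponds to pad = (f - 1) / 2 for an odd filter size f. A minimal sketch of the arithmetic (the numbers are illustrative, not taken from the assignment):

f, stride = 3, 1              # odd filter size, stride 1
n_prev = 64                   # input height/width (illustrative)
pad = (f - 1) // 2            # padding that keeps the spatial size unchanged at stride 1
n_out = (n_prev - f + 2 * pad) // stride + 1
print(n_out)                  # 64 -> same spatial size as the input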

Exercise

Pad the images with zeros. The following code pads a (5,5,5,5,5) array a with pad = 1 on the 2nd dimension, pad = 3 on the 4th dimension, and pad = 0 on the remaining dimensions:

a = np.pad(a, ((0,0), (1,1), (0,0), (3,3), (0,0)), 'constant', constant_values = (..,..))

Code

# GRADED FUNCTION: zero_pad

def zero_pad(X, pad):
    """
    Pad with zeros all images of the dataset X. The padding is applied to the height and width of an image, 
    as illustrated in Figure 1.

    Argument:
    X -- python numpy array of shape (m, n_H, n_W, n_C) representing a batch of m images
    pad -- integer, amount of padding around each image on vertical and horizontal dimensions

    Returns:
    X_pad -- padded image of shape (m, n_H + 2*pad, n_W + 2*pad, n_C)
    """

    ### START CODE HERE ### (≈ 1 line)
    X_pad = np.pad(X, ((0,0), (pad,pad), (pad,pad), (0,0)), 'constant')
    ### END CODE HERE ###

    return X_pad

##########################################

np.random.seed(1)
x = np.random.randn(4, 3, 3, 2)
x_pad = zero_pad(x, 2)
print ("x.shape =", x.shape)
print ("x_pad.shape =", x_pad.shape)
print ("x[1,1] =", x[1,1])
print ("x_pad[1,1] =", x_pad[1,1])

fig, axarr = plt.subplots(1, 2)
axarr[0].set_title('x')
axarr[0].imshow(x[0,:,:,0])
axarr[1].set_title('x_pad')
axarr[1].imshow(x_pad[0,:,:,0])

# x.shape = (4, 3, 3, 2)
# x_pad.shape = (4, 7, 7, 2)
# x[1,1] = [[ 0.90085595 -0.68372786]
#  [-0.12289023 -0.93576943]
#  [-0.26788808  0.53035547]]
# x_pad[1,1] = [[ 0.  0.]
#  [ 0.  0.]
#  [ 0.  0.]
#  [ 0.  0.]
#  [ 0.  0.]
#  [ 0.  0.]
#  [ 0.  0.]]
# 
# <matplotlib.image.AxesImage at 0x7f1a576871d0>

3.2 Single Step of Convolution

Slide a filter over the input to produce the output.

image

  • Filter computation: multiply the filter element-wise with the current slice and sum all the entries; this gives one element of the output and is essentially WX
  • The actual activation is A = sigmoid(WX + b): the convolution result plus a bias, passed through a non-linearity such as sigmoid
# GRADED FUNCTION: conv_single_step

def conv_single_step(a_slice_prev, W, b):
    """
    Apply one filter defined by parameters W on a single slice (a_slice_prev) of the output activation 
    of the previous layer.

    Arguments:
    a_slice_prev -- slice of input data of shape (f, f, n_C_prev)
    W -- Weight parameters contained in a window - matrix of shape (f, f, n_C_prev)
    b -- Bias parameters contained in a window - matrix of shape (1, 1, 1)

    Returns:
    Z -- a scalar value, result of convolving the sliding window (W, b) on a slice x of the input data
    """

    ### START CODE HERE ### (≈ 2 lines of code)
    # Element-wise product between a_slice and W. Do not add the bias yet.
    s = a_slice_prev * W
    # Sum over all entries of the volume s.
    Z = np.sum(s)
    # Add bias b to Z. Cast b to a float() so that Z results in a scalar value.
    Z = Z + float(b)
    ### END CODE HERE ###

    return Z

########################################

np.random.seed(1)
a_slice_prev = np.random.randn(4, 4, 3)
W = np.random.randn(4, 4, 3)
b = np.random.randn(1, 1, 1)

Z = conv_single_step(a_slice_prev, W, b)
print("Z =", Z)

# Z = -6.99908945068

3.3 Convolutional Neural Network - Forward Pass

Apply multiple filters to the input volume: each filter produces one 2-D output, and the outputs of all the filters are stacked along the channel dimension.

Hints
  1. Selecting a slice of the image:
a_slice_prev = a_prev[0:2,0:2,:]
  2. Before taking a slice, first define its corners (vert_start, vert_end, horiz_start, horiz_end), as illustrated below:
    image

  3. The size of the output volume:

    $n_H = \lfloor \frac{n_{H_{prev}} - f + 2 \times pad}{stride} \rfloor + 1$
    $n_W = \lfloor \frac{n_{W_{prev}} - f + 2 \times pad}{stride} \rfloor + 1$
    $n_C = \text{number of filters}$
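As a quick check against the test case below (n_H_prev = n_W_prev = 4, f = 2, pad = 2, stride = 2), a minimal sketch:

n_H_prev, f, pad, stride = 4, 2, 2, 2
n_H = int((n_H_prev - f + 2 * pad) / stride + 1)
print(n_H)   # 4 -> the conv output below has shape (10, 4, 4, 8)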
# GRADED FUNCTION: conv_forward

def conv_forward(A_prev, W, b, hparameters):
    """
    Implements the forward propagation for a convolution function

    Arguments:
    A_prev -- output activations of the previous layer, numpy array of shape (m, n_H_prev, n_W_prev, n_C_prev)
    W -- Weights, numpy array of shape (f, f, n_C_prev, n_C)
    b -- Biases, numpy array of shape (1, 1, 1, n_C)
    hparameters -- python dictionary containing "stride" and "pad"

    Returns:
    Z -- conv output, numpy array of shape (m, n_H, n_W, n_C)
    cache -- cache of values needed for the conv_backward() function
    """

    ### START CODE HERE ###
    # Retrieve dimensions from A_prev's shape (≈1 line)  
    (m, n_H_prev, n_W_prev, n_C_prev) = A_prev.shape

    # Retrieve dimensions from W's shape (≈1 line)
    (f, f, n_C_prev, n_C) = W.shape

    # Retrieve information from "hparameters" (≈2 lines)
    stride = hparameters["stride"]
    pad = hparameters["pad"]

    # Compute the dimensions of the CONV output volume using the formula given above. Hint: use int() to floor. (≈2 lines)
    n_H = int((n_H_prev - f + 2*pad) / stride + 1)
    n_W = int((n_W_prev - f + 2*pad) / stride + 1)

    # Initialize the output volume Z with zeros. (≈1 line)
    Z = np.zeros((m, n_H, n_W, n_C))

    # Create A_prev_pad by padding A_prev
    A_prev_pad = zero_pad(A_prev, pad)

    for i in range(m):                               # loop over the batch of training examples
        a_prev_pad = A_prev_pad[i, :, :, :]                               # Select ith training example's padded activation
        for h in range(n_H):                           # loop over vertical axis of the output volume
            for w in range(n_W):                       # loop over horizontal axis of the output volume
                for c in range(n_C):                   # loop over channels (= #filters) of the output volume

                    # Find the corners of the current "slice" (≈4 lines)
                    vert_start = stride * h
                    vert_end = vert_start + f
                    horiz_start = stride * w
                    horiz_end = horiz_start + f

                    # Use the corners to define the (3D) slice of a_prev_pad (See Hint above the cell). (≈1 line)
                    a_slice_prev = a_prev_pad[vert_start:vert_end, horiz_start:horiz_end, :]

                    # Convolve the (3D) slice with the correct filter W and bias b, to get back one output neuron. (≈1 line)
                    Z[i, h, w, c] = conv_single_step(a_slice_prev, W[:, :, :, c], b[:, :, :, c])

    ### END CODE HERE ###

    # Making sure your output shape is correct
    assert(Z.shape == (m, n_H, n_W, n_C))

    # Save information in "cache" for the backprop
    cache = (A_prev, W, b, hparameters)

    return Z, cache

#########################################################

np.random.seed(1)
A_prev = np.random.randn(10,4,4,3)
W = np.random.randn(2,2,3,8)
b = np.random.randn(1,1,1,8)
hparameters = {"pad" : 2,
               "stride": 2}

Z, cache_conv = conv_forward(A_prev, W, b, hparameters)
print("Z's mean =", np.mean(Z))
print("Z[3,2,1] =", Z[3,2,1])
print("cache_conv[0][1][2][3] =", cache_conv[0][1][2][3])

# Z's mean = 0.0489952035289
# Z[3,2,1] = [-0.61490741 -6.7439236  -2.55153897  1.75698377  3.56208902  0.53036437
#   5.18531798  8.75898442]
# cache_conv[0][1][2][3] = [-0.20075807  0.18656139  0.41005165]

In principle, the convolutional layer also needs an activation function:

# Convolve the window to get back one output neuron
Z[i, h, w, c] = ...
# Apply activation
A[i, h, w, c] = activation(Z[i, h, w, c])
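The graded code leaves the activation out. If you wanted to add one, a ReLU applied to the whole conv output could look like this sketch (my addition, not part of the assignment):

# Element-wise ReLU over the conv output Z returned by conv_forward
A = np.maximum(0, Z)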

4 Pooling Layer

The pooling layer reduces the height and width of the image (the number of channels stays the same), summarizing a local patch of pixels into a single value; it also makes the network more robust to small shifts in the input.

Two types of pooling

  • Max pooling: slide an (f, f) window over the image and take the maximum of the elements inside the window
  • Average pooling: slide an (f, f) window over the image and take the average of the elements inside the window

A pooling layer has a hyperparameter f, the size of the (f, f) pooling window, but it has no parameters to learn during backpropagation (a tiny example of both modes follows).
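For a single (2, 2) window the two modes reduce to np.max and np.mean; an illustrative example:

import numpy as np

window = np.array([[1., 3.],
                   [4., 2.]])
print(np.max(window))    # 4.0 -> max-pooling output for this window
print(np.mean(window))   # 2.5 -> average-pooling output for this window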

image
image

4.1 Pooling Layer - Forward Pass

Implement MAX-POOL and AVG-POOL.

Hints

Output size of the pooling layer (compared with convolution, each channel is pooled separately, so the number of channels is unchanged):

$n_H = \lfloor \frac{n_{H_{prev}} - f}{stride} \rfloor + 1$
$n_W = \lfloor \frac{n_{W_{prev}} - f}{stride} \rfloor + 1$
$n_C = n_{C_{prev}}$
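As a quick check against the test case below (n_H_prev = n_W_prev = 4, f = 3, stride = 2):

n_H_prev, f, stride = 4, 3, 2
n_H = int(1 + (n_H_prev - f) / stride)
print(n_H)   # 1 -> the pooled output below has shape (2, 1, 1, 3)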
# GRADED FUNCTION: pool_forward

def pool_forward(A_prev, hparameters, mode = "max"):
    """
    Implements the forward pass of the pooling layer

    Arguments:
    A_prev -- Input data, numpy array of shape (m, n_H_prev, n_W_prev, n_C_prev)
    hparameters -- python dictionary containing "f" and "stride"
    mode -- the pooling mode you would like to use, defined as a string ("max" or "average")

    Returns:
    A -- output of the pool layer, a numpy array of shape (m, n_H, n_W, n_C)
    cache -- cache used in the backward pass of the pooling layer, contains the input and hparameters 
    """

    # Retrieve dimensions from the input shape
    (m, n_H_prev, n_W_prev, n_C_prev) = A_prev.shape

    # Retrieve hyperparameters from "hparameters"
    f = hparameters["f"]
    stride = hparameters["stride"]

    # Define the dimensions of the output
    n_H = int(1 + (n_H_prev - f) / stride)
    n_W = int(1 + (n_W_prev - f) / stride)
    n_C = n_C_prev

    # Initialize output matrix A
    A = np.zeros((m, n_H, n_W, n_C))              

    ### START CODE HERE ###
    for i in range(m):                         # loop over the training examples
        for h in range(n_H):                     # loop on the vertical axis of the output volume
            for w in range(n_W):                 # loop on the horizontal axis of the output volume
                for c in range (n_C):            # loop over the channels of the output volume

                    # Find the corners of the current "slice" (≈4 lines)
                    vert_start = h * stride
                    vert_end = vert_start + f
                    horiz_start = w * stride
                    horiz_end = horiz_start + f

                    # Use the corners to define the current slice on the ith training example of A_prev, channel c. (≈1 line)
                    a_prev_slice = A_prev[i, vert_start:vert_end, horiz_start:horiz_end, c]

                    # Compute the pooling operation on the slice. Use an if statment to differentiate the modes. Use np.max/np.mean.
                    if mode == "max":
                        A[i, h, w, c] = np.max(a_prev_slice)
                    elif mode == "average":
                        A[i, h, w, c] = np.mean(a_prev_slice)

    ### END CODE HERE ###

    # Store the input and hparameters in "cache" for pool_backward()
    cache = (A_prev, hparameters)

    # Making sure your output shape is correct
    assert(A.shape == (m, n_H, n_W, n_C))

    return A, cache

###################################

np.random.seed(1)
A_prev = np.random.randn(2, 4, 4, 3)
hparameters = {"stride" : 2, "f": 3}

A, cache = pool_forward(A_prev, hparameters)
print("mode = max")
print("A =", A)
print()
A, cache = pool_forward(A_prev, hparameters, mode = "average")
print("mode = average")
print("A =", A)

# mode = max
# A = [[[[ 1.74481176  0.86540763  1.13376944]]]
# 
# 
#  [[[ 1.13162939  1.51981682  2.18557541]]]]
# 
# mode = average
# A = [[[[ 0.02105773 -0.20328806 -0.40389855]]]
# 
# 
#  [[[-0.22154621  0.51716526  0.48155844]]]]

Congratulations! You have completed the forward pass of a convolutional neural network. The backward propagation below is optional.

5 Backpropagation in Convolutional Neural Networks (Optional)

Backpropagation in convolutional networks is fairly involved, and most deep learning frameworks compute it for you, so this part is not required.

Note (for this section):
  • A denotes the whole output volume of WX + b values (no sigmoid non-linearity is applied here)
  • Z denotes the WX + b value at one position (h, w) of the output volume
  • dA denotes the gradient of the cost with respect to the whole output A
  • dZ denotes the gradient of the cost with respect to Z at one position of the output

5.1 Backward Pass of the Convolutional Layer

5.1.1 Computing dA

This is the formula for computing dA using the filter W_c and the gradients dZ:

$dA \mathrel{+}= \sum_{h=0}^{n_H} \sum_{w=0}^{n_W} W_c \times dZ_{hw}$

Here W_c is one filter and dZ_hw is a scalar: the gradient of the cost with respect to the output of the conv layer Z at position (h, w).

We multiply the same filter by a different dZ each time because, in the forward pass, the filter slid over the whole input to produce Z; therefore, when computing dA in the backward pass, we accumulate the gradients coming from all the slices.

da_prev_pad[vert_start:vert_end, horiz_start:horiz_end, :] += W[:,:,:,c] * dZ[i, h, w, c]

5.1.2 Computing dW

This is the formula for computing dW_c, the gradient of one filter:

$dW_c \mathrel{+}= \sum_{h=0}^{n_H} \sum_{w=0}^{n_W} a_{slice} \times dZ_{hw}$

Here a_slice is the slice of the input that was used to compute the activation Z_ij, so the gradients produced by every slice must be added up to obtain dW:
dW[:,:,:,c] += a_slice * dZ[i, h, w, c]

5.1.3 Computing db

This is the formula for computing db:

$db = \sum_{h} \sum_{w} dZ_{hw}$

db is obtained by summing dZ over all positions:
db[:,:,:,c] += dZ[i, h, w, c]
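As a sanity check (a sketch under the shapes used above, not part of the graded code), the db accumulation loop is equivalent to one vectorized sum of dZ over the example, height and width axes:

# Equivalent to accumulating db[:,:,:,c] += dZ[i, h, w, c] over all i, h, w
db_vectorized = np.sum(dZ, axis=(0, 1, 2), keepdims=True)   # shape (1, 1, 1, n_C)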
Exercise

Implement conv_backward: loop over all the training examples, filters, heights, and widths, and accumulate the gradients with the three formulas above.

def conv_backward(dZ, cache):
    """
    Implement the backward propagation for a convolution function

    Arguments:
    dZ -- gradient of the cost with respect to the output of the conv layer (Z), numpy array of shape (m, n_H, n_W, n_C)
    cache -- cache of values needed for the conv_backward(), output of conv_forward()

    Returns:
    dA_prev -- gradient of the cost with respect to the input of the conv layer (A_prev),
               numpy array of shape (m, n_H_prev, n_W_prev, n_C_prev)
    dW -- gradient of the cost with respect to the weights of the conv layer (W)
          numpy array of shape (f, f, n_C_prev, n_C)
    db -- gradient of the cost with respect to the biases of the conv layer (b)
          numpy array of shape (1, 1, 1, n_C)
    """

    ### START CODE HERE ###
    # Retrieve information from "cache"
    (A_prev, W, b, hparameters) = cache

    # Retrieve dimensions from A_prev's shape
    (m, n_H_prev, n_W_prev, n_C_prev) = A_prev.shape

    # Retrieve dimensions from W's shape
    (f, f, n_C_prev, n_C) = W.shape

    # Retrieve information from "hparameters"
    stride = hparameters['stride']
    pad = hparameters['pad']

    # Retrieve dimensions from dZ's shape
    (m, n_H, n_W, n_C) = dZ.shape

    # Initialize dA_prev, dW, db with the correct shapes
    dA_prev = np.zeros((m, n_H_prev, n_W_prev, n_C_prev))
    dW = np.zeros((f, f, n_C_prev, n_C))
    db = np.zeros((1, 1, 1, n_C))

    # Pad A_prev and dA_prev
    A_prev_pad = zero_pad(A_prev, pad)
    dA_prev_pad = zero_pad(dA_prev, pad)

    for i in range(m):                       # loop over the training examples

        # select ith training example from A_prev_pad and dA_prev_pad
        a_prev_pad = A_prev_pad[i, :, :, :]
        da_prev_pad = dA_prev_pad[i, :, :, :]

        for h in range(n_H):                   # loop over vertical axis of the output volume
            for w in range(n_W):               # loop over horizontal axis of the output volume
                for c in range(n_C):           # loop over the channels of the output volume

                    # Find the corners of the current "slice"
                    vert_start = h * stride
                    vert_end = vert_start + f
                    horiz_start = w * stride
                    horiz_end = horiz_start + f

                    # Use the corners to define the slice from a_prev_pad
                    a_slice = a_prev_pad[vert_start:vert_end, horiz_start:horiz_end, :]

                    # Update gradients for the window and the filter's parameters using the code formulas given above
                    da_prev_pad[vert_start:vert_end, horiz_start:horiz_end, :] += W[:,:,:,c] * dZ[i, h, w, c]
                    dW[:,:,:,c] += a_slice * dZ[i, h, w, c]
                    db[:,:,:,c] += dZ[i, h, w, c]

        # Set the ith training example's dA_prev to the unpadded da_prev_pad (Hint: use X[pad:-pad, pad:-pad, :])
        dA_prev[i, :, :, :] = da_prev_pad[pad:-pad, pad:-pad, :]
    ### END CODE HERE ###

    # Making sure your output shape is correct
    assert(dA_prev.shape == (m, n_H_prev, n_W_prev, n_C_prev))

    return dA_prev, dW, db

########################################

np.random.seed(1)
dA, dW, db = conv_backward(Z, cache_conv)
print("dA_mean =", np.mean(dA))
print("dW_mean =", np.mean(dW))
print("db_mean =", np.mean(db))

# dA_mean = 1.45243777754
# dW_mean = 1.72699145831
# db_mean = 7.83923256462

5.2 Pooling Layer - Backward Pass

The pooling layer has no parameters to learn; during backpropagation it only needs to pass the gradient back to the layers that come before it.

5.2.1 Max Pooling - Backward Pass

First we need to build a mask window that hides every non-maximum element (0) and marks the maximum element (1). For example:

$X = \begin{bmatrix} 1 & 3 \\ 4 & 2 \end{bmatrix} \quad \rightarrow \quad M = \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}$
Exercise

Implement create_mask_from_window().

Hints
  • np.max() returns the maximum of a matrix
  • A = (X == x) returns a matrix A such that:
A[i,j] = True if X[i,j] = x
A[i,j] = False if X[i,j] != x

We do not consider the case where the matrix contains several equal maxima.

def create_mask_from_window(x):
    """
    Creates a mask from an input matrix x, to identify the max entry of x.

    Arguments:
    x -- Array of shape (f, f)

    Returns:
    mask -- Array of the same shape as window, contains a True at the position corresponding to the max entry of x.
    """

    ### START CODE HERE ### (≈1 line)
    mask = (x == np.max(x))
    ### END CODE HERE ###

    return mask

#####################################

np.random.seed(1)
x = np.random.randn(2,3)
mask = create_mask_from_window(x)
print('x = ', x)
print("mask = ", mask)

# x =  [[ 1.62434536 -0.61175641 -0.52817175]
#  [-1.07296862  0.86540763 -2.3015387 ]]
# mask =  [[ True False False]
#  [False False False]]

Why keep track of the position of the maximum? Because only the maximum influenced the output, so during backpropagation the gradient should flow back only to the input at that position; the other inputs have no effect on the output.

5.2.2 Average Pooling - Backward Pass

Unlike max pooling, in average pooling every element contributes to the output with equal weight, so the corresponding "mask" distributes dZ evenly. For example, for a (2, 2) window:

$dZ = 1 \quad \rightarrow \quad dZ = \begin{bmatrix} 1/4 & 1/4 \\ 1/4 & 1/4 \end{bmatrix}$
Hint
average = dz / (n_H * n_W)
Exercise

Implement the even distribution of dz over a window.

def distribute_value(dz, shape):
    """
    Distributes the input value in the matrix of dimension shape

    Arguments:
    dz -- input scalar
    shape -- the shape (n_H, n_W) of the output matrix for which we want to distribute the value of dz

    Returns:
    a -- Array of size (n_H, n_W) for which we distributed the value of dz
    """

    ### START CODE HERE ###
    # Retrieve dimensions from shape (≈1 line)
    (n_H, n_W) = shape

    # Compute the value to distribute on the matrix (≈1 line)
    average = dz / (n_H * n_W)

    # Create a matrix where every entry is the "average" value (≈1 line)
    a = average * np.ones(shape)
    ### END CODE HERE ###

    return a

#####################################

a = distribute_value(2, (2,2))
print('distributed value =', a)

# distributed value = [[ 0.5  0.5]
#  [ 0.5  0.5]]

5.2.3 Putting It Together: Pooling Backward Pass

Implement the backward pass of the pooling layer for both the max and average modes.

def pool_backward(dA, cache, mode = "max"):
    """
    Implements the backward pass of the pooling layer

    Arguments:
    dA -- gradient of cost with respect to the output of the pooling layer, same shape as A
    cache -- cache output from the forward pass of the pooling layer, contains the layer's input and hparameters 
    mode -- the pooling mode you would like to use, defined as a string ("max" or "average")

    Returns:
    dA_prev -- gradient of cost with respect to the input of the pooling layer, same shape as A_prev
    """

    ### START CODE HERE ###

    # Retrieve information from cache (≈1 line)
    (A_prev, hparameters) = cache

    # Retrieve hyperparameters from "hparameters" (≈2 lines)
    stride = hparameters['stride']
    f = hparameters['f']

    # Retrieve dimensions from A_prev's shape and dA's shape (≈2 lines)
    m, n_H_prev, n_W_prev, n_C_prev = A_prev.shape
    m, n_H, n_W, n_C = dA.shape

    # Initialize dA_prev with zeros (≈1 line)
    dA_prev = np.zeros(np.shape(A_prev))

    for i in range(m):                       # loop over the training examples

        # select training example from A_prev (≈1 line)
        a_prev = A_prev[i, :, :, :]

        for h in range(n_H):                   # loop on the vertical axis
            for w in range(n_W):               # loop on the horizontal axis
                for c in range(n_C):           # loop over the channels (depth)

                    # Find the corners of the current "slice" (≈4 lines)
                    vert_start = h * stride
                    vert_end = vert_start + f
                    horiz_start = w * stride
                    horiz_end = horiz_start + f

                    # Compute the backward propagation in both modes.
                    if mode == "max":

                        # Use the corners and "c" to define the current slice from a_prev (≈1 line)
                        a_prev_slice = a_prev[vert_start:vert_end, horiz_start:horiz_end, c]
                        # Create the mask from a_prev_slice (≈1 line)
                        mask = create_mask_from_window(a_prev_slice)
                        # Set dA_prev to be dA_prev + (the mask multiplied by the correct entry of dA) (≈1 line)
                        dA_prev[i, vert_start: vert_end, horiz_start: horiz_end, c] += np.multiply(mask, dA[i, h, w, c])


                    elif mode == "average":

                        # Get the value a from dA (≈1 line)
                        da = dA[i, h, w, c]
                        # Define the shape of the filter as fxf (≈1 line)
                        shape = (f, f)
                        # Distribute it to get the correct slice of dA_prev. i.e. Add the distributed value of da. (≈1 line)
                        dA_prev[i, vert_start: vert_end, horiz_start: horiz_end, c] += distribute_value(da, shape)

    ### END CODE ###

    # Making sure your output shape is correct
    assert(dA_prev.shape == A_prev.shape)

    return dA_prev


######################################

np.random.seed(1)
A_prev = np.random.randn(5, 5, 3, 2)
hparameters = {"stride" : 1, "f": 2}
A, cache = pool_forward(A_prev, hparameters)
dA = np.random.randn(5, 4, 2, 2)

dA_prev = pool_backward(dA, cache, mode = "max")
print("mode = max")
print('mean of dA = ', np.mean(dA))
print('dA_prev[1,1] = ', dA_prev[1,1])  
print()
dA_prev = pool_backward(dA, cache, mode = "average")
print("mode = average")
print('mean of dA = ', np.mean(dA))
print('dA_prev[1,1] = ', dA_prev[1,1]) 

# mode = max
# mean of dA =  0.145713902729
# dA_prev[1,1] =  [[ 0.          0.        ]
#  [ 5.05844394 -1.68282702]
#  [ 0.          0.        ]]

# mode = average
# mean of dA =  0.145713902729
# dA_prev[1,1] =  [[ 0.08485462  0.2787552 ]
#  [ 1.26461098 -0.25749373]
#  [ 1.17975636 -0.53624893]]

Congratulations! You have finished this assignment and now understand how a convolutional neural network works, step by step. Next we will implement a ConvNet with the TensorFlow framework.

Part 2: Convolutional Neural Networks: Application

Build a ConvNet with TensorFlow.

0 TensorFlow Model

Imports

import math
import numpy as np
import h5py
import matplotlib.pyplot as plt
import scipy
from PIL import Image
from scipy import ndimage
import tensorflow as tf
from tensorflow.python.framework import ops
from cnn_utils import *

%matplotlib inline
np.random.seed(1)

Load the data (SIGNS dataset)

# Loading the data (signs)
X_train_orig, Y_train_orig, X_test_orig, Y_test_orig, classes = load_dataset()

There are six hand signs, representing the digits 0-5.
image

Look at an example

# Example of a picture
index = 6
plt.imshow(X_train_orig[index])
print ("y = " + str(np.squeeze(Y_train_orig[:, index])))

# y = 2

In Course 2 we built a fully connected network for this sign recognition task, but image classification is naturally better suited to a convolutional neural network.

Examine the data shapes

X_train = X_train_orig/255.
X_test = X_test_orig/255.
Y_train = convert_to_one_hot(Y_train_orig, 6).T
Y_test = convert_to_one_hot(Y_test_orig, 6).T
print ("number of training examples = " + str(X_train.shape[0]))
print ("number of test examples = " + str(X_test.shape[0]))
print ("X_train shape: " + str(X_train.shape))
print ("Y_train shape: " + str(Y_train.shape))
print ("X_test shape: " + str(X_test.shape))
print ("Y_test shape: " + str(Y_test.shape))
conv_layers = {}

# number of training examples = 1080
# number of test examples = 120
# X_train shape: (1080, 64, 64, 3)
# Y_train shape: (1080, 6)
# X_test shape: (120, 64, 64, 3)
# Y_test shape: (120, 6)

1 Create Placeholders

Create placeholders for the input data X and the labels Y.

  • X (None, n_H0, n_W0, n_C0): the first dimension is None because the number of examples is not fixed yet
  • Y (None, n_y): the first dimension is None because the number of examples is not fixed yet
# GRADED FUNCTION: create_placeholders

def create_placeholders(n_H0, n_W0, n_C0, n_y):
    """
    Creates the placeholders for the tensorflow session.

    Arguments:
    n_H0 -- scalar, height of an input image
    n_W0 -- scalar, width of an input image
    n_C0 -- scalar, number of channels of the input
    n_y -- scalar, number of classes

    Returns:
    X -- placeholder for the data input, of shape [None, n_H0, n_W0, n_C0] and dtype "float"
    Y -- placeholder for the input labels, of shape [None, n_y] and dtype "float"
    """

    ### START CODE HERE ### (≈2 lines)
    X = tf.placeholder(tf.float32, shape=[None, n_H0, n_W0, n_C0])
    Y = tf.placeholder(tf.float32, shape=[None, n_y])
    ### END CODE HERE ###

    return X, Y


#########################################

X, Y = create_placeholders(64, 64, 3, 6)
print ("X = " + str(X))
print ("Y = " + str(Y))

# X = Tensor("Placeholder:0", shape=(?, 64, 64, 3), dtype=float32)
# Y = Tensor("Placeholder_1:0", shape=(?, 6), dtype=float32)

2 Initialize Parameters

  • W1: filters of the first convolutional layer, shape [4, 4, 3, 8]
  • W2: filters of the second convolutional layer, shape [2, 2, 8, 16]

Hint

In TensorFlow, a variable W of shape [1,2,3,4] is initialized with:

W = tf.get_variable("W", [1,2,3,4], initializer = ...)
# GRADED FUNCTION: initialize_parameters

def initialize_parameters():
    """
    Initializes weight parameters to build a neural network with tensorflow. The shapes are:
                        W1 : [4, 4, 3, 8]
                        W2 : [2, 2, 8, 16]
    Returns:
    parameters -- a dictionary of tensors containing W1, W2
    """

    tf.set_random_seed(1)                              # so that your "random" numbers match ours

    ### START CODE HERE ### (approx. 2 lines of code)
    W1 = tf.get_variable("W1", [4, 4, 3, 8], initializer=tf.contrib.layers.xavier_initializer(seed=0))
    W2 = tf.get_variable("W2", [2, 2, 8, 16], initializer=tf.contrib.layers.xavier_initializer(seed=0))
    ### END CODE HERE ###

    parameters = {"W1": W1,
                  "W2": W2}

    return parameters


###########################################

tf.reset_default_graph()
with tf.Session() as sess_test:
    parameters = initialize_parameters()
    init = tf.global_variables_initializer()
    sess_test.run(init)
    print("W1 = " + str(parameters["W1"].eval()[1,1,1]))
    print("W2 = " + str(parameters["W2"].eval()[1,1,1]))


# W1 = [ 0.00131723  0.14176141 -0.04434952  0.09197326  0.14984085 -0.03514394
#  -0.06847463  0.05245192]
# W2 = [-0.08566415  0.17750949  0.11974221  0.16773748 -0.0830943  -0.08058
#  -0.00577033 -0.14643836  0.24162132 -0.05857408 -0.19055021  0.1345228
#  -0.22779644 -0.1601823  -0.16117483 -0.10286498]

3 Forward Propagation

The following built-in TensorFlow functions carry out the convolution steps for you (a small usage sketch follows the list):

  • tf.nn.conv2d(X, W1, strides = [1,s,s,1], padding = 'SAME')
    • first argument: the input X
    • second argument: the filters W1
    • third argument: the strides [1, s, s, 1], i.e. the stride along each dimension of the input
  • tf.nn.max_pool(A, ksize = [1,f,f,1], strides = [1,s,s,1], padding = 'SAME')
    • first argument: the input of the pooling layer (i.e. the output of the conv layer)
    • second argument: the window size ksize = [1, f, f, 1]
    • third argument: the strides [1, s, s, 1]
    • fourth argument: the padding mode
  • tf.nn.relu(Z1)
    • computes the element-wise ReLU of Z1
  • tf.contrib.layers.flatten(P)
    • takes an input P of shape [batch_size, ...], flattens each example (every dimension except the first) and returns a tensor of shape [batch_size, k]
  • tf.contrib.layers.fully_connected(F, num_outputs)
    • applies a fully connected layer to the flattened input F with num_outputs output units; its weights are created, initialized and trained by the framework automatically
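A minimal sketch of these calls on the shapes used in this assignment (TF 1.x API, as in the rest of the notebook; the variable name W_demo is only for illustration):

import tensorflow as tf

tf.reset_default_graph()
X = tf.placeholder(tf.float32, [None, 64, 64, 3])
W_demo = tf.get_variable("W_demo", [4, 4, 3, 8],
                         initializer=tf.contrib.layers.xavier_initializer(seed=0))
Z = tf.nn.conv2d(X, W_demo, strides=[1, 1, 1, 1], padding='SAME')                 # (?, 64, 64, 8)
A = tf.nn.relu(Z)
P = tf.nn.max_pool(A, ksize=[1, 8, 8, 1], strides=[1, 8, 8, 1], padding='SAME')   # (?, 8, 8, 8)
F = tf.contrib.layers.flatten(P)                                                  # (?, 512)
print(Z.get_shape(), P.get_shape(), F.get_shape())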

Exercise

Implement the forward pass: CONV2D -> RELU -> MAXPOOL -> CONV2D -> RELU -> MAXPOOL -> FLATTEN -> FULLYCONNECTED

The steps are:

  • Conv2D: stride=1, padding="SAME"
  • ReLU
  • Max pool: filter=[1,8,8,1], stride=[1,8,8,1], padding="SAME"
  • Conv2D: stride=1, padding="SAME"
  • ReLU
  • Max pool: filter=[1,4,4,1], stride=[1,4,4,1], padding="SAME"
  • Flatten
  • FULLYCONNECTED (FC) layer
    • Note that the fully connected layer has no non-linear activation (such as softmax) here: in TensorFlow the softmax is applied later, when the cost is computed, not in this function
# GRADED FUNCTION: forward_propagation

def forward_propagation(X, parameters):
    """
    Implements the forward propagation for the model:
    CONV2D -> RELU -> MAXPOOL -> CONV2D -> RELU -> MAXPOOL -> FLATTEN -> FULLYCONNECTED

    Arguments:
    X -- input dataset placeholder, of shape (input size, number of examples)
    parameters -- python dictionary containing your parameters "W1", "W2"
                  the shapes are given in initialize_parameters

    Returns:
    Z3 -- the output of the last LINEAR unit
    """

    # Retrieve the parameters from the dictionary "parameters" 
    W1 = parameters['W1']
    W2 = parameters['W2']

    ### START CODE HERE ###
    # CONV2D: stride of 1, padding 'SAME'
    Z1 = tf.nn.conv2d(X, W1, strides=[1,1,1,1], padding='SAME')
    # RELU
    A1 = tf.nn.relu(Z1)
    # MAXPOOL: window 8x8, stride 8, padding 'SAME'
    P1 = tf.nn.max_pool(A1, ksize=[1,8,8,1], strides=[1,8,8,1], padding='SAME')
    # CONV2D: filters W2, stride 1, padding 'SAME'
    Z2 = tf.nn.conv2d(P1, W2, strides=[1,1,1,1], padding='SAME')
    # RELU
    A2 = tf.nn.relu(Z2)
    # MAXPOOL: window 4x4, stride 4, padding 'SAME'
    P2 = tf.nn.max_pool(A2, ksize=[1,4,4,1], strides=[1,4,4,1], padding='SAME')
    # FLATTEN
    P2 = tf.contrib.layers.flatten(P2)
    # FULLY-CONNECTED without non-linear activation function (do not call softmax here).
    # 6 neurons in output layer. Hint: one of the arguments should be "activation_fn=None" 
    Z3 = tf.contrib.layers.fully_connected(P2, 6, activation_fn=None)
    ### END CODE HERE ###

    return Z3


##########################################

tf.reset_default_graph()

with tf.Session() as sess:
    np.random.seed(1)
    X, Y = create_placeholders(64, 64, 3, 6)
    parameters = initialize_parameters()
    Z3 = forward_propagation(X, parameters)
    init = tf.global_variables_initializer()
    sess.run(init)
    a = sess.run(Z3, {X: np.random.randn(2,64,64,3), Y: np.random.randn(2,6)})
    print("Z3 = " + str(a))

# Z3 = [[-0.44670227 -1.57208765 -1.53049231 -2.31013036 -1.29104376  0.46852064]
#  [-0.17601591 -1.57972014 -1.4737016  -2.61672091 -1.00810647  0.5747785 ]]

4 Compute the Cost

Hints

  • tf.nn.softmax_cross_entropy_with_logits(logits = Z3, labels = Y)
    • computes the softmax cross-entropy loss: it applies the softmax non-linearity to the logits and then computes the loss against the labels
  • tf.reduce_mean
    • computes the mean of the elements across the dimensions of a tensor
# GRADED FUNCTION: compute_cost 

def compute_cost(Z3, Y):
    """
    Computes the cost

    Arguments:
    Z3 -- output of forward propagation (output of the last LINEAR unit), of shape (6, number of examples)
    Y -- "true" labels vector placeholder, same shape as Z3

    Returns:
    cost - Tensor of the cost function
    """

    ### START CODE HERE ### (1 line of code)
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits = Z3, labels = Y))
    ### END CODE HERE ###

    return cost


######################################

tf.reset_default_graph()

with tf.Session() as sess:
    np.random.seed(1)
    X, Y = create_placeholders(64, 64, 3, 6)
    parameters = initialize_parameters()
    Z3 = forward_propagation(X, parameters)
    cost = compute_cost(Z3, Y)
    init = tf.global_variables_initializer()
    sess.run(init)
    a = sess.run(cost, {X: np.random.randn(4,64,64,3), Y: np.random.randn(4,6)})
    print("cost = " + str(a))

# cost = 2.91034

5 Model

Put the pieces above together to build the model and train it on the SIGNS dataset.

The model should:

  • create placeholders
  • initialize parameters
  • forward propagate
  • compute the cost
  • create an optimizer

Finally, create a session and run the graph over all the mini-batches in every epoch.

# GRADED FUNCTION: model

def model(X_train, Y_train, X_test, Y_test, learning_rate = 0.009,
          num_epochs = 100, minibatch_size = 64, print_cost = True):
    """
    Implements a three-layer ConvNet in Tensorflow:
    CONV2D -> RELU -> MAXPOOL -> CONV2D -> RELU -> MAXPOOL -> FLATTEN -> FULLYCONNECTED

    Arguments:
    X_train -- training set, of shape (None, 64, 64, 3)
    Y_train -- training labels, of shape (None, n_y = 6)
    X_test -- test set, of shape (None, 64, 64, 3)
    Y_test -- test labels, of shape (None, n_y = 6)
    learning_rate -- learning rate of the optimization
    num_epochs -- number of epochs of the optimization loop
    minibatch_size -- size of a minibatch
    print_cost -- True to print the cost every 100 epochs

    Returns:
    train_accuracy -- real number, accuracy on the train set (X_train)
    test_accuracy -- real number, testing accuracy on the test set (X_test)
    parameters -- parameters learnt by the model. They can then be used to predict.
    """

    ops.reset_default_graph()                         # to be able to rerun the model without overwriting tf variables
    tf.set_random_seed(1)                             # to keep results consistent (tensorflow seed)
    seed = 3                                          # to keep results consistent (numpy seed)
    (m, n_H0, n_W0, n_C0) = X_train.shape             
    n_y = Y_train.shape[1]                            
    costs = []                                        # To keep track of the cost

    # Create Placeholders of the correct shape
    ### START CODE HERE ### (1 line)
    X, Y = create_placeholders(n_H0, n_W0, n_C0, n_y)
    ### END CODE HERE ###

    # Initialize parameters
    ### START CODE HERE ### (1 line)
    parameters = initialize_parameters()
    ### END CODE HERE ###

    # Forward propagation: Build the forward propagation in the tensorflow graph
    ### START CODE HERE ### (1 line)
    Z3 = forward_propagation(X, parameters)
    ### END CODE HERE ###

    # Cost function: Add cost function to tensorflow graph
    ### START CODE HERE ### (1 line)
    cost = compute_cost(Z3, Y)
    ### END CODE HERE ###

    # Backpropagation: Define the tensorflow optimizer. Use an AdamOptimizer that minimizes the cost.
    ### START CODE HERE ### (1 line)
    optimizer = tf.train.AdamOptimizer(learning_rate).minimize(cost)
    ### END CODE HERE ###

    # Initialize all the variables globally
    init = tf.global_variables_initializer()

    # Start the session to compute the tensorflow graph
    with tf.Session() as sess:

        # Run the initialization
        sess.run(init)

        # Do the training loop
        for epoch in range(num_epochs):

            minibatch_cost = 0.
            num_minibatches = int(m / minibatch_size) # number of minibatches of size minibatch_size in the train set
            seed = seed + 1
            minibatches = random_mini_batches(X_train, Y_train, minibatch_size, seed)

            for minibatch in minibatches:

                # Select a minibatch
                (minibatch_X, minibatch_Y) = minibatch
                # IMPORTANT: The line that runs the graph on a minibatch.
                # Run the session to execute the optimizer and the cost; the feed_dict should contain a minibatch for (X,Y).
                ### START CODE HERE ### (1 line)
                _ , temp_cost = sess.run([optimizer, cost], feed_dict={X: minibatch_X, Y: minibatch_Y})
                ### END CODE HERE ###

                minibatch_cost += temp_cost / num_minibatches


            # Print the cost every 5 epochs and record it every epoch
            if print_cost == True and epoch % 5 == 0:
                print ("Cost after epoch %i: %f" % (epoch, minibatch_cost))
            if print_cost == True and epoch % 1 == 0:
                costs.append(minibatch_cost)


        # plot the cost
        plt.plot(np.squeeze(costs))
        plt.ylabel('cost')
        plt.xlabel('iterations (per tens)')
        plt.title("Learning rate =" + str(learning_rate))
        plt.show()

        # Calculate the correct predictions
        predict_op = tf.argmax(Z3, 1)
        correct_prediction = tf.equal(predict_op, tf.argmax(Y, 1))

        # Calculate accuracy on the test set
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
        print(accuracy)
        train_accuracy = accuracy.eval({X: X_train, Y: Y_train})
        test_accuracy = accuracy.eval({X: X_test, Y: Y_test})
        print("Train Accuracy:", train_accuracy)
        print("Test Accuracy:", test_accuracy)

        return train_accuracy, test_accuracy, parameters


############################################

_, _, parameters = model(X_train, Y_train, X_test, Y_test)

# Cost after epoch 0: 1.917929
# Cost after epoch 5: 1.506757
# Cost after epoch 10: 0.955359
# Cost after epoch 15: 0.845802
# Cost after epoch 20: 0.701174
# Cost after epoch 25: 0.571977
# Cost after epoch 30: 0.518435
# Cost after epoch 35: 0.495806
# Cost after epoch 40: 0.429827
# Cost after epoch 45: 0.407291
# Cost after epoch 50: 0.366394
# Cost after epoch 55: 0.376922
# Cost after epoch 60: 0.299491
# Cost after epoch 65: 0.338870
# Cost after epoch 70: 0.316400
# Cost after epoch 75: 0.310413
# Cost after epoch 80: 0.249549
# Cost after epoch 85: 0.243457
# Cost after epoch 90: 0.200031
# Cost after epoch 95: 0.175452

# Tensor("Mean_1:0", shape=(), dtype=float32)
# Train Accuracy: 0.940741
# Test Accuracy: 0.783333

Going further

You can try the model on a "thumbs up" image:

fname = "images/thumbs_up.jpg"
image = np.array(ndimage.imread(fname, flatten=False))
my_image = scipy.misc.imresize(image, size=(64,64))
plt.imshow(my_image)
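The SIGNS dataset does not contain a "thumbs up" class, so the model has no reason to classify this image correctly; it is only a sanity check of the pipeline. To actually feed the image to the network it would first need the same preprocessing as the training set, roughly (a sketch, assuming the 64x64x3 input format used above):

# Match the preprocessing applied to X_train: scale to [0, 1] and add a batch dimension
my_image = my_image / 255.
my_image = my_image.reshape(1, 64, 64, 3)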

image
