An Introduction to Basic Deep Learning Concepts with Keras: Deep Learning from Beginner to Getting Started

Author: shikanon CreateTime: 2017-02-13 10:33:34

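TensorFlow 1.0 has been officially released, and Google's first TensorFlow developer summit was held in Mountain View, bringing a new wave of excitement to deep learning. As deep learning frameworks spread, more and more people will join in, much as the spread of Android rapidly enlarged its developer community. This post is a basic tutorial for beginners getting started with deep learning. It uses Keras, a high-level neural network library that TensorFlow 1.0 also brings in as a high-level API.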

I. Basics

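Each neuron in a neural network computes a weighted sum of all its inputs, adds a constant called the bias, and then passes the result through a nonlinear activation function.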

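For the dataset we use MNIST, the handwritten digit dataset that serves as the "Hello World" of deep learning, and we start with a single softmax layer. The source code is on my GitHub and can be downloaded directly through nbviewer.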

1. softmax

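Softmax is mainly used for multi-class classification. It generalizes the logistic regression model to more than two classes. The softmax formula for a score vector $z$ over $k$ classes is:

$$\mathrm{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{k} e^{z_j}}, \quad i = 1, \dots, k$$

When k = 2, this reduces to the logistic regression form.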

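Softmax is generally used as the last layer of a neural network, where it serves as the output layer for multi-class classification. Every softmax output is greater than 0 and the outputs sum to 1, so they can be interpreted as a probability distribution.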
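The two properties above (positive outputs that sum to 1, and the reduction to logistic regression when k = 2) can be checked numerically. This is a minimal NumPy sketch; the `softmax` and `sigmoid` helpers and the example scores are illustrative assumptions, not code from the original notebook.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = np.asarray(z, dtype=float)
    shifted = z - z.max(axis=-1, keepdims=True)  # subtracting the max leaves the result unchanged
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def sigmoid(x):
    """Logistic function used by logistic regression."""
    return 1.0 / (1.0 + np.exp(-x))

scores = np.array([2.0, 1.0, 0.1])   # hypothetical class scores
probs = softmax(scores)
print(probs, probs.sum())            # every entry is positive and the sum is 1

# With k = 2, softmax([z, 0])[0] equals the logistic sigmoid of z.
z = 0.7
two_class = softmax([z, 0.0])
print(two_class[0], sigmoid(z))
```

Fixing the second score at 0 is just one convenient way to show the equivalence: only the difference between the two scores matters.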

Figure: softmax diagram

Figure: softmax output layer diagram

%pylab inline

Populating the interactive namespace from numpy and matplotlib

from IPython.display import SVG
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Reshape
from keras.optimizers import SGD, Adam
from keras.utils.visualize_util import model_to_dot
from keras.utils import np_utils
import matplotlib.pyplot as plt
import tensorflow as tf
import pandas as pd

Using TensorFlow backend.

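# Set the random seed so the experiment is reproducible
import numpy as np
np.random.seed(0)
# Set the number of threads; the config only takes effect once a session uses it
from keras import backend as K
THREADS_NUM = 20
config = tf.ConfigProto(intra_op_parallelism_threads=THREADS_NUM)
K.set_session(tf.Session(config=config))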

(X_train, Y_train),(X_test, Y_test) = mnist.load_data()
print('Original data shapes:')
print(X_train.shape, Y_train.shape)
print(X_test.shape, Y_test.shape)

# Data transformation
# There are 10 classes
nb_classes = 10

x_train_1 = X_train.reshape(60000, 784)
# Optional scaling to [0, 1] (left disabled here)
#x_train_1 = x_train_1.astype('float32')
#x_train_1 /= 255
y_train_1 = np_utils.to_categorical(Y_train, nb_classes)
print('Transformed data shapes:')
print(x_train_1.shape, y_train_1.shape)

x_test_1 = X_test.reshape(10000, 784)
y_test_1 = np_utils.to_categorical(Y_test, nb_classes)
print(x_test_1.shape, y_test_1.shape)

Original data shapes:
((60000, 28, 28), (60000,))
((10000, 28, 28), (10000,))
Transformed data shapes:
((60000, 784), (60000, 10))
((10000, 784), (10000, 10))

# Build a softmax model
# neural network with 1 layer of 10 softmax neurons
#
# · · · · · · · · · ·       (input data, flattened pixels)       X [batch, 784]        # 784 = 28 * 28
# \x/x\x/x\x/x\x/x\x/    -- fully connected layer (softmax)      W [784, 10]     b[10]
#   · · · · · · · ·                                              Y [batch, 10]

# The model is:
#
# Y = softmax( X * W + b)
#              X: matrix for 100 grayscale images of 28x28 pixels, flattened (there are 100 images in a mini-batch)
#              W: weight matrix with 784 lines and 10 columns
#              b: bias vector with 10 dimensions
#              +: add with broadcasting: adds the vector to each line of the matrix (numpy)
#              softmax(matrix) applies softmax on each line
#              softmax(line) applies an exp to each value then divides by the norm of the resulting line
#              Y: output matrix with 100 lines and 10 columns

model = Sequential()
model.add(Dense(nb_classes, input_shape=(784,)))  # fully connected: 784 input dimensions, 10 output dimensions, matching the data
model.add(Activation('softmax'))

sgd = SGD(lr=0.005)
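# binary_crossentropy is a cross-entropy loss applied to each of the 10 outputs independently.
# Caveat: with this loss, Keras resolves 'accuracy' to per-output binary accuracy, which reads
# higher than the share of correctly classified digits; categorical_crossentropy is the usual
# choice for single-label multi-class problems.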
model.compile(loss='binary_crossentropy',
              optimizer=sgd,
              metrics=['accuracy'])

# Model summary
model.summary()

____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
====================================================================================================
dense_1 (Dense)                  (None, 10)            7850        dense_input_1[0][0]              
____________________________________________________________________________________________________
activation_1 (Activation)        (None, 10)            0           dense_1[0][0]                    
====================================================================================================
Total params: 7,850
Trainable params: 7,850
Non-trainable params: 0
____________________________________________________________________________________________________
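The parameter count in the summary follows directly from the model Y = softmax(X * W + b): 784 * 10 weights plus 10 biases gives 7,850. The forward pass can be sketched in plain NumPy as below; the random input and the weight initialisation are stand-ins chosen for illustration and are not what Keras uses internally.

```python
import numpy as np

rng = np.random.RandomState(0)
batch = 100                               # mini-batch size used in the text
X = rng.rand(batch, 784)                  # stand-in for flattened 28x28 images
W = rng.randn(784, 10) * 0.01             # weight matrix, hypothetical random init
b = np.zeros(10)                          # bias vector

logits = X.dot(W) + b                     # b is broadcast over the 100 rows
logits -= logits.max(axis=1, keepdims=True)   # stabilise the exponentials
Y = np.exp(logits)
Y /= Y.sum(axis=1, keepdims=True)         # row-wise softmax

n_params = W.size + b.size                # 784 * 10 + 10
print(Y.shape, n_params)
```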

SVG(model_to_dot(model).create(prog='dot', format='svg'))

from keras.callbacks import Callback, TensorBoard
import tensorflow as tf

# Build a callback that records the loss
class LossHistory(Callback):
    def on_train_begin(self, logs={}):
        self.losses = []

    def on_batch_end(self, batch, logs={}):
        self.losses.append(logs.get('loss'))

# Build a custom TensorBoard class dedicated to recording how values change per batch
class BatchTensorBoard(TensorBoard):
    def __init__(self,log_dir='./logs',
                 histogram_freq=0,
                 write_graph=True,
                 write_images=False):
        super(BatchTensorBoard, self).__init__()
        self.log_dir = log_dir
        self.histogram_freq = histogram_freq
        self.merged = None
        self.write_graph = write_graph
        self.write_images = write_images
        self.batch = 0
        self.batch_queue = set()
    
    def on_epoch_end(self, epoch, logs=None):
        pass
    
    def on_batch_end(self,batch,logs=None):
        logs = logs or {}
        
        self.batch = self.batch + 1
        
        for name, value in logs.items():
            if name in ['batch', 'size']:
                continue
            summary = tf.Summary()
            summary_value = summary.value.add()
            summary_value.simple_value = float(value)
            summary_value.tag = "batch_" + name
            if (name,self.batch) in self.batch_queue:
                continue
            self.writer.add_summary(summary, self.batch)
            self.batch_queue.add((name,self.batch))
        self.writer.flush()

tensorboard = TensorBoard(log_dir='/home/tensorflow/log/softmax/epoch')
my_tensorboard = BatchTensorBoard(log_dir='/home/tensorflow/log/softmax/batch')

model.fit(x_train_1, y_train_1,
          nb_epoch=20,
          verbose=0,
          batch_size=100,
          callbacks=[tensorboard, my_tensorboard])

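Loss function

A loss function maps an event (an element of a sample space) to a real number that expresses the economic cost or opportunity cost associated with that event. In statistics, a loss function measures the degree of loss or error, where the loss is tied to a "wrong" estimate (for example, a cost or the loss of equipment).

**Cross-entropy** is a loss function commonly used in neural networks.

Properties of cross-entropy:

(1) It is non-negative.

(2) When the actual output a is close to the desired output y, the cost is close to 0 (for example, with y = 0 and a ≈ 0, or y = 1 and a ≈ 1, the cost approaches 0).

A simple way to understand it: training pushes the predicted value Yi toward the true value Y'. Cross-entropy is -Σ Y'i·log(Yi), so the more probability the prediction puts on the true class, the smaller the cross-entropy.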
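A small numeric example makes these properties concrete. The sketch below computes the categorical cross-entropy for one sample; the helper name `cross_entropy` and the two predictions are hypothetical values chosen for illustration.

```python
import numpy as np

def cross_entropy(y_true, y_pred, eps=1e-12):
    """Categorical cross-entropy -sum(y_true * log(y_pred)) for one sample."""
    y_pred = np.clip(y_pred, eps, 1.0)    # avoid log(0)
    return -np.sum(y_true * np.log(y_pred))

y_true = np.array([0.0, 1.0, 0.0])        # one-hot label: the true class is 1
good = np.array([0.05, 0.90, 0.05])       # confident, correct prediction
bad = np.array([0.60, 0.20, 0.20])        # most probability on the wrong class

loss_good = cross_entropy(y_true, good)   # equals -log(0.9)
loss_bad = cross_entropy(y_true, bad)     # equals -log(0.2)
print(loss_good, loss_bad)
```

The loss is never negative, and it shrinks toward 0 as the probability assigned to the true class approaches 1.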

Plots of how cross-entropy and accuracy change during training can be viewed in TensorBoard.

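Gradient descent

Computing the partial derivatives of the cross-entropy with respect to all the weights and all the biases yields a "gradient" for the given image, label, and current weights and biases, as shown in the figure:

We want the loss function to be as small as possible, which means reaching the bottom of the valley where the cross-entropy is lowest. In the figure above, the cross-entropy is drawn as a function of two weights.

The learning rate is the size of each step taken during gradient descent.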
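The idea can be sketched on a toy loss that, like the figure, depends on just two weights. The bowl-shaped `loss` function, its minimum at (1, -2), the starting point, and the learning rate are all assumptions made for this illustration; the real MNIST loss is far more complex.

```python
import numpy as np

def loss(w):
    # hypothetical bowl-shaped loss with its minimum at (1, -2)
    return (w[0] - 1.0) ** 2 + 2.0 * (w[1] + 2.0) ** 2

def grad(w):
    # partial derivatives of the loss with respect to each weight
    return np.array([2.0 * (w[0] - 1.0), 4.0 * (w[1] + 2.0)])

w = np.array([4.0, 3.0])        # arbitrary starting weights
learning_rate = 0.1             # step size
history = [loss(w)]
for _ in range(100):
    w = w - learning_rate * grad(w)   # step against the gradient
    history.append(loss(w))

print(w, history[0], history[-1])
```

Each step moves the weights a little way downhill, so the recorded loss falls from its starting value toward the minimum.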

# Names of the model's evaluation metrics
print(model.metrics_names)
# Evaluate on the test data
model.evaluate(x_test_1, y_test_1,
          verbose=1,
          batch_size=100)

['loss', 'acc']
 9800/10000 [============================>.] - ETA: 0s




[0.87580669939517974, 0.94387999653816224]

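Above, we explored how softmax supports multi-class classification and saw that softmax can serve as an output layer for multi-class tasks.

However, a single layer like this only handles linearly separable problems. How do we deal with nonlinear ones, the XOR problem in particular?

This is where deep neural networks with multiple hidden layers come in.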

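3. Activation functions

An **activation function** lets a model incorporate nonlinearity.

There are two ways to handle a problem that is not linearly separable: transforming the coordinates, or introducing a nonlinear function.

(1) Coordinate transformation

Take a boundary that is not linearly separable, such as X^2 + Y^2 = 1.

Its graph is shown below: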

fig = plt.figure(0)
degree = np.random.rand(50)*np.pi*2
x_1 = np.cos(degree)*np.random.rand(50)
y_1 = np.sin(degree)*np.random.rand(50)
x_2 = np.cos(degree)*(1+np.random.rand(50))
y_2 = np.sin(degree)*(1+np.random.rand(50))

# x_3 and y_3 form the dividing line (the unit circle)
t = np.linspace(0,np.pi*2,50)
x_3 = np.cos(t)
y_3 = np.sin(t)

scatter(x_1,y_1,c='red',s=50,alpha=0.4,marker='o')
scatter(x_2,y_2,c='black',s=50,alpha=0.4,marker='o')
plot(x_3,y_3)

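Now transform the coordinate axes so that the horizontal axis becomes X^2 and the vertical axis becomes Y^2. The expression becomes X + Y = 1, so the originally nonlinear boundary turns into a straight line in the new coordinates and the problem becomes linearly separable.

See the figure below for details: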

fig2 = plt.figure(1)
# Let the new horizontal axis be x^2 and the new vertical axis be y^2
x_4 = x_1**2
y_4 = y_1**2
x_5 = x_2**2
y_5 = y_2**2

# Now a straight line, y = 1 - x, can be fitted to separate the two classes
x_6 = np.linspace(-1,2,50)
y_6 = 1 - x_6

scatter(x_4,y_4,c='red',s=50,alpha=0.4,marker='o')
scatter(x_5,y_5,c='black',s=50,alpha=0.4,marker='o')
plot(x_6,y_6)

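(2) Introducing a nonlinear function

XOR (exclusive or) is a bitwise operation on binary values, written XOR (the XOR operator in Python is ^). It compares each pair of corresponding bits of its two operands: equal bits give 0 and different bits give 1.

Here is a typical XOR table: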

table = {'x':[1,0,1,0],'y':[1,0,0,1]}
df = pd.DataFrame(table)
df['z'] = df['x']^df['y']
df

x = 1, y = 1, then z = 0

x = 0, y = 0, then z = 0

x = 1, y = 0, then z = 1

x = 0, y = 1, then z = 1

...

Plotted, it looks like this:

fig3 = plt.figure(2)
groups = df.groupby('z')
for name, group in groups:
    scatter(group['x'],group['y'],label=name,s=50,marker='o')

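So can we construct a function that fits this pattern? That is, how do we build an f() such that f(x, y) = z?

To solve this, we build a two-layer neural network that uses two activation functions, F(x,y) and H(x,y), as shown in the figure below:

F(x,y) is a threshold function with a threshold of 1:

That is, F(x,y) = 1 when AX+BY > 1, and 0 otherwise, where A and B are the weights on the incoming connections:

def F(x, y, A, B):
    # threshold activation with threshold 1
    return 1 if A * x + B * y > 1 else 0

H(x,y) is a threshold function with a threshold of 0:

def H(x, y, A, B):
    # threshold activation with threshold 0
    return 1 if A * x + B * y > 0 else 0

The numbers on the lines in the figure are the weights.

- For the point (1,1), the hidden-layer (second-layer) values from left to right are (1,1,1), and the final output is (1,1,1)*(1,-2,1) = 0;

- For the point (0,0), the hidden-layer values from left to right are (0,0,0), and the final output is (0,0,0)*(1,-2,1) = 0;

- For the point (1,0), the hidden-layer values from left to right are (1,0,0), and the final output is (1,0,0)*(1,-2,1) = 1;

- For the point (0,1), the hidden-layer values from left to right are (0,0,1), and the final output is (0,0,1)*(1,-2,1) = 1;
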
first_hidder_layer_table = {'x':[1,0,1,0],'y':[1,0,0,0],'z':[1,0,0,1],'output':[0,0,1,1]}
first_hidder_layer_data = pd.DataFrame(first_hidder_layer_table)
first_hidder_layer_data

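This gives us a function that computes XOR.

Look at the first hidden layer: it has three dimensions and three weight values. Going from the input layer to the first hidden layer effectively turns a two-dimensional array into a three-dimensional one, which makes a linear split possible.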
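The whole network can be written out and checked against the XOR table. The output weights (1, -2, 1) and the two thresholds come from the text; the hidden-layer weights in `W_hidden` are an assumption, chosen only because they reproduce the hidden values listed for each point (the original figure with the actual weights is not available).

```python
import numpy as np

def step(values, threshold):
    """Threshold activation: 1 where value > threshold, else 0."""
    return (np.asarray(values) > threshold).astype(int)

# Hypothetical hidden weights that reproduce the hidden values given in the text
W_hidden = np.array([[2, 0],    # unit 1 fires when x = 1
                     [1, 1],    # unit 2 fires only when x = y = 1
                     [0, 2]])   # unit 3 fires when y = 1
w_out = np.array([1, -2, 1])    # output weights from the text

def xor_net(x, y):
    hidden = step(W_hidden.dot([x, y]), threshold=1)   # F: threshold 1
    return int(step(hidden.dot(w_out), threshold=0))   # H: threshold 0

for x, y in [(1, 1), (0, 0), (1, 0), (0, 1)]:
    print((x, y), xor_net(x, y))
```

The middle hidden unit acts as an AND gate, and its weight of -2 cancels the two outer units exactly when both inputs are 1.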

Graphical explanation:

from mpl_toolkits.mplot3d import Axes3D
fig4 = plt.figure(3)
ax = fig4.add_subplot(111, projection='3d')
groups = first_hidder_layer_data.groupby('output')
colors = {0: 'black', 1: 'blue'}  # fixed colour per class so the two groups always differ
for name, group in groups:
    ax.scatter(group['x'],group['y'],group['z'],label=name,c=colors[name],s=50,marker='o')
ax.set_xlabel('X Label')
ax.set_ylabel('Y Label')
ax.set_zlabel('Z Label')

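After the transformation the data is linearly separable (in n dimensions; in this example a plane can separate the points of the two colours).

For more experiments, see the neural network web app provided by TensorFlow. Adjusting its parameters yourself gives a deeper understanding of what neural networks and activation functions do.

Demo URL:

http://playground.tensorflow.org/

You can build a small neural network of your own there to aid understanding.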

4. sigmoid

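Sigmoid is the S-shaped logistic curve used for binary classification.

The sigmoid formula:

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$

Graph of the sigmoid:

It squashes both extremes and is sensitive to small changes around the middle, which is why the sigmoid is used as one of the simplest and most common activation layers in neural networks.

Advantages:

(1) The output range is (0,1), so values are less likely to blow up as they propagate

(2) Monotonically increasing

(3) Easy to differentiate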
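These properties can be checked numerically. The sketch below uses the standard identity σ'(x) = σ(x)(1 - σ(x)); the sample points are arbitrary values chosen for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # derivative expressed through the function value itself
    s = sigmoid(x)
    return s * (1.0 - s)

xs = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])
print(sigmoid(xs))        # stays inside (0, 1) and increases with x
print(sigmoid_grad(xs))   # largest at x = 0, close to 0 at both ends
```

The derivative peaks at 0.25 when x = 0 and becomes tiny once |x| is large, which is the saturation behaviour that matters during backpropagation.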

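Sigmoid has one drawback: during backpropagation it easily suffers from vanishing gradients. Near the saturation regions the derivative tends to 0, so learning becomes very slow. For this reason we choose the Adam optimizer.

Adam is also a gradient-descent-based method, but the step size for each parameter at each iteration stays within a bounded range, so a very large gradient does not produce a very large step and the parameter values stay fairly stable. This helps reduce the risk of the model getting stuck in a local optimum, whereas SGD gets stuck more easily here: if the optimizer in the code below is changed to SGD, the acc value stops changing after one epoch and training is stuck in a local optimum.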
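The bounded step size can be seen by comparing a single update of Adam and of plain SGD for gradients of very different magnitude. This is a sketch of the standard Adam update rule written in NumPy, not the Keras implementation; the function names are hypothetical, and the learning rate 0.003 mirrors the value used in the code below.

```python
import numpy as np

def adam_step(w, g, m, v, t, lr=0.003, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for weight w with gradient g at time step t (t starts at 1)."""
    m = beta1 * m + (1 - beta1) * g          # running mean of gradients
    v = beta2 * v + (1 - beta2) * g * g      # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)             # bias correction
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

def sgd_step(w, g, lr=0.003):
    """One plain SGD update."""
    return w - lr * g

for g in (0.01, 1.0, 1000.0):
    w_adam, _, _ = adam_step(0.0, g, m=0.0, v=0.0, t=1)
    w_sgd = sgd_step(0.0, g)
    print(g, abs(w_adam), abs(w_sgd))
```

On the first step Adam moves by roughly the learning rate whatever the gradient's scale, while the SGD step grows in proportion to the gradient.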


# Build a five-layer fully connected sigmoid network
# neural network with 5 layers
#
# · · · · · · · · · ·       (input data, flattened pixels)       X [batch, 784]   # 784 = 28*28
# \x/x\x/x\x/x\x/x\x/    -- fully connected layer (sigmoid)      W1 [784, 200]      B1[200]
#  · · · · · · · · ·                                             Y1 [batch, 200]
#   \x/x\x/x\x/x\x/      -- fully connected layer (sigmoid)      W2 [200, 100]      B2[100]
#    · · · · · · ·                                               Y2 [batch, 100]
#    \x/x\x/x\x/         -- fully connected layer (sigmoid)      W3 [100, 60]       B3[60]
#     · · · · ·                                                  Y3 [batch, 60]
#     \x/x\x/            -- fully connected layer (sigmoid)      W4 [60, 30]        B4[30]
#      · · ·                                                     Y4 [batch, 30]
#      \x/               -- fully connected layer (softmax)      W5 [30, 10]        B5[10]
#       ·                                                        Y5 [batch, 10]

model = Sequential()
model.add(Dense(200, input_shape=(784,)))  # fully connected: 784 input dimensions, 200 output dimensions
model.add(Activation('sigmoid'))
model.add(Dense(100))  # only the first layer needs the input shape; later layers give just the output size and infer the input from the previous layer
model.add(Activation('sigmoid'))
model.add(Dense(60))
model.add(Activation('sigmoid'))
model.add(Dense(30))             # Dense(30) and Activation('sigmoid') can be merged into one call:
model.add(Activation('sigmoid')) # model.add(Dense(30, activation='sigmoid'))
model.add(Dense(10))
model.add(Activation('softmax'))

sgd = Adam(lr=0.003)
model.compile(loss='binary_crossentropy',
              optimizer=sgd,
              metrics=['accuracy'])

# Model summary
model.summary()

____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
====================================================================================================
dense_23 (Dense)                 (None, 200)           157000      dense_input_7[0][0]              
____________________________________________________________________________________________________
activation_23 (Activation)       (None, 200)           0           dense_23[0][0]                   
____________________________________________________________________________________________________
dense_24 (Dense)                 (None, 100)           20100       activation_23[0][0]              
____________________________________________________________________________________________________
activation_24 (Activation)       (None, 100)           0           dense_24[0][0]                   
____________________________________________________________________________________________________
dense_25 (Dense)                 (None, 60)            6060        activation_24[0][0]              
____________________________________________________________________________________________________
activation_25 (Activation)       (None, 60)            0           dense_25[0][0]                   
____________________________________________________________________________________________________
dense_26 (Dense)                 (None, 30)            1830        activation_25[0][0]              
____________________________________________________________________________________________________
activation_26 (Activation)       (None, 30)            0           dense_26[0][0]                   
____________________________________________________________________________________________________
dense_27 (Dense)                 (None, 10)            310         activation_26[0][0]              
____________________________________________________________________________________________________
activation_27 (Activation)       (None, 10)            0           dense_27[0][0]                   
====================================================================================================
Total params: 185,300
Trainable params: 185,300
Non-trainable params: 0
____________________________________________________________________________________________________

SVG(model_to_dot(model).create(prog='dot', format='svg'))

tensorboard2 = TensorBoard(log_dir='/home/tensorflow/log/five_layer_sigmoid/epoch', histogram_freq=0)
my_tensorboard2 = BatchTensorBoard(log_dir='/home/tensorflow/log/five_layer_sigmoid/batch')
model.fit(x_train_1, y_train_1,
          nb_epoch=20,
          verbose=0,
          batch_size=100,
          callbacks=[my_tensorboard2, tensorboard2])

<keras.callbacks.History at 0xf868a90>

# Names of the model's evaluation metrics
print(model.metrics_names)
# Evaluate on the test data
model.evaluate(x_test_1, y_test_1,
          verbose=1,
          batch_size=100)

['loss', 'acc']
 9800/10000 [============================>.] - ETA: 0s

[0.036339853547979147, 0.98736999988555907]

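From the results above, the deeper network performs better than the single softmax layer on this task.

However, in deep networks the sigmoid function easily runs into vanishing gradients during backpropagation, which can prevent the deep network from training. Near sigmoid's saturation regions the function changes very slowly and the derivative tends to 0, which slows convergence.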

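5. ReLU

ReLU grew out of research on the sparsity of neural activity in the human brain. Lennie, P. (2003) proposed that 95% to 99% of the brain's neurons are idle, and fewer active neurons mean lower computational cost and less tendency to overfit.

The rectified linear unit (ReLU) formula:

$$f(x) = \max(0, x)$$

Its graph:

ReLU is piecewise linear and does not saturate for positive inputs, while its zero output for negative inputs lets the network introduce sparsity on its own.

Using ReLU alleviates the slow gradient descent seen with sigmoid and the loss of information in deep networks.

ReLU units can be fragile during training and can "die". For example, a large gradient flowing through a ReLU neuron can update the weights in such a way that the neuron never activates on any data point again. If that happens, the gradient flowing through that unit will be zero from then on. In other words, ReLU units can die irreversibly during training and fall off the data manifold. If the learning rate is too high, a large part of the network can end up "dead" (that is, neurons that never activate during the whole of training). Setting an appropriate learning rate avoids this problem to some extent.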
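A "dead" unit can be illustrated with a single ReLU neuron. In this sketch the inputs, the weights, and the bias of -100 (standing in for the result of one oversized update) are hypothetical values chosen to make the effect obvious.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def relu_grad(x):
    # gradient of ReLU: 1 for positive pre-activations, 0 otherwise
    return (x > 0).astype(float)

rng = np.random.RandomState(0)
inputs = rng.rand(1000, 5)            # hypothetical inputs in [0, 1)
weights = np.ones(5)

healthy_pre = inputs.dot(weights) + 0.0     # pre-activation with bias 0
dead_pre = inputs.dot(weights) - 100.0      # bias pushed far negative by a large update

print(relu_grad(healthy_pre).mean())        # share of inputs that pass a gradient
print(relu(dead_pre).max(), relu_grad(dead_pre).sum())
```

For the second neuron every input gives an output of 0 and a gradient of 0, so gradient descent has no signal with which to move its weights back.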

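6. Learning rate

As mentioned in the discussion of gradient descent, the learning rate is the step size of gradient descent. To reach the bottom of the valley, we need to control the step size, that is, the learning rate.

The learning rate is generally tuned according to how much the loss changes.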
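The effect of the step size shows up even on the simplest possible loss. The sketch below runs gradient descent on f(w) = w**2 with three learning rates; the function, the starting point, and the three rates are assumptions picked for illustration.

```python
def descend(learning_rate, steps=20, w=1.0):
    """Gradient descent on f(w) = w**2, whose gradient is 2*w."""
    for _ in range(steps):
        w = w - learning_rate * 2.0 * w
    return w

for lr in (0.01, 0.1, 1.1):
    print(lr, descend(lr))
```

A rate that is too small approaches the minimum at 0 only slowly, a moderate rate gets close within the same 20 steps, and a rate that is too large overshoots further on every step and diverges.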

# neural network with 5 layers
#
# · · · · · · · · · ·       (input data, flattened pixels)       X [batch, 784]   # 784 = 28*28
# \x/x\x/x\x/x\x/x\x/    -- fully connected layer (relu)         W1 [784, 200]      B1[200]
#  · · · · · · · · ·                                             Y1 [batch, 200]
#   \x/x\x/x\x/x\x/      -- fully connected layer (relu)         W2 [200, 100]      B2[100]
#    · · · · · · ·                                               Y2 [batch, 100]
#    \x/x\x/x\x/         -- fully connected layer (relu)         W3 [100, 60]       B3[60]
#     · · · · ·                                                  Y3 [batch, 60]
#     \x/x\x/            -- fully connected layer (relu)         W4 [60, 30]        B4[30]
#      · · ·                                                     Y4 [batch, 30]
#      \x/               -- fully connected layer (softmax)      W5 [30, 10]        B5[10]
#       ·                                                        Y5 [batch, 10]
model = Sequential()
model.add(Dense(200, input_shape=(784,)))  # fully connected: 784 input dimensions, 200 output dimensions
model.add(Activation('relu'))  # activation changed from sigmoid to ReLU
model.add(Dense(100))
model.add(Activation('relu'))
model.add(Dense(60))
model.add(Activation('relu'))
model.add(Dense(30))            
model.add(Activation('relu'))
model.add(Dense(10))
model.add(Activation('softmax'))

sgd = Adam(lr=0.001)
model.compile(loss='binary_crossentropy',
              optimizer=sgd,
              metrics=['accuracy'])

# Model summary
model.summary()

____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
====================================================================================================
dense_16 (Dense)                 (None, 200)           157000      dense_input_4[0][0]              
____________________________________________________________________________________________________
activation_16 (Activation)       (None, 200)           0           dense_16[0][0]                   
____________________________________________________________________________________________________
dense_17 (Dense)                 (None, 100)           20100       activation_16[0][0]              
____________________________________________________________________________________________________
activation_17 (Activation)       (None, 100)           0           dense_17[0][0]                   
____________________________________________________________________________________________________
dense_18 (Dense)                 (None, 60)            6060        activation_17[0][0]              
____________________________________________________________________________________________________
activation_18 (Activation)       (None, 60)            0           dense_18[0][0]                   
____________________________________________________________________________________________________
dense_19 (Dense)                 (None, 30)            1830        activation_18[0][0]              
____________________________________________________________________________________________________
activation_19 (Activation)       (None, 30)            0           dense_19[0][0]                   
____________________________________________________________________________________________________
dense_20 (Dense)                 (None, 10)            310         activation_19[0][0]              
____________________________________________________________________________________________________
activation_20 (Activation)       (None, 10)            0           dense_20[0][0]                   
====================================================================================================
Total params: 185,300
Trainable params: 185,300
Non-trainable params: 0
____________________________________________________________________________________________________

SVG(model_to_dot(model).create(prog='dot', format='svg'))

tensorboard3 = TensorBoard(log_dir='/home/tensorflow/log/five_layer_relu/epoch', histogram_freq=0)
my_tensorboard3 = BatchTensorBoard(log_dir='/home/tensorflow/log/five_layer_relu/batch')
model.fit(x_train_1, y_train_1,
          nb_epoch=30,
          verbose=0,
          batch_size=100,
          callbacks=[my_tensorboard3, tensorboard3])

<keras.callbacks.History at 0xe3c6d50>

# Names of the model's evaluation metrics
print(model.metrics_names)
# Evaluate on the test data
model.evaluate(x_test_1, y_test_1,
          verbose=1,
          batch_size=100)

['loss', 'acc']
 9600/10000 [===========================>..] - ETA: 0s




[0.017244604945910281, 0.99598000288009647]

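7. Dropout

Run mnist_2.1_five_layers_relu_lrdecay.py from the directory.

As the number of iterations grows, a large gap opens between the loss on the test data and the loss on the training data: the train loss keeps improving while the test loss gets worse, the gap between them widens, and the model starts to overfit.

Dropout temporarily removes units from the network with a given probability during training, which mitigates overfitting.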
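The mechanism can be sketched in NumPy. This version is "inverted" dropout, a common variant that rescales the surviving units during training so nothing needs to change at test time; the function name, its signature, and the all-ones activations are assumptions for illustration, not the Keras layer itself.

```python
import numpy as np

def dropout(x, rate, training, rng):
    """Inverted dropout: zero each unit with probability `rate` during training."""
    if not training or rate == 0.0:
        return x                                  # no change at test time
    keep_mask = rng.rand(*x.shape) >= rate        # True for units that survive
    return x * keep_mask / (1.0 - rate)           # rescale to keep the expected value

rng = np.random.RandomState(0)
activations = np.ones((1000, 30))                 # hypothetical layer output
dropped = dropout(activations, rate=0.25, training=True, rng=rng)

print((dropped == 0).mean())                      # share of units dropped, near 0.25
print(dropped.mean())                             # near 1.0, the original mean
print(dropout(activations, 0.25, training=False, rng=rng) is activations)
```

Because a different random subset of units is silenced on each batch, no single unit can be relied on too heavily, which is what discourages overfitting.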

You can compare the results of mnist_2.1_five_layers_relu_lrdecay.py with those of /mnist_2.2_five_layers_relu_lrdecay_dropout.py, which adds dropout.

# neural network with 5 layers
#
# · · · · · · · · · ·       (input data, flattened pixels)       X [batch, 784]   # 784 = 28*28
# \x/x\x/x\x/x\x/x\x/ ✞  -- fully connected layer (relu+dropout) W1 [784, 200]      B1[200]
#  · · · · · · · · ·                                             Y1 [batch, 200]
#   \x/x\x/x\x/x\x/ ✞    -- fully connected layer (relu+dropout) W2 [200, 100]      B2[100]
#    · · · · · · ·                                               Y2 [batch, 100]
#    \x/x\x/x\x/ ✞       -- fully connected layer (relu+dropout) W3 [100, 60]       B3[60]
#     · · · · ·                                                  Y3 [batch, 60]
#     \x/x\x/ ✞          -- fully connected layer (relu+dropout) W4 [60, 30]        B4[30]
#      · · ·                                                     Y4 [batch, 30]
#      \x/               -- fully connected layer (softmax)      W5 [30, 10]        B5[10]
#       ·                                                        Y5 [batch, 10]
model = Sequential()
model.add(Dense(200, input_shape=(784,)))  # fully connected: 784 input dimensions, 200 output dimensions
model.add(Activation('relu'))  # activation changed from sigmoid to ReLU
model.add(Dense(100))
model.add(Activation('relu'))
model.add(Dropout(0.25))  # add a dropout layer that randomly drops 25% of the units
model.add(Dense(60))
model.add(Activation('relu'))
model.add(Dropout(0.25))
model.add(Dense(30))            
model.add(Activation('relu'))
model.add(Dropout(0.25))
model.add(Dense(10))
model.add(Activation('softmax'))

sgd = Adam(lr=0.001)
model.compile(loss='binary_crossentropy',
              optimizer=sgd,
              metrics=['accuracy'])

# Model summary
model.summary()

____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
====================================================================================================
dense_171 (Dense)                (None, 200)           157000      dense_input_35[0][0]             
____________________________________________________________________________________________________
activation_171 (Activation)      (None, 200)           0           dense_171[0][0]                  
____________________________________________________________________________________________________
dense_172 (Dense)                (None, 100)           20100       activation_171[0][0]             
____________________________________________________________________________________________________
activation_172 (Activation)      (None, 100)           0           dense_172[0][0]                  
____________________________________________________________________________________________________
dropout_100 (Dropout)            (None, 100)           0           activation_172[0][0]             
____________________________________________________________________________________________________
dense_173 (Dense)                (None, 60)            6060        dropout_100[0][0]                
____________________________________________________________________________________________________
activation_173 (Activation)      (None, 60)            0           dense_173[0][0]                  
____________________________________________________________________________________________________
dropout_101 (Dropout)            (None, 60)            0           activation_173[0][0]             
____________________________________________________________________________________________________
dense_174 (Dense)                (None, 30)            1830        dropout_101[0][0]                
____________________________________________________________________________________________________
activation_174 (Activation)      (None, 30)            0           dense_174[0][0]                  
____________________________________________________________________________________________________
dropout_102 (Dropout)            (None, 30)            0           activation_174[0][0]             
____________________________________________________________________________________________________
dense_175 (Dense)                (None, 10)            310         dropout_102[0][0]                
____________________________________________________________________________________________________
activation_175 (Activation)      (None, 10)            0           dense_175[0][0]                  
====================================================================================================
Total params: 185,300
Trainable params: 185,300
Non-trainable params: 0
____________________________________________________________________________________________________

SVG(model_to_dot(model).create(prog='dot', format='svg'))

tensorboard4 = TensorBoard(log_dir='/home/tensorflow/log/five_layer_relu_dropout/epoch')
my_tensorboard4 = BatchTensorBoard(log_dir='/home/tensorflow/log/five_layer_relu_dropout/batch')

model.fit(x_train_1, y_train_1,
          nb_epoch=30,
          verbose=0,
          batch_size=100,
          callbacks=[tensorboard4, my_tensorboard4])

<keras.callbacks.History at 0x27819610>

# Names of the model's evaluation metrics
print(model.metrics_names)
# Evaluate on the test data
model.evaluate(x_test_1, y_test_1,
          verbose=1,
          batch_size=100)

['loss', 'acc']
 9900/10000 [============================>.] - ETA: 0s
[0.025450729207368569, 0.99462999999523161]

For later posts in this series, follow: https://github.com/shikanon/MyPresentations
