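Chapter 3 stumped me right at the start: what on earth is Lp regularization? I went straight back to Chapter 2 and reread it several times, and it was still hard to follow. Then I found this blog post, https://vimsky.com/article/3852.html, and it finally started to click. A trained model is really just one choice out of the many possible sets of W and b. b acts at the level of addition and subtraction, while W acts at the level of multiplication and division, so tuning W matters more. The whole of Chapter 3 is about finding the best parameters during training: cutting down on weights that contribute nothing and on weights that are too influential, so that accuracy goes up while overfitting is kept in check. The chapter's vehicle for this tuning is the autoencoder!
Section 3.3 starts by building two autoencoders: a sparse autoencoder that uses the sigmoid function, and a plain autoencoder that uses relu.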
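To make the idea of penalizing weights concrete, here is a minimal sketch of an Lp penalty in plain Python. The helper `lp_penalty` and the sample weights are made up for illustration and are not from the book; the 0.01 default simply mirrors `lambda_l2_w` in the script below, and real libraries may scale the term differently.

```python
def lp_penalty(weights, p=2, lam=0.01):
    """Return lam * sum(|w|**p) over a flat list of weights (illustrative helper)."""
    return lam * sum(abs(w) ** p for w in weights)


# Hypothetical weights: one large, one zero, two moderate.
weights = [0.5, -2.0, 0.0, 1.0]

l1 = lp_penalty(weights, p=1)  # grows linearly with each |w|
l2 = lp_penalty(weights, p=2)  # grows quadratically, so the large weight dominates

print("L1 penalty:", l1)
print("L2 penalty:", l2)
```

The penalty is added to the training loss, so the optimizer has to trade reconstruction quality against keeping the weights small.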
import tensorflow as tf
import tensorlayer as tl
import numpy as np
learning_rate = 0.0001
lambda_l2_w = 0.01
n_epochs = 200
batch_size = 128
print_interval = 200
hidden_size = 196
input_size = 784
image_width = 28
model = 'sigmoid'
# model = 'relu'
x = tf.placeholder(tf.float32, shape=[None, 784], name='x')
print('~~~~~~~~~~~~~~~~build network~~~~~~~~~~~~~~~~~')
if model == 'relu':
    # Plain autoencoder: relu encoder, softplus decoder
    network = tl.layers.InputLayer(x, name='input')
    network = tl.layers.DenseLayer(network, hidden_size, tf.nn.relu, name='relu1')
    encoded_img = network.outputs
    recon_layer1 = tl.layers.DenseLayer(network, input_size, tf.nn.softplus, name='recon_layer1')
elif model == 'sigmoid':
    # Sparse autoencoder: sigmoid encoder and sigmoid decoder
    network = tl.layers.InputLayer(x, name='input')
    network = tl.layers.DenseLayer(network, hidden_size, tf.nn.sigmoid, name='sigmoid1')
    encoded_img = network.outputs
    recon_layer1 = tl.layers.DenseLayer(network, input_size, tf.nn.sigmoid, name='recon_layer1')
y = recon_layer1.outputs
train_params = recon_layer1.all_params[-4:]
mse = tf.reduce_sum(tf.squared_difference(y, x), 1)
mse = tf.reduce_mean(mse)
# L2 penalty on the encoder and decoder weight matrices (train_params = [W1, b1, W2, b2])
L2_w = tf.contrib.layers.l2_regularizer(lambda_l2_w)(train_params[0]) + tf.contrib.layers.l2_regularizer(lambda_l2_w)(train_params[2])
activation_out = recon_layer1.all_layers[-2]
L1_a = 0.001 * tf.reduce_mean(activation_out)
beta = 5
rho = 0.15
p_hat = tf.reduce_mean(activation_out, 0)
KLD = beta * tf.reduce_sum(rho * tf.log(tf.divide(rho, p_hat)) + (1-rho) * tf.log((1-rho)/(tf.subtract(float(1), p_hat))))
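if model == 'sigmoid':
    # Sparse autoencoder: reconstruction + weight decay + KL sparsity penalty
    cost = mse + L2_w + KLD
if model == 'relu':
    # Plain autoencoder: reconstruction + weight decay + L1 activation penalty
    cost = mse + L2_w + L1_a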
train_op = tf.train.AdamOptimizer(learning_rate).minimize(cost)
saver = tf.train.Saver()
print('~~~~~~~~~~~模型訓練~~~~~~~~~~~~~~~~~~')  # banner text means "model training"
X_train, y_train, X_val, y_val, X_test, y_test = tl.files.load_mnist_dataset(shape=(-1, 784))
total_batch = X_train.shape[0] // batch_size
with tf.Session() as sess:
    tl.layers.initialize_global_variables(sess)
    for epoch in range(n_epochs):
        avg_cost = 0.
        for i in range(total_batch):
            batch_x, batch_y = X_train[i*batch_size:(i+1)*batch_size], y_train[i*batch_size:(i+1)*batch_size]
            batch_x = np.array(batch_x).astype(np.float32)
            # Labels are unused: the autoencoder reconstructs its own input
            batch_cost, _ = sess.run([cost, train_op], feed_dict={x: batch_x})
            avg_cost += batch_cost
            if not i % print_interval:
                print('Minibatch: %03d | Cost: %.3f' % (i+1, batch_cost))
        print('Epoch: %03d | AvgCost: %.3f' % (epoch+1, avg_cost/(i+1)))
    # Swap the two save paths when training with model = 'relu'
    saver.save(sess, save_path='./ae_tl/autoencoder-sigmoid.ckpt')
    # saver.save(sess, save_path='./ae_tl/autoencoder-relu.ckpt')
print('~~~~~~~~~~~~~~~image_show~~~~~~~~~~~~~~~~')
import matplotlib.pyplot as plt
n_images = 15
fig, axes = plt.subplots(nrows=2, ncols=n_images, sharex=True, sharey=True, figsize=(20, 2.5))
test_images = X_test[:n_images]
with tf.Session() as sess:
    # Swap the two restore paths when model = 'relu'
    saver.restore(sess, save_path='./ae_tl/autoencoder-sigmoid.ckpt')
    # saver.restore(sess, save_path='./ae_tl/autoencoder-relu.ckpt')
    decoded = sess.run(recon_layer1.outputs, feed_dict={x: test_images})
    # The pattern must contain the layer name used above ('relu1' / 'sigmoid1');
    # 'relu/W:0' or 'sigmoid/W:0' matches no variable.
    if model == 'relu':
        weights = sess.run(tl.layers.get_variables_with_name('relu1/W:0', train_only=False, printable=True))
    elif model == 'sigmoid':
        weights = sess.run(tl.layers.get_variables_with_name('sigmoid1/W:0', train_only=False, printable=True))
    recon_weights = sess.run(tl.layers.get_variables_with_name('recon_layer1/W:0', train_only=False, printable=True))
    recon_bias = sess.run(tl.layers.get_variables_with_name('recon_layer1/b:0', train_only=False, printable=True))
    # Top row: original test images; bottom row: reconstructions
    for i in range(n_images):
        for ax, img in zip(axes, [test_images, decoded]):
            ax[i].imshow(img[i].reshape((image_width, image_width)), cmap='binary')
plt.show()
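The KLD term in the script is what makes the sigmoid model "sparse": it pushes the average activation of every hidden unit toward the target rho. The following is a minimal sketch of the same formula in plain Python, so it can be tried without TensorFlow. The function name `kl_sparsity` and the sample activation values are assumptions made for illustration; `rho=0.15` and `beta=5` are taken from the script.

```python
import math


def kl_sparsity(p_hat, rho=0.15, beta=5):
    """beta * sum over hidden units of KL(rho || p_hat_j), Bernoulli form."""
    total = 0.0
    for p in p_hat:
        total += rho * math.log(rho / p) + (1 - rho) * math.log((1 - rho) / (1 - p))
    return beta * total


# Hypothetical mean activations of three hidden units.
on_target = kl_sparsity([0.15, 0.15, 0.15])  # every unit already at rho
too_active = kl_sparsity([0.5, 0.5, 0.5])    # units fire far more often than rho

print("penalty at target sparsity:", on_target)
print("penalty when too active:  ", too_active)
```

The penalty vanishes only when each unit's mean activation equals rho, and it grows as the units become more active, which is why it competes with the reconstruction term during training.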
The code ran without any trouble. I ran the sigmoid version first; the output was:
~~~~~~~~~~~~~~~~build network~~~~~~~~~~~~~~~~~
[TL] InputLayer input: (?, 784)
[TL] DenseLayer sigmoid1: 196 sigmoid
[TL] DenseLayer recon_layer1: 784 sigmoid
~~~~~~~~~~~模型訓練~~~~~~~~~~~~~~~~~~
[TL] Load or Download MNIST > data\mnist
[TL] data\mnist\train-images-idx3-ubyte.gz
[TL] data\mnist\t10k-images-idx3-ubyte.gz
Minibatch: 001 | Cost: 27513.186
Minibatch: 201 | Cost: 13620.946
Epoch: 001 | AvgCost: 15094.485
Minibatch: 001 | Cost: 7823.636
Minibatch: 201 | Cost: 4729.617
Epoch: 002 | AvgCost: 5013.418
Minibatch: 001 | Cost: 3103.289
Minibatch: 201 | Cost: 2092.391
Epoch: 003 | AvgCost: 2175.729
Minibatch: 001 | Cost: 1492.272
Minibatch: 201 | Cost: 1085.233
Epoch: 004 | AvgCost: 1116.217
Minibatch: 001 | Cost: 823.772
Minibatch: 201 | Cost: 636.336
Epoch: 005 | AvgCost: 649.845
Minibatch: 001 | Cost: 508.730
Minibatch: 201 | Cost: 413.956
Epoch: 006 | AvgCost: 420.595
Minibatch: 001 | Cost: 346.198
Minibatch: 201 | Cost: 294.844
Epoch: 007 | AvgCost: 298.442
Minibatch: 001 | Cost: 256.320
Minibatch: 201 | Cost: 226.944
Epoch: 008 | AvgCost: 229.060
Minibatch: 001 | Cost: 203.642
Minibatch: 201 | Cost: 186.067
Epoch: 009 | AvgCost: 187.394
Minibatch: 001 | Cost: 171.062
Minibatch: 201 | Cost: 160.132
Epoch: 010 | AvgCost: 161.000
~~~~
~~~~
Minibatch: 001 | Cost: 46.998
Minibatch: 201 | Cost: 50.521
Epoch: 191 | AvgCost: 48.724
Minibatch: 001 | Cost: 46.998
Minibatch: 201 | Cost: 50.520
Epoch: 192 | AvgCost: 48.723
Minibatch: 001 | Cost: 46.998
Minibatch: 201 | Cost: 50.520
Epoch: 193 | AvgCost: 48.723
Minibatch: 001 | Cost: 46.998
Minibatch: 201 | Cost: 50.520
Epoch: 194 | AvgCost: 48.723
Minibatch: 001 | Cost: 46.998
Minibatch: 201 | Cost: 50.519
Epoch: 195 | AvgCost: 48.722
Minibatch: 001 | Cost: 46.999
Minibatch: 201 | Cost: 50.519
Epoch: 196 | AvgCost: 48.722
Minibatch: 001 | Cost: 46.999
Minibatch: 201 | Cost: 50.519
Epoch: 197 | AvgCost: 48.721
Minibatch: 001 | Cost: 46.999
Minibatch: 201 | Cost: 50.518
Epoch: 198 | AvgCost: 48.721
Minibatch: 001 | Cost: 46.999
Minibatch: 201 | Cost: 50.518
Epoch: 199 | AvgCost: 48.721
Minibatch: 001 | Cost: 46.999
Minibatch: 201 | Cost: 50.518
Epoch: 200 | AvgCost: 48.720
~~~~~~~~~~~~~~~image_show~~~~~~~~~~~~~~~~
INFO:tensorflow:Restoring parameters from ./ae_tl/autoencoder-sigmoid.ckpt
[TL] Restoring parameters from ./ae_tl/autoencoder-sigmoid.ckpt
[TL] [*] geting variables with sigmoid/W:0
[TL] [*] geting variables with recon_layer1/W:0
[TL] got 0: recon_layer1/W:0 (196, 784)
[TL] [*] geting variables with recon_layer1/b:0
[TL] got 0: recon_layer1/b:0 (784,)
QWindowsWindow::setGeometry: Unable to set geometry 2000x317+4+23 on QWidgetWindow/'MainWindowClassWindow'. Resulting geometry: 1284x317+4+23 (frame: 4, 23, 4, 4, custom margin: 0, 0, 0, 0, minimum size: 69x67, maximum size: 16777215x16777215).
[Finished in 1752.3s]
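The sigmoid model adds the KLD term and the relu model does not (its cost uses the small L1 activation penalty L1_a instead); that is the difference between them. The run above is the sparse autoencoder and the run below is the plain autoencoder. Comparing the reconstructed images, the plain autoencoder's are sharper and look better; the textbook's explanation is that part of the sparse model's loss function serves sparsity rather than reconstruction. The per-epoch output points the same way: the relu cost ends up clearly lower (AvgCost 25.118 at epoch 200, versus 48.720 for sigmoid).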
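Written out from the script's own definitions, the two objectives look roughly as follows. The symbols $N$ (batch size), $a$ (hidden-layer activations), $W_1$ and $W_2$ (encoder and decoder weights) are names introduced here for readability. The $L2_w$ line assumes the penalty is applied to each weight matrix separately and that `l2_regularizer` computes the scale times half the sum of squares.

```latex
\begin{aligned}
\text{mse} &= \frac{1}{N}\sum_{i=1}^{N}\sum_{k=1}^{784}\left(y_{ik}-x_{ik}\right)^2 \\
L2_w &= \frac{\lambda}{2}\left(\lVert W_1\rVert_F^2 + \lVert W_2\rVert_F^2\right), \qquad \lambda = 0.01 \\
\text{KLD} &= \beta\sum_{j=1}^{196}\left[\rho\log\frac{\rho}{\hat{p}_j} + (1-\rho)\log\frac{1-\rho}{1-\hat{p}_j}\right], \qquad \beta = 5,\ \rho = 0.15 \\
L1_a &= 0.001\cdot\operatorname{mean}(a) \\
\text{cost}_{\text{sigmoid}} &= \text{mse} + L2_w + \text{KLD} \\
\text{cost}_{\text{relu}} &= \text{mse} + L2_w + L1_a
\end{aligned}
```

Here $\hat{p}_j$ is the mean activation of hidden unit $j$ over the batch. Only the first term measures reconstruction, so the two total costs are not measuring exactly the same thing.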
~~~~~~~~~~~~~~~~build network~~~~~~~~~~~~~~~~~
[TL] InputLayer input: (?, 784)
[TL] DenseLayer relu1: 196 relu
[TL] DenseLayer recon_layer1: 784 softplus
~~~~~~~~~~~模型訓練~~~~~~~~~~~~~~~~~~
[TL] Load or Download MNIST > data\mnist
[TL] data\mnist\train-images-idx3-ubyte.gz
[TL] data\mnist\t10k-images-idx3-ubyte.gz
Minibatch: 001 | Cost: 27474.928
Minibatch: 201 | Cost: 13717.572
Epoch: 001 | AvgCost: 15163.097
Minibatch: 001 | Cost: 7918.465
Minibatch: 201 | Cost: 4756.286
Epoch: 002 | AvgCost: 5038.762
Minibatch: 001 | Cost: 3050.080
Minibatch: 201 | Cost: 2005.636
Epoch: 003 | AvgCost: 2093.527
Minibatch: 001 | Cost: 1400.136
Minibatch: 201 | Cost: 986.144
Epoch: 004 | AvgCost: 1018.284
Minibatch: 001 | Cost: 726.735
Minibatch: 201 | Cost: 538.229
Epoch: 005 | AvgCost: 551.934
Minibatch: 001 | Cost: 413.829
Minibatch: 201 | Cost: 320.170
Epoch: 006 | AvgCost: 326.546
Minibatch: 001 | Cost: 255.625
Minibatch: 201 | Cost: 206.370
Epoch: 007 | AvgCost: 209.430
Minibatch: 001 | Cost: 170.754
Minibatch: 201 | Cost: 143.954
Epoch: 008 | AvgCost: 145.377
Minibatch: 001 | Cost: 123.279
Minibatch: 201 | Cost: 108.405
Epoch: 009 | AvgCost: 108.961
Minibatch: 001 | Cost: 95.644
Minibatch: 201 | Cost: 87.252
Epoch: 010 | AvgCost: 87.363
~~~~~
~~~~~
Minibatch: 001 | Cost: 23.798
Minibatch: 201 | Cost: 24.983
Epoch: 191 | AvgCost: 25.143
Minibatch: 001 | Cost: 23.795
Minibatch: 201 | Cost: 24.979
Epoch: 192 | AvgCost: 25.140
Minibatch: 001 | Cost: 23.793
Minibatch: 201 | Cost: 24.976
Epoch: 193 | AvgCost: 25.137
Minibatch: 001 | Cost: 23.790
Minibatch: 201 | Cost: 24.973
Epoch: 194 | AvgCost: 25.134
Minibatch: 001 | Cost: 23.788
Minibatch: 201 | Cost: 24.970
Epoch: 195 | AvgCost: 25.132
Minibatch: 001 | Cost: 23.785
Minibatch: 201 | Cost: 24.967
Epoch: 196 | AvgCost: 25.129
Minibatch: 001 | Cost: 23.783
Minibatch: 201 | Cost: 24.964
Epoch: 197 | AvgCost: 25.126
Minibatch: 001 | Cost: 23.780
Minibatch: 201 | Cost: 24.961
Epoch: 198 | AvgCost: 25.123
Minibatch: 001 | Cost: 23.778
Minibatch: 201 | Cost: 24.958
Epoch: 199 | AvgCost: 25.121
Minibatch: 001 | Cost: 23.775
Minibatch: 201 | Cost: 24.955
Epoch: 200 | AvgCost: 25.118
~~~~~~~~~~~~~~~image_show~~~~~~~~~~~~~~~~
INFO:tensorflow:Restoring parameters from ./ae_tl/autoencoder-relu.ckpt
[TL] Restoring parameters from ./ae_tl/autoencoder-relu.ckpt
[TL] [*] geting variables with relu/W:0
[TL] [*] geting variables with recon_layer1/W:0
[TL] got 0: recon_layer1/W:0 (196, 784)
[TL] [*] geting variables with recon_layer1/b:0
[TL] got 0: recon_layer1/b:0 (784,)
QWindowsWindow::setGeometry: Unable to set geometry 2000x317+4+23 on QWidgetWindow/'MainWindowClassWindow'. Resulting geometry: 1284x317+4+23 (frame: 4, 23, 4, 4, custom margin: 0, 0, 0, 0, minimum size: 69x67, maximum size: 16777215x16777215).
[Finished in 1801.1s]
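Because the two reported costs include different penalty terms, a fairer comparison would look at the reconstruction error alone. Here is a minimal sketch in plain Python that uses the same form as `mse` in the script (squared error summed per image, then averaged over images). The function name `reconstruction_error` and the tiny sample data are made up for illustration.

```python
def reconstruction_error(originals, reconstructions):
    """Mean over images of the summed squared pixel error."""
    per_image = [
        sum((r - o) ** 2 for o, r in zip(orig, recon))
        for orig, recon in zip(originals, reconstructions)
    ]
    return sum(per_image) / len(per_image)


# Two hypothetical 3-pixel "images" and their reconstructions.
originals = [[0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
reconstructions = [[0.1, 0.9, 0.0], [1.0, 0.5, 0.0]]

err = reconstruction_error(originals, reconstructions)
print("reconstruction error:", err)
```

With the script's arrays this could be called as `reconstruction_error(test_images, decoded)` once for each model, giving two numbers that can be compared directly.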