Implementing an Autoencoder in Python

Autoencoders

An autoencoder is a very versatile neural-network tool. The core idea is to use an encoder to compress the original input into a code vector, and a decoder to reconstruct the original data from that vector. The network is trained by measuring the difference between its input and its output. Typical applications include data compression and denoising (where noise is deliberately added to the inputs during training).

Recently, while looking into GAN applications, I noticed that many GANs resemble the autoencoder idea. In a conditional GAN, the generator plays a role similar to an autoencoder's decoder: both map a given input to a corresponding image. I was curious about the code vectors an autoencoder produces, and wanted to see what the decoded image would look like after editing the code by hand.

So I built a simple autoencoder from a convolutional neural network.

import keras
from keras.datasets import mnist
from keras.models import Model
from keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D
from keras.callbacks import TensorBoard
from keras import backend as K
from PIL import Image


batch_size = 128
num_classes = 10
epochs = 12

img_rows, img_cols = 28, 28

(x_train, y_train), (x_test, y_test) = mnist.load_data()

if K.image_data_format() == 'channels_first':
    x_train = x_train.reshape(x_train.shape[0], 1, img_rows, img_cols)
    x_test = x_test.reshape(x_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
else:
    x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
    x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)

x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

# The labels are not needed to train the autoencoder itself; they are
# kept here in case the codes are later inspected per class.
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

input_img = Input(shape=input_shape)  # matches the data_format branch above

x = Conv2D(16, (3, 3), activation='relu', padding='same')(input_img)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
encoded = MaxPooling2D((2, 2), padding='same')(x)

x = Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)
x = UpSampling2D((2, 2))(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
# 'valid' padding here trims 16x16 down to 14x14, so the final
# upsampling yields the original 28x28 size.
x = Conv2D(16, (3, 3), activation='relu')(x)
x = UpSampling2D((2, 2))(x)
decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)

autoencoder = Model(inputs=input_img, outputs=decoded)
autoencoder.compile(optimizer='adagrad', loss='binary_crossentropy')

autoencoder.fit(x_train, x_train, epochs=50, batch_size=256,
                shuffle=True, validation_data=(x_test, x_test),
                callbacks=[TensorBoard(log_dir='autoencoder')])

decoded_imgs = autoencoder.predict(x_test)

After 50 epochs the network's loss settles around 0.11, which is already enough to encode MNIST images well.

Epoch 50/50
60000/60000 [==============================] - 5s 76us/step - loss: 0.1123 - val_loss: 0.1112

The original images and the encoded-then-decoded images differ little; the reconstructions are just slightly blurrier:
[figure: original digits vs. their reconstructions]
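To view the reconstructions outside a notebook, the output arrays can be converted back to greyscale images with the imported PIL. A minimal sketch (the file name is arbitrary, and a random array stands in for `decoded_imgs`):

```python
import numpy as np
from PIL import Image

# decoded_imgs has shape (n, 28, 28, 1) with values in [0, 1];
# a random array stands in for it here.
decoded_imgs = np.random.rand(10, 28, 28, 1).astype('float32')

# Drop the channel axis and rescale to 0-255 greyscale.
arr = (decoded_imgs[0, :, :, 0] * 255).astype('uint8')
Image.fromarray(arr, mode='L').save('reconstruction_0.png')
```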
By manually tapping the network's internal inputs and outputs, we can read out the encoded vectors:
[figure: the encoded vectors]
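One way to read the codes out is to wrap the trained tensors in a second model, e.g. `encoder = Model(inputs=input_img, outputs=encoded)`. A self-contained sketch with the same encoder architecture (untrained here, with random input standing in for MNIST digits):

```python
import numpy as np
from keras.models import Model
from keras.layers import Input, Conv2D, MaxPooling2D

# Rebuild the encoder half of the autoencoder; with the trained model in
# scope you would instead write: encoder = Model(input_img, encoded)
input_img = Input(shape=(28, 28, 1))
x = Conv2D(16, (3, 3), activation='relu', padding='same')(input_img)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
encoded = MaxPooling2D((2, 2), padding='same')(x)
encoder = Model(inputs=input_img, outputs=encoded)

codes = encoder.predict(np.random.rand(1, 28, 28, 1).astype('float32'))
print(codes.shape)  # (1, 4, 4, 8): each digit compresses to 128 numbers
```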

If we modify values in the code vector and then run it through the decoder, we get different image results.
[figures: decodings of hand-modified code vectors]
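Decoding an edited code requires a standalone decoder. With the trained `autoencoder` in scope, one option is to replay its decoder layers (`autoencoder.layers[7:]`) on a new input tensor. The self-contained sketch below instead rebuilds the same decoder architecture untrained, perturbs one channel of a random code, and decodes it:

```python
import numpy as np
from keras.models import Model
from keras.layers import Input, Conv2D, UpSampling2D

# Standalone decoder matching the decoder half of the autoencoder.
# With the trained model you would reuse its layers:
#   x = code_input
#   for layer in autoencoder.layers[7:]:
#       x = layer(x)
code_input = Input(shape=(4, 4, 8))
x = Conv2D(8, (3, 3), activation='relu', padding='same')(code_input)
x = UpSampling2D((2, 2))(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
x = Conv2D(16, (3, 3), activation='relu')(x)  # valid padding: 16x16 -> 14x14
x = UpSampling2D((2, 2))(x)
decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)
decoder = Model(inputs=code_input, outputs=decoded)

codes = np.random.rand(1, 4, 4, 8).astype('float32')
codes[0, :, :, 0] *= 2.0  # hand-edit one channel of the code
img = decoder.predict(codes)
print(img.shape)  # (1, 28, 28, 1)
```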
Occasionally an edit makes the image crisper (a sharpening effect), but most edits to the code vector leave the decoded image meaningless.
