How can a genetic algorithm be used to optimize an LSTM network?

Overall approach

  • First, write a deep_learning.py file that contains the training and testing procedure of the neural network.
  • Gather the parameters to be optimized in deep_learning.py (here we optimize the number of LSTM layers, the number of fully connected layers, and the number of neurons in each layer) into a single list, num.
  • Then write the genetic algorithm in GA.py, treating the list num passed into deep_learning.py as the chromosome and the parameters to be optimized as the genes on that chromosome (a small example encoding is sketched below).
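
For illustration, here is a hypothetical example of this encoding; the layer counts and unit sizes below are arbitrary values, not tuned results:

# num[0] = number of LSTM layers, num[1] = number of fully connected layers,
# followed by the neuron count of every layer, in order
num = [2, 2, 128, 128, 128, 64]  # 2 LSTM layers (128, 128 units), 2 dense layers (128, 64 units)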

The deep_learning.py file

Since all the parameters to be optimized are collected into a single list, this file defines two functions: create_lstm(inputs, units, return_sequences), which creates an LSTM layer, and create_dense(inputs, units), which creates a fully connected layer (together with a BN layer and a dropout layer).

Function: create_lstm(inputs, units, return_sequences)

Inputs:

  • inputs: the input fed into this LSTM layer. If this is the first LSTM layer, pass the variable returned by layers.Input(); otherwise, pass the previous LSTM layer.
  • units: the number of neurons in this LSTM layer.
  • return_sequences: whether this LSTM layer returns the full output sequence (True) or only the output of the last step (False).

Output:

  • The constructed LSTM layer.
# Define the LSTM layer function
def create_lstm(inputs, units, return_sequences):
    lstm = layers.Bidirectional(layers.LSTM(units, return_sequences=return_sequences))(inputs)
    print('Lstm', lstm.shape)
    return lstm

Function: create_dense(inputs, units)

Inputs:

  • inputs: the input fed into this fully connected layer. If this is the first fully connected layer, pass the output of the last LSTM layer; otherwise, pass the previous fully connected block.
  • units: the number of neurons in this fully connected layer.

Output:

  • The fully connected layer together with its dropout and BN layers.
# Define the Dense layer function
def create_dense(inputs, units):
    dense = layers.Dense(units, kernel_regularizer=keras.regularizers.l2(0.001), activation='relu')(inputs)
    print('Dense', dense.shape)
    dense_dropout = layers.Dropout(0.2)(dense)
    dense_batch = layers.BatchNormalization()(dense_dropout)
    return dense, dense_dropout, dense_batch

Setting the parameters

When setting the LSTM layer parameters, only the last LSTM layer keeps just the output of the final step; all other layers keep the full sequence.

# Set the LSTM layer parameters
lstm_num_layers = 2
lstm_units = [128, 128]
lstm_name = list(np.zeros((lstm_num_layers,)))
# Set the dense (LSTM_Dense) layer parameters
lstm_dense_num_layers = 2
lstm_dense_units = [128, 64]
lstm_dense_name = list(np.zeros((lstm_dense_num_layers,)))
lstm_dense_dropout_name = list(np.zeros((lstm_dense_num_layers,)))
lstm_dense_batch_name = list(np.zeros((lstm_dense_num_layers,)))

Building the model with these functions

Build the network model exactly as described when the two functions were introduced.

inputs_lstm = layers.Input(shape=(x_train.shape[1], x_train.shape[2]))
print(inputs_lstm.shape)
for i in range(lstm_num_layers):
    if i == 0:
        inputs = inputs_lstm
    else:
        inputs = lstm_name[i-1]
    if i == lstm_num_layers - 1:
        return_sequences=False
    else:
        return_sequences=True
    lstm_name[i] = create_lstm(inputs, lstm_units[i], return_sequences)
for i in range(lstm_dense_num_layers):
    if i == 0:
        inputs = lstm_name[lstm_num_layers-1]
    else:
        inputs = lstm_dense_batch_name[i-1]
    lstm_dense_name[i], lstm_dense_dropout_name[i], lstm_dense_batch_name[i] = create_dense(inputs, lstm_dense_units[i])
outputs_lstm = layers.Dense(10, activation='softmax')(lstm_dense_batch_name[lstm_dense_num_layers-1])
print('Outputs:', outputs_lstm.shape)

Complete code

The code above does not use the list num; the layer counts are hard-coded to 2 and the neuron counts are given directly, purely to keep the explanation simple. The complete code is given below:

import numpy as np
import tensorflow as tf
import tensorflow.keras as keras
from tensorflow.keras import models, layers, optimizers
import matplotlib.pyplot as plt

# Define the LSTM layer function
def create_lstm(inputs, units, return_sequences):
    lstm = layers.Bidirectional(layers.LSTM(units, return_sequences=return_sequences))(inputs)
    return lstm

# Define the Dense layer function
def create_dense(inputs, units):
    dense = layers.Dense(units, kernel_regularizer=keras.regularizers.l2(0.001), activation='relu')(inputs)
    dense_dropout = layers.Dropout(0.2)(dense)
    dense_batch = layers.BatchNormalization()(dense_dropout)
    return dense, dense_dropout, dense_batch

def load():
    # Load the MNIST dataset
    (x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

    # Simple normalization of the MNIST data
    x_train, x_test = x_train / 255.0, x_test / 255.0
    return x_train, y_train, x_test, y_test

def classify(x_train, y_train, x_test, y_test, num):
    # Set the LSTM layer parameters
    lstm_num_layers = num[0]
    lstm_units = num[2: 2+lstm_num_layers]
    lstm_name = list(np.zeros((lstm_num_layers,)))
    
    # Set the dense (LSTM_Dense) layer parameters
    lstm_dense_num_layers = num[1]
    lstm_dense_units = num[2+lstm_num_layers: 2+lstm_num_layers+lstm_dense_num_layers]
    lstm_dense_name = list(np.zeros((lstm_dense_num_layers,)))
    lstm_dense_dropout_name = list(np.zeros((lstm_dense_num_layers,)))
    lstm_dense_batch_name = list(np.zeros((lstm_dense_num_layers,)))

    inputs_lstm = layers.Input(shape=(x_train.shape[1], x_train.shape[2]))
    for i in range(lstm_num_layers):
        if i == 0:
            inputs = inputs_lstm
        else:
            inputs = lstm_name[i-1]
        if i == lstm_num_layers - 1:
            return_sequences=False
        else:
            return_sequences=True
        lstm_name[i] = create_lstm(inputs, lstm_units[i], return_sequences)
    for i in range(lstm_dense_num_layers):
        if i == 0:
            inputs = lstm_name[lstm_num_layers-1]
        else:
            inputs = lstm_dense_batch_name[i-1]
        lstm_dense_name[i], lstm_dense_dropout_name[i], lstm_dense_batch_name[i] = create_dense(inputs, lstm_dense_units[i])
    outputs_lstm = layers.Dense(10, activation='softmax')(lstm_dense_batch_name[lstm_dense_num_layers-1])

    LSTM_model = keras.Model(inputs_lstm, outputs_lstm)
    LSTM_model.compile(optimizer=keras.optimizers.Adam(),
                 loss='sparse_categorical_crossentropy',
                 metrics=['accuracy'])

    history = LSTM_model.fit(x_train, y_train, batch_size=32, epochs=5, validation_split=0.1, verbose=0)

    # Evaluate the model on the test set
    results = LSTM_model.evaluate(x_test, y_test, verbose=0)
    return results[1]

The first two elements of the list num are the number of LSTM layers and the number of fully connected layers; the remaining elements are the neuron counts of each layer.
The returned value is the accuracy on the test set.
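
As a quick check that deep_learning.py runs on its own, it can be called with a hand-written num list; the chromosome below is a hypothetical example, not an optimized configuration:

if __name__ == '__main__':
    x_train, y_train, x_test, y_test = load()
    # hypothetical chromosome: 2 LSTM layers (128, 128 units) and 2 dense layers (128, 64 units)
    acc = classify(x_train, y_train, x_test, y_test, num=[2, 2, 128, 128, 128, 64])
    print('Test accuracy:', acc)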

GA.py

For an introduction to the standard genetic algorithm, see my other article 遺傳算法求解最大值問題詳解(附python代碼) (a detailed walkthrough of solving a maximization problem with a genetic algorithm, with Python code).

Problems

A conventional genetic algorithm is hard to apply directly to optimizing the LSTM network, for the following reasons:

  • 1. In a traditional genetic algorithm every chromosome has the same length, but when optimizing an LSTM network the chromosome length varies with the number of layers. For example, chromosome a has one LSTM layer and one fully connected layer, so it carries four genes in total (two for the layer counts and two for the neuron counts of each layer); chromosome b has two LSTM layers and two fully connected layers, so it carries six genes in total (two for the layer counts and four for the neuron counts). These two chromosomes are sketched just after this list.
  • 2. In a traditional genetic algorithm all genes on a chromosome share the same value range, but when optimizing an LSTM network the genes encoding layer counts must lie in one range and the genes encoding neuron counts in another. For example, the number of LSTM layers is between one and three, the number of fully connected layers is between one and three, and the number of neurons per layer is between 32 and 256.
  • 3. Because of the first problem (chromosomes of different lengths), both the crossover function and the mutation function need to be modified.
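
To make the first problem concrete, here is a hypothetical sketch of the two chromosomes mentioned above (the neuron counts are arbitrary illustrative values):

# chromosome a: 1 LSTM layer + 1 dense layer  -> 4 genes
a = [1, 1, 128, 64]
# chromosome b: 2 LSTM layers + 2 dense layers -> 6 genes
b = [2, 2, 128, 128, 128, 64]
# The lengths differ, so a standard fixed-length crossover cannot pair the genes up.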

Solutions

  • 1. Give every chromosome the same length (since there are at most three LSTM layers and at most three fully connected layers, plus the two leading layer-count genes, each chromosome carries 3+3+2=8 genes), and pad the tail with zeros when a chromosome is shorter than that; the padded encoding is sketched after this list.
  • 2. First set the two leading genes, each ranging from one to three, and then use these two genes to determine how many of the following neuron-count genes are actually used.
  • 3. For the crossover function, first choose the positions to swap between the two selected chromosomes (call them a and b), then walk over the genes at these positions; if the gene at such a position is 0 on either chromosome, or the gene encodes a layer count, cancel the swap at that position.
  • 4. For the mutation function, only neuron-count genes are mutated; layer-count genes are never mutated.
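
Continuing the same hypothetical example, the two chromosomes padded to the fixed length of 8 genes become:

# DNA_SIZE_MAX = 8 genes: [lstm_layers, dense_layers, units..., zero padding]
a_padded = [1, 1, 128, 64, 0, 0, 0, 0]
b_padded = [2, 2, 128, 128, 128, 64, 0, 0]
# Crossover skips positions 0 and 1 (layer counts) and any position where either chromosome holds a 0.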

Complete code

import numpy as np
import deep_learning as project

DNA_SIZE = 2
DNA_SIZE_MAX = 8
POP_SIZE = 20
CROSS_RATE = 0.5
MUTATION_RATE = 0.01
N_GENERATIONS = 40

train_x, train_y, test_x, test_y = project.load()

def get_fitness(x): 
    return project.classify(train_x, train_y, test_x, test_y, num=x)

def select(pop, fitness):
    idx = np.random.choice(np.arange(POP_SIZE), size=POP_SIZE, replace=True, p=fitness / fitness.sum())
    return pop[idx]

def crossover(parent, pop):
    if np.random.rand() < CROSS_RATE:
        # pick another individual and random crossover points
        i_ = np.random.randint(0, POP_SIZE, size=1)
        cross_points = np.random.randint(0, 2, size=DNA_SIZE_MAX).astype(bool)
        for i, point in enumerate(cross_points):
            # cancel the swap if either gene is a zero pad or encodes a layer count
            if point == True and pop[i_, i]*parent[i] == 0:
                cross_points[i] = False
            if point == True and i < 2:
                cross_points[i] = False
        parent[cross_points] = pop[i_, cross_points]
    return parent

def mutate(child):
    for point in range(DNA_SIZE_MAX):
        if np.random.rand() < MUTATION_RATE:
            # only non-zero neuron-count genes (positions 2 onwards) are mutated,
            # and they stay within the allowed range of 32 to 256 neurons
            if point >= 2:
                if child[point] != 0:
                    child[point] = np.random.randint(32, 257)
    return child

# Initialize the population: two layer-count genes in [1, 3], then one neuron-count
# gene in [32, 256] per layer, zero-padded up to DNA_SIZE_MAX genes
pop_layers = np.zeros((POP_SIZE, DNA_SIZE), np.int32)
pop_layers[:, 0] = np.random.randint(1, 4, size=(POP_SIZE,))
pop_layers[:, 1] = np.random.randint(1, 4, size=(POP_SIZE,))
pop = np.zeros((POP_SIZE, DNA_SIZE_MAX))
for i in range(POP_SIZE):
    pop_neurons = np.random.randint(32, 257, size=(pop_layers[i].sum(),))
    pop_stack = np.hstack((pop_layers[i], pop_neurons))
    for j, gene in enumerate(pop_stack):
        pop[i][j] = gene

for each_generation in range(N_GENERATIONS):
    fitness = np.zeros([POP_SIZE, ])
    for i in range(POP_SIZE):
        # strip the trailing zero padding and cast the genes back to int
        pop_list = list(pop[i])
        for j, each in enumerate(pop_list):
            if each == 0.0:
                index = j
                pop_list = pop_list[:j]
        for k, each in enumerate(pop_list):
            each_int = int(each)
            pop_list[k] = each_int
        # the fitness of a chromosome is the test accuracy of the network it encodes
        fitness[i] = get_fitness(pop_list)
        print('Generation %d, chromosome %d, fitness %f' % (each_generation+1, i+1, fitness[i]))
        print('Chromosome:', pop_list)
    print("Generation:", each_generation+1, "Most fitted DNA:", pop[np.argmax(fitness), :], "fitness:", fitness[np.argmax(fitness)])
    # selection, crossover and mutation produce the next generation (in place)
    pop = select(pop, fitness)
    pop_copy = pop.copy()
    for parent in pop:
        child = crossover(parent, pop_copy)
        child = mutate(child)
        parent[:] = child

Within this loop, the code below removes the zero elements (the padding) from the array; the implementation details are explained in my other article 刪掉ndarray數組中的所有零元素 (removing all zero elements from an ndarray):

for each_generation in range(N_GENERATIONS):
    fitness = np.zeros([POP_SIZE, ])
    for i in range(POP_SIZE):
        pop_list = list(pop[i])
        for j, each in enumerate(pop_list):
            if each == 0.0:
                index = j
                pop_list = pop_list[:j]
        for k, each in enumerate(pop_list):
            each_int = int(each)
            pop_list[k] = each_int