Summary and Reflections on Neural Network Training

Original

2020-06-10 16:45

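Until now I had been training neural networks mindlessly. I would define the hyperparameters and the network, start a Session, feed the data to the Graph, and then wait endlessly. Sometimes you get lucky: the data is well separated, the model converges quickly, and the loss drops steadily. But when a large share of the data is confusing (similar inputs carry different labels), the model inevitably gets thrown off. Put simply, in one batch you tell the model to go down, and in the next batch you tell it to go up. The model is baffled: do you want me to go up or down? So the loss fluctuates. After a few more epochs, though, the model follows the majority. Whichever direction it is told more often is the one it takes; it strictly obeys majority rule. (Here "up" and "down" are abstract; for a binary classification problem they mean deciding whether a sample is positive or negative.)

This causes a serious problem. Real-world data is imbalanced, so the model leans entirely toward the class with more samples, and whenever samples are confusing, the majority class wins. We do have several ways to deal with class imbalance, but if the data is highly random or heavily confused, even a neural network, powerful as it is, cannot classify it well.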
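To make the "majority rule" intuition concrete: with the squared-error loss used in the experiment below, if the same input appears several times with different labels, the prediction that minimizes the loss is simply the fraction of positive labels. The following is a minimal NumPy sketch, assuming a hypothetical input that is seen four times, three times labeled 1 and once labeled 0.

```python
import numpy as np

# Hypothetical labels observed for one identical input: three positives, one negative.
labels = np.array([1.0, 1.0, 1.0, 0.0])

# Try every candidate prediction on a grid and evaluate the mean squared error.
candidates = np.linspace(0.0, 1.0, 101)
mse = np.array([np.mean((labels - p) ** 2) for p in candidates])

best = candidates[np.argmin(mse)]
# The best constant prediction is the fraction of positive labels (the majority wins),
# and the remaining loss p * (1 - p) cannot be trained away.
print(f"best prediction = {best:.2f}, minimum MSE = {mse.min():.4f}")
```

The best prediction lands at 0.75, leaning toward the majority label, and the loss bottoms out at 0.75 × 0.25 = 0.1875 rather than at zero, however long training runs.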

The following simple experiment illustrates the problem.

Define the network

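# Written for TensorFlow 1.x (placeholder / tf.layers / Session API)
import numpy as np
import tensorflow as tf

# Inputs: 5 features per sample; targets: one 0/1 label per sample
x = tf.placeholder(shape=(None, 5), dtype=tf.float32)
y = tf.placeholder(shape=(None,), dtype=tf.float32)

# Two small tanh hidden layers followed by a single sigmoid output unit
hidden_layer_1 = tf.layers.dense(units=3, inputs=x, activation=tf.tanh)
hidden_layer_2 = tf.layers.dense(units=3, inputs=hidden_layer_1, activation=tf.tanh)
# Squeeze (batch, 1) -> (batch,) so the output lines up with y
output = tf.squeeze(tf.layers.dense(units=1, inputs=hidden_layer_2, activation=tf.sigmoid))

# Mean squared error between labels and predicted probabilities, minimized with Adam
loss = tf.reduce_mean(tf.pow(y - output, 2))
train_op = tf.train.AdamOptimizer(0.01).minimize(loss)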
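For readers without a TensorFlow 1.x environment, the forward pass and loss of the graph above can be mirrored in plain NumPy. This is only a sketch: the function names forward and mse_loss are hypothetical, and the initialization (Glorot-uniform kernels, zero biases) is an assumption meant to resemble the TF 1.x defaults for tf.layers.dense rather than a guaranteed match.

```python
import numpy as np

rng = np.random.default_rng(0)

def glorot_uniform(fan_in, fan_out):
    # Glorot/Xavier uniform: U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out)).
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

# Layer sizes of the network: 5 inputs -> 3 tanh -> 3 tanh -> 1 sigmoid.
sizes = [5, 3, 3, 1]
weights = [glorot_uniform(m, n) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

def forward(x):
    """Hypothetical NumPy mirror of the graph's forward pass."""
    h1 = np.tanh(x @ weights[0] + biases[0])
    h2 = np.tanh(h1 @ weights[1] + biases[1])
    logits = h2 @ weights[2] + biases[2]
    # Sigmoid, then drop the trailing unit axis: (batch, 1) -> (batch,).
    return (1.0 / (1.0 + np.exp(-logits))).squeeze(-1)

def mse_loss(y, output):
    # Same quantity as tf.reduce_mean(tf.pow(y - output, 2)).
    return np.mean((y - output) ** 2)

n_params = sum(w.size for w in weights) + sum(b.size for b in biases)
x_batch = np.array([[1, 0, 0, 0, 0], [0, 1, 0, 0, 1]], dtype=float)
y_batch = np.array([0.0, 1.0])
out = forward(x_batch)
print(n_params, out.shape, mse_loss(y_batch, out))
```

The network has 34 trainable parameters (5×3+3, 3×3+3, 3×1+1), and every output is a probability strictly between 0 and 1.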

Construct the data

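def get_batch(batch_size, mode=1):
    """Sample batch_size rows (with replacement) from a pool of six labeled rows."""
    if mode == 1:
        # Confusing data: [1, 0, 0, 0, 0] is labeled 0 once and 1 twice,
        # and [0, 1, 0, 0, 1] is labeled once with each class
        x = np.array([
            [1, 0, 0, 0, 0], 
            [1, 0, 0, 0, 0], 
            [0, 1, 0, 0, 1], 
            [1, 0, 0, 0, 0], 
            [0, 1, 0, 0, 0],
            [0, 1, 0, 0, 1]]
        )
    elif mode == 2:
        # Random data: fresh random features on every call, unrelated to the labels
        x = np.random.random(30).reshape(-1, 5)
    else:
        # Less confusing data: only [0, 1, 0, 0, 1] still appears with both labels
        x = np.array([
            [1, 0, 0, 0, 1], 
            [1, 0, 0, 0, 0], 
            [0, 1, 0, 0, 0], 
            [1, 0, 0, 0, 0], 
            [0, 1, 0, 0, 1],
            [0, 1, 0, 0, 1]]
        )
    # The same six labels are used in every mode
    y = np.array([0, 1, 1, 1, 1, 0])
    index = np.random.randint(0, len(x), batch_size)
    return x[index], y[index]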

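# Fetch a batch: mode=1 generates confusing data, mode=2 generates random data
get_batch(2, mode=1) 

# Example output (rows are sampled at random, so yours will differ):
# (array([[0, 1, 0, 0, 0],
#        [0, 1, 0, 0, 0]]), array([1, 1]))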
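The amount of confusion in each mode can be quantified directly. Because squared error is minimized by predicting the mean label of each distinct input (the majority-rule effect described at the start of this post), the lowest expected loss any model can reach on the fixed pools of mode=1 and mode=None is the label variance within each group of identical rows, weighted by how often each row is sampled. The sketch below copies the two arrays and the labels from get_batch; the helper name min_achievable_mse is hypothetical.

```python
import numpy as np
from collections import defaultdict

# Fixed sample pools copied from get_batch (mode=1 and mode=None) and their shared labels.
X_MODE_1 = np.array([[1, 0, 0, 0, 0], [1, 0, 0, 0, 0], [0, 1, 0, 0, 1],
                     [1, 0, 0, 0, 0], [0, 1, 0, 0, 0], [0, 1, 0, 0, 1]])
X_MODE_NONE = np.array([[1, 0, 0, 0, 1], [1, 0, 0, 0, 0], [0, 1, 0, 0, 0],
                        [1, 0, 0, 0, 0], [0, 1, 0, 0, 1], [0, 1, 0, 0, 1]])
Y = np.array([0, 1, 1, 1, 1, 0])

def min_achievable_mse(x, y):
    """Lowest expected MSE when rows are sampled uniformly from (x, y)."""
    groups = defaultdict(list)
    for row, label in zip(x, y):
        groups[tuple(row)].append(label)
    # Each distinct input is best predicted by its mean label; what is left over
    # is the within-group variance, weighted by the group's size.
    total = sum(len(labels) * np.var(labels) for labels in groups.values())
    return total / len(y)

for name, x in [("mode=1", X_MODE_1), ("mode=None", X_MODE_NONE)]:
    print(name, round(min_achievable_mse(x, Y), 4))
```

This gives 7/36 ≈ 0.1944 for mode=1 and 1/12 ≈ 0.0833 for mode=None. For mode=2, get_batch draws brand-new random features on every call, so the features carry no information about the labels; the best a model can do there is predict the overall label mean 4/6, which leaves an MSE of 2/9 ≈ 0.222.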

Train the model

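import matplotlib.pyplot as plt

# Train a fresh model for each data mode and record the loss every 100 steps
for i in [1, 2, None]:
    with tf.Session() as sess:
        # Re-initializing all variables (weights and Adam slots) restarts training from scratch
        sess.run(tf.global_variables_initializer())
        steps = []
        loss_s = []
        for step in range(10000):
            inputs_x, inputs_y = get_batch(32, mode=i)
            _, loss_ = sess.run([train_op, loss], feed_dict={x: inputs_x, y: inputs_y})
            # print("Step is {}, loss is {}".format(step, loss_))
            if step % 100 == 0:
                steps.append(step)
                loss_s.append(loss_)
        plt.plot(steps, loss_s, label="mode={}".format(i))
plt.xlabel("step")
plt.ylabel("loss")
plt.legend()
plt.show()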

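As shown in the figure, the blue and orange curves are the runs with mode=1 and mode=2, and the green curve is the run with mode=None. Here mode=1 means the data contains conflicting labels, mode=2 means the data is random, and mode=None is the data with the confusion reduced. The result clearly supports the conclusion stated earlier.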

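Sometimes the loss does not decrease even on clean data. That is a matter of how learning_rate and batch_size are set; with suitable adjustment the model will eventually converge (assuming the loss function is defined correctly).

I also want to correct a misconception I used to hold: that for neural networks, more data is always better. That was an unexamined assumption. More data helps only under a condition: it should cover more varied situations within the same class, so that the network learns more of the features that occur in the real world. If the data is all confusing, no amount of it will make the model converge, because more data also brings more noise.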
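The effect of learning_rate alone shows up even on the simplest possible loss. The sketch below runs plain gradient descent (not Adam, which the experiment uses) on the one-parameter loss f(w) = w², whose gradient is 2w, so each step multiplies w by (1 − 2·lr). The three learning rates are illustrative assumptions, not values from the experiment.

```python
def gradient_descent(lr, steps=50, w0=1.0):
    """Minimize f(w) = w**2 with plain gradient descent and return the final loss."""
    w = w0
    for _ in range(steps):
        grad = 2.0 * w      # derivative of w**2
        w = w - lr * grad   # same as w *= (1 - 2 * lr)
    return w ** 2

for lr in [0.001, 0.1, 1.1]:
    print(f"lr={lr}: final loss = {gradient_descent(lr):.3e}")
```

With lr = 0.001 the loss barely moves in 50 steps, with lr = 0.1 it shrinks toward zero, and with lr = 1.1 every step overshoots the minimum so the loss grows instead of falling. This is the kind of "loss does not go down" that tuning the learning rate fixes, as opposed to the loss floor caused by conflicting labels.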
