CHAPTER 9 - Up and Running with TensorFlow, Part 2

This article is my own translation. If you wish to use it commercially, please let me know. Thank you.

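Saving and Restoring Models

Once you have trained your model, you should save its parameters to disk so you can come back to it whenever you want, use it in another program, compare it to other models, and so on. Moreover, you probably want to save checkpoints at regular intervals during training, so that if your computer crashes during training you can continue from the last checkpoint rather than start over from scratch.

TensorFlow makes saving and restoring models very easy. Just create a Saver node at the end of the construction phase (after all variable nodes are created); then, in the execution phase, call its save() method whenever you want to save the model, passing it the session and the path of the checkpoint file: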

[...]
theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0), name="theta")
[...]
init = tf.global_variables_initializer()
saver = tf.train.Saver()
with tf.Session() as sess:
    sess.run(init)

    for epoch in range(n_epochs):
        if epoch % 100 == 0: # checkpoint every 100 epochs
            save_path = saver.save(sess, "/tmp/my_model.ckpt")
            
        sess.run(training_op)
    best_theta = theta.eval()
    save_path = saver.save(sess, "/tmp/my_model_final.ckpt")

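Restoring a model is just as easy: you create a Saver at the end of the construction phase just like before, but then at the beginning of the execution phase, instead of initializing the variables using the init node, you call the restore() method of the Saver object: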

with tf.Session() as sess:
    saver.restore(sess, "/tmp/my_model_final.ckpt")
    [...]
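
The checkpoint-and-resume pattern itself does not depend on TensorFlow. As a rough illustration only, the sketch below replaces the Saver with a JSON file and the training op with a dummy update. The train() function, its crash_at parameter, and the file name are hypothetical and are not part of the original code.

```python
import json
import os
import tempfile


def train(n_epochs, ckpt_path, crash_at=None):
    """Toy training loop: checkpoint every 100 epochs, resume if a checkpoint exists."""
    if os.path.exists(ckpt_path):
        # Plays the role of saver.restore(): reload the last saved state.
        with open(ckpt_path) as f:
            state = json.load(f)
    else:
        # Plays the role of sess.run(init): start from scratch.
        state = {"epoch": 0, "theta": 0.0}

    theta = state["theta"]
    for epoch in range(state["epoch"], n_epochs):
        if epoch % 100 == 0:  # checkpoint every 100 epochs
            with open(ckpt_path, "w") as f:
                json.dump({"epoch": epoch, "theta": theta}, f)
        if epoch == crash_at:
            raise RuntimeError("simulated crash")
        theta += 0.001  # dummy stand-in for sess.run(training_op)
    return {"epoch": n_epochs, "theta": theta}


ckpt_path = os.path.join(tempfile.mkdtemp(), "toy_model.json")
try:
    train(1000, ckpt_path, crash_at=250)  # crashes; the last checkpoint is epoch 200
except RuntimeError:
    pass
with open(ckpt_path) as f:
    last_saved_epoch = json.load(f)["epoch"]
result = train(1000, ckpt_path)  # resumes from epoch 200 instead of epoch 0
print(last_saved_epoch, result["epoch"])
```

After the simulated crash only the 50 epochs since the last checkpoint are lost, which is the trade-off that the checkpoint interval controls.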

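By default a Saver saves and restores all variables under their own name, but if you need more control, you can specify which variables to save or restore, and what names to use. For example, the following Saver will save or restore only the theta variable, under the name weights: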

saver = tf.train.Saver({"weights": theta})

Complete code

import numpy as np
from sklearn.datasets import fetch_california_housing
import tensorflow as tf
from sklearn.preprocessing import StandardScaler

housing = fetch_california_housing()
m, n = housing.data.shape
print("Dataset: {} rows, {} columns".format(m, n))
housing_data_plus_bias = np.c_[np.ones((m, 1)), housing.data]
scaler = StandardScaler()
scaled_housing_data = scaler.fit_transform(housing.data)
scaled_housing_data_plus_bias = np.c_[np.ones((m, 1)), scaled_housing_data]

n_epochs = 1000  # not shown in the book
learning_rate = 0.01  # not shown

X = tf.constant(scaled_housing_data_plus_bias, dtype=tf.float32, name="X")  # not shown
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name="y")  # not shown
theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0, seed=42), name="theta")
y_pred = tf.matmul(X, theta, name="predictions")  # not shown
error = y_pred - y  # not shown
mse = tf.reduce_mean(tf.square(error), name="mse")  # not shown
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)  # not shown
training_op = optimizer.minimize(mse)  # not shown

init = tf.global_variables_initializer()
saver = tf.train.Saver()

with tf.Session() as sess:
    sess.run(init)

    for epoch in range(n_epochs):
        if epoch % 100 == 0:
            print("Epoch", epoch, "MSE =", mse.eval())  # not shown
            save_path = saver.save(sess, "/tmp/my_model.ckpt")
        sess.run(training_op)

    best_theta = theta.eval()
    save_path = saver.save(sess, "/tmp/my_model_final.ckpt")  # the checkpoint files are written to the /tmp folder

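Visualizing the Graph and Training Curves Using TensorBoard

So now we have a computation graph that trains a Linear Regression model using Mini-batch Gradient Descent, and we are saving checkpoints at regular intervals. Sounds sophisticated, doesn't it? However, we are still relying on the print() function to visualize progress during training. There is a better way: enter TensorBoard. If you feed it some training stats, it will display nice interactive visualizations of these stats in your web browser (e.g., learning curves). You can also provide it with the graph's definition, and it will give you a great interface to browse through it. This is very useful for identifying errors in the graph, finding bottlenecks, and so on.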

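The first step is to tweak your program a bit so it writes the graph definition and some training stats (for example, the training error (MSE)) to a log directory that TensorBoard will read from. You need to use a different log directory every time you run your program, or else TensorBoard will merge stats from different runs, which will mess up the visualizations. The simplest solution is to include a timestamp in the log directory name. Add the following code at the beginning of the program: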

from datetime import datetime
now = datetime.utcnow().strftime("%Y%m%d%H%M%S")
root_logdir = "tf_logs"
logdir = "{}/run-{}/".format(root_logdir, now)
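
As a small variation, the same directory name can be built with a timezone-aware timestamp, since datetime.utcnow() is deprecated as of Python 3.12. The helper name make_logdir and its now parameter are assumptions made for this sketch, not part of the original program.

```python
from datetime import datetime, timezone


def make_logdir(root_logdir="tf_logs", now=None):
    """Return a per-run log directory such as 'tf_logs/run-20160906091959/'."""
    if now is None:
        now = datetime.now(timezone.utc)  # timezone-aware equivalent of utcnow()
    return "{}/run-{}/".format(root_logdir, now.strftime("%Y%m%d%H%M%S"))


# A fixed timestamp makes the resulting name easy to check.
example = make_logdir(now=datetime(2016, 9, 6, 9, 19, 59, tzinfo=timezone.utc))
print(example)  # tf_logs/run-20160906091959/
```

Because the timestamp has one-second resolution, two runs started within the same second would still share a directory.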

Next, add the following code at the very end of the construction phase:

mse_summary = tf.summary.scalar('MSE', mse)
file_writer = tf.summary.FileWriter(logdir, tf.get_default_graph())

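The first line creates a node in the graph that will evaluate the MSE value and write it to a TensorBoard-compatible binary log string called a summary. The second line creates a FileWriter that you will use to write summaries to logfiles in the log directory. The first parameter indicates the path of the log directory (in this case something like tf_logs/run-20160906091959/, relative to the current directory). The second (optional) parameter is the graph you want to visualize. Upon creation, the FileWriter creates the log directory if it does not already exist (and its parent directories if needed), and writes the graph definition to a binary logfile called an events file.

Next you need to update the execution phase to evaluate the mse_summary node regularly during training (e.g., every 10 mini-batches). This will output a summary that you can then write to the events file using the file_writer. Here is the updated code: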

[...]
for batch_index in range(n_batches):
    X_batch, y_batch = fetch_batch(epoch, batch_index, batch_size)
    if batch_index % 10 == 0:
        summary_str = mse_summary.eval(feed_dict={X: X_batch, y: y_batch})
        step = epoch * n_batches + batch_index
        file_writer.add_summary(summary_str, step)
    sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
[...]

Avoid logging training stats at every single training step, as this would significantly slow down training.
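
To get a feel for how much is logged, the step arithmetic of the loop can be checked in plain Python. This is only a sketch: it assumes the dataset has 20,640 rows (the value the complete script prints as m) and uses the batch_size and n_epochs values from that script.

```python
import math

m = 20640         # assumed number of training instances
batch_size = 100
n_epochs = 10
n_batches = math.ceil(m / batch_size)

# Global step values at which a summary is written (every 10th mini-batch).
logged_steps = [
    epoch * n_batches + batch_index
    for epoch in range(n_epochs)
    for batch_index in range(n_batches)
    if batch_index % 10 == 0
]
total_steps = n_epochs * n_batches
print(n_batches, total_steps, len(logged_steps))  # 207 2070 210
```

Under these assumptions roughly one training step in ten produces a summary, and the step values increase monotonically across epochs, which gives TensorBoard a consistent x-axis.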

Finally, you want to close the FileWriter at the end of the program:

file_writer.close()

Complete code

import numpy as np
from sklearn.datasets import fetch_california_housing
import tensorflow as tf
from sklearn.preprocessing import StandardScaler

housing = fetch_california_housing()
m, n = housing.data.shape
print("Dataset: {} rows, {} columns".format(m, n))
housing_data_plus_bias = np.c_[np.ones((m, 1)), housing.data]
scaler = StandardScaler()
scaled_housing_data = scaler.fit_transform(housing.data)
scaled_housing_data_plus_bias = np.c_[np.ones((m, 1)), scaled_housing_data]

from datetime import datetime

now = datetime.utcnow().strftime("%Y%m%d%H%M%S")
root_logdir = r"D://tf_logs"
logdir = "{}/run-{}/".format(root_logdir, now)
n_epochs = 1000
learning_rate = 0.01

X = tf.placeholder(tf.float32, shape=(None, n + 1), name="X")
y = tf.placeholder(tf.float32, shape=(None, 1), name="y")
theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0, seed=42), name="theta")
y_pred = tf.matmul(X, theta, name="predictions")
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name="mse")
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(mse)

init = tf.global_variables_initializer()
mse_summary = tf.summary.scalar('MSE', mse)
file_writer = tf.summary.FileWriter(logdir, tf.get_default_graph())

n_epochs = 10
batch_size = 100
n_batches = int(np.ceil(m / batch_size))

def fetch_batch(epoch, batch_index, batch_size):
    np.random.seed(epoch * n_batches + batch_index)  # not shown in the book
    indices = np.random.randint(m, size=batch_size)  # not shown
    X_batch = scaled_housing_data_plus_bias[indices] # not shown
    y_batch = housing.target.reshape(-1, 1)[indices] # not shown
    return X_batch, y_batch

with tf.Session() as sess:                                                        # not shown in the book
    sess.run(init)                                                                # not shown

    for epoch in range(n_epochs):                                                 # not shown
        for batch_index in range(n_batches):
            X_batch, y_batch = fetch_batch(epoch, batch_index, batch_size)
            if batch_index % 10 == 0:
                summary_str = mse_summary.eval(feed_dict={X: X_batch, y: y_batch})
                step = epoch * n_batches + batch_index
                file_writer.add_summary(summary_str, step)
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})

    best_theta = theta.eval()
file_writer.close()
print(best_theta)

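Name Scopes

When dealing with more complex models such as neural networks, the graph can easily become cluttered with thousands of nodes. To avoid this, you can create name scopes to group related nodes. For example, let's modify the previous code to define the error and mse ops within a name scope called "loss":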

with tf.name_scope("loss") as scope:
    error = y_pred - y
    mse = tf.reduce_mean(tf.square(error), name="mse")

The name of each op defined within the scope is now prefixed with "loss/":

>>> print(error.op.name)
loss/sub
>>> print(mse.op.name)
loss/mse

In TensorBoard, the mse and error nodes now appear inside the loss namespace, which is collapsed by default (Figure 9-5).

Complete code

import numpy as np
from sklearn.datasets import fetch_california_housing
import tensorflow as tf
from sklearn.preprocessing import StandardScaler

housing = fetch_california_housing()
m, n = housing.data.shape
print("Dataset: {} rows, {} columns".format(m, n))
housing_data_plus_bias = np.c_[np.ones((m, 1)), housing.data]
scaler = StandardScaler()
scaled_housing_data = scaler.fit_transform(housing.data)
scaled_housing_data_plus_bias = np.c_[np.ones((m, 1)), scaled_housing_data]

from datetime import datetime

now = datetime.utcnow().strftime("%Y%m%d%H%M%S")
root_logdir = r"D://tf_logs"
logdir = "{}/run-{}/".format(root_logdir, now)

n_epochs = 1000
learning_rate = 0.01

X = tf.placeholder(tf.float32, shape=(None, n + 1), name="X")
y = tf.placeholder(tf.float32, shape=(None, 1), name="y")
theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0, seed=42), name="theta")
y_pred = tf.matmul(X, theta, name="predictions")


def fetch_batch(epoch, batch_index, batch_size):
    np.random.seed(epoch * n_batches + batch_index)  # not shown in the book
    indices = np.random.randint(m, size=batch_size)  # not shown
    X_batch = scaled_housing_data_plus_bias[indices] # not shown
    y_batch = housing.target.reshape(-1, 1)[indices] # not shown
    return X_batch, y_batch


with tf.name_scope("loss") as scope:
    error = y_pred - y
    mse = tf.reduce_mean(tf.square(error), name="mse")


optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(mse)

init = tf.global_variables_initializer()

mse_summary = tf.summary.scalar('MSE', mse)
file_writer = tf.summary.FileWriter(logdir, tf.get_default_graph())

n_epochs = 10
batch_size = 100
n_batches = int(np.ceil(m / batch_size))

with tf.Session() as sess:
    sess.run(init)

    for epoch in range(n_epochs):
        for batch_index in range(n_batches):
            X_batch, y_batch = fetch_batch(epoch, batch_index, batch_size)
            if batch_index % 10 == 0:
                summary_str = mse_summary.eval(feed_dict={X: X_batch, y: y_batch})
                step = epoch * n_batches + batch_index
                file_writer.add_summary(summary_str, step)
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})

    best_theta = theta.eval()

file_writer.flush()
file_writer.close()
print("Best theta:")
print(best_theta)

Modularity

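Suppose you want to create a graph that adds the output of two rectified linear units (ReLU). A ReLU computes a linear function of the inputs, and outputs the result if it is positive, and 0 otherwise, as shown in Equation 9-1.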
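
Equation 9-1 itself is not reproduced in this text. Going by the description above (a linear function of the inputs, clipped at zero), it can be written as follows, where w and b are assumed names for the weight vector and the bias term:

```latex
h_{\mathbf{w},\, b}(\mathbf{X}) = \max(\mathbf{X} \cdot \mathbf{w} + b,\ 0)
```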

The following code does the job, but it is quite repetitive:

n_features = 3
X = tf.placeholder(tf.float32, shape=(None, n_features), name="X")
w1 = tf.Variable(tf.random_normal((n_features, 1)), name="weights1")
w2 = tf.Variable(tf.random_normal((n_features, 1)), name="weights2")
b1 = tf.Variable(0.0, name="bias1")
b2 = tf.Variable(0.0, name="bias2")
z1 = tf.add(tf.matmul(X, w1), b1, name="z1")
z2 = tf.add(tf.matmul(X, w2), b2, name="z2")
relu1 = tf.maximum(z1, 0., name="relu1")
relu2 = tf.maximum(z1, 0., name="relu2")
output = tf.add(relu1, relu2, name="output")

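Such repetitive code is hard to maintain and error-prone (in fact, this code contains a cut-and-paste error; did you spot it?). It would become even worse if you wanted to add a few more ReLUs. Fortunately, TensorFlow lets you stay DRY (Don't Repeat Yourself): simply create a function to build a ReLU. The following code creates five ReLUs and outputs their sum (note that add_n() creates an operation that will compute the sum of a list of tensors):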

def relu(X):
    w_shape = (int(X.get_shape()[1]), 1)
    w = tf.Variable(tf.random_normal(w_shape), name="weights")
    b = tf.Variable(0.0, name="bias")
    z = tf.add(tf.matmul(X, w), b, name="z")
    return tf.maximum(z, 0., name="relu")

n_features = 3
X = tf.placeholder(tf.float32, shape=(None, n_features), name="X")
relus = [relu(X) for i in range(5)]
output = tf.add_n(relus, name="output")
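
To make concrete what such a graph computes for a single instance, here is a plain-Python sketch with three ReLUs instead of five. The weights and biases are made-up values chosen for easy arithmetic (the TensorFlow code draws its weights at random), and this relu() is a numeric stand-in, not the graph-building function above.

```python
def relu(x, w, b):
    """Return max(x . w + b, 0) for one instance x given as a list of features."""
    z = sum(x_i * w_i for x_i, w_i in zip(x, w)) + b
    return max(z, 0.0)


x = [1.0, 2.0, 3.0]  # one instance with n_features = 3
params = [           # hypothetical (weights, bias) pairs, one per ReLU
    ([0.5, -1.0, 0.5], 0.0),     # z = 0.0, so the output is 0.0
    ([1.0, 0.0, 0.0], 1.0),      # z = 2.0, so the output is 2.0
    ([-1.0, -1.0, -1.0], 0.5),   # z = -5.5, clipped to 0.0
]
outputs = [relu(x, w, b) for w, b in params]
total = sum(outputs)  # the role played by tf.add_n(relus)
print(outputs, total)  # [0.0, 2.0, 0.0] 2.0
```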

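Note that when you create a node, TensorFlow checks whether its name already exists, and if it does, it appends an underscore followed by an index to make the name unique. So the first ReLU contains nodes named "weights", "bias", "z", and "relu" (plus many more nodes with their default name, such as "MatMul"); the second ReLU contains nodes named "weights_1", "bias_1", and so on; the third ReLU contains nodes named "weights_2", "bias_2", and so on. TensorBoard identifies such series and collapses them together to reduce clutter (as you can see in Figure 9-6).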
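
The naming rule described above can be sketched as a small counter-based registry. This is a toy model for illustration only, not TensorFlow's actual implementation; the NameRegistry class is hypothetical.

```python
class NameRegistry:
    """Toy model of unique node naming (not TensorFlow's implementation)."""

    def __init__(self):
        self._counts = {}  # how many times each base name has been requested

    def unique_name(self, name):
        count = self._counts.get(name, 0)
        self._counts[name] = count + 1
        # The first use keeps the bare name; later uses get _1, _2, ...
        return name if count == 0 else "{}_{}".format(name, count)


registry = NameRegistry()
names = [registry.unique_name("weights") for _ in range(3)]
bias_name = registry.unique_name("bias")
print(names, bias_name)  # ['weights', 'weights_1', 'weights_2'] bias
```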

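Using name scopes, you can make the graph much clearer. Simply move all the content of the relu() function inside a name scope. Figure 9-7 shows the resulting graph. Notice that TensorFlow also gives the name scopes unique names by appending _1, _2, and so on.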

def relu(X):
    with tf.name_scope("relu"):
        w_shape = (int(X.get_shape()[1]), 1)                          # not shown in the book
        w = tf.Variable(tf.random_normal(w_shape), name="weights")    # not shown
        b = tf.Variable(0.0, name="bias")                             # not shown
        z = tf.add(tf.matmul(X, w), b, name="z")                      # not shown
        return tf.maximum(z, 0., name="max")                          # not shown

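Sharing Variables

If you want to share a variable between various components of your graph, one simple option is to create it first, then pass it as a parameter to the functions that need it. For example, suppose you want to control the ReLU threshold (currently hardcoded to 0) using a shared threshold variable for all ReLUs. You could just create that variable first, and then pass it to the relu() function: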

tf.reset_default_graph()  # start from a fresh default graph

def relu(X, threshold):
    with tf.name_scope("relu"):
        w_shape = (int(X.get_shape()[1]), 1)                        # not shown in the book
        w = tf.Variable(tf.random_normal(w_shape), name="weights")  # not shown
        b = tf.Variable(0.0, name="bias")                           # not shown
        z = tf.add(tf.matmul(X, w), b, name="z")                    # not shown
        return tf.maximum(z, threshold, name="max")

threshold = tf.Variable(0.0, name="threshold")
X = tf.placeholder(tf.float32, shape=(None, n_features), name="X")
relus = [relu(X, threshold) for i in range(5)]
output = tf.add_n(relus, name="output")

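This is fine: you can now control the threshold for all ReLUs using the threshold variable. However, if there are many shared parameters such as this one, it will be painful to have to pass them around as parameters all the time. Many people create a Python dictionary containing all the variables in their model, and pass it around to every function. Others create a class for each module (e.g., a ReLU class using class variables to handle the shared parameter). Yet another option is to set the shared variable as an attribute of the relu() function upon the first call, like so: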

def relu(X):
    with tf.name_scope("relu"):
        if not hasattr(relu, "threshold"):
            relu.threshold = tf.Variable(0.0, name="threshold")
        w_shape = int(X.get_shape()[1]), 1                          # not shown in the book
        w = tf.Variable(tf.random_normal(w_shape), name="weights")  # not shown
        b = tf.Variable(0.0, name="bias")                           # not shown
        z = tf.add(tf.matmul(X, w), b, name="z")                    # not shown
        return tf.maximum(z, relu.threshold, name="max")

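TensorFlow offers another option, which may lead to slightly cleaner and more modular code than the previous solutions. This solution is a bit tricky to understand at first, but since it is used a lot in TensorFlow, it is worth going into a bit of detail. The idea is to use the get_variable() function to create the shared variable if it does not exist yet, or reuse it if it already exists. The desired behavior (creating or reusing) is controlled by an attribute of the current variable_scope(). For example, the following code will create a variable named "relu/threshold" (as a scalar, since shape=(), and using 0.0 as the initial value):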

with tf.variable_scope("relu"):
    threshold = tf.get_variable("threshold", shape=(),
                                initializer=tf.constant_initializer(0.0))

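Note that if the variable has already been created by an earlier call to get_variable(), this code will raise an exception. This behavior prevents reusing variables by mistake. If you want to reuse a variable, you need to say so explicitly by setting the variable scope's reuse attribute to True (in which case you don't have to specify the shape or the initializer):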

with tf.variable_scope("relu", reuse=True):
    threshold = tf.get_variable("threshold")

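This code will fetch the existing "relu/threshold" variable, or raise an exception if it does not exist or if it was not created using get_variable(). Alternatively, you can set the reuse attribute to True inside the block by calling the scope's reuse_variables() method: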

with tf.variable_scope("relu") as scope:
    scope.reuse_variables()
    threshold = tf.get_variable("threshold")

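Once reuse is set to True, it cannot be set back to False within the block. Moreover, if you define other variable scopes inside this one, they will automatically inherit reuse=True. Lastly, only variables created by get_variable() can be reused this way.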
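
The create-or-reuse rules can be summarized with a toy variable store in plain Python. This is a simplified model for illustration, not TensorFlow's implementation: the VariableStore class and its method signature are hypothetical, and scope nesting and inheritance of reuse are left out.

```python
class VariableStore:
    """Toy model of the get_variable() create-or-reuse rules."""

    def __init__(self):
        self._variables = {}

    def get_variable(self, scope, name, reuse=False, initial_value=None):
        full_name = "{}/{}".format(scope, name)
        if reuse:
            # Reusing requires that the variable was created earlier.
            if full_name not in self._variables:
                raise ValueError("{} does not exist".format(full_name))
            return self._variables[full_name]
        # Creating refuses to silently share an existing variable.
        if full_name in self._variables:
            raise ValueError("{} already exists".format(full_name))
        self._variables[full_name] = {"name": full_name, "value": initial_value}
        return self._variables[full_name]


store = VariableStore()
created = store.get_variable("relu", "threshold", initial_value=0.0)
shared = store.get_variable("relu", "threshold", reuse=True)
print(created is shared, shared["name"])  # True relu/threshold

try:
    store.get_variable("relu", "threshold", initial_value=0.0)
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True  # creating twice without reuse is an error
```

Making accidental sharing an error rather than a silent default is the design choice the text describes: sharing has to be requested explicitly.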

Now you have all the pieces you need to make the relu() function access the threshold variable without having to pass it as a parameter:

def relu(X):
    with tf.variable_scope("relu", reuse=True):
        threshold = tf.get_variable("threshold")
        w_shape = int(X.get_shape()[1]), 1                          # not shown
        w = tf.Variable(tf.random_normal(w_shape), name="weights")  # not shown
        b = tf.Variable(0.0, name="bias")                           # not shown
        z = tf.add(tf.matmul(X, w), b, name="z")                    # not shown
        return tf.maximum(z, threshold, name="max")

X = tf.placeholder(tf.float32, shape=(None, n_features), name="X")
with tf.variable_scope("relu"):
    threshold = tf.get_variable("threshold", shape=(),
                                initializer=tf.constant_initializer(0.0))
relus = [relu(X) for relu_index in range(5)]
output = tf.add_n(relus, name="output")

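This code first defines the relu() function, then creates the relu/threshold variable (as a scalar that will eventually be initialized to 0.0) and builds five ReLUs by calling the relu() function. The relu() function reuses the relu/threshold variable, and creates the other ReLU nodes.

Variables created using get_variable() are always named using the name of their variable_scope as a prefix (e.g., "relu/threshold"), but for all other nodes (including variables created with tf.Variable()) the variable scope acts like a new name scope. In particular, if a name scope with an identical name was already created, then a suffix is added to make the name unique. For example, all nodes created in the preceding code (except the threshold variable) have a name prefixed with "relu_1/" to "relu_5/", as shown in Figure 9-8.

It is somewhat unfortunate that the threshold variable must be defined outside the relu() function, where all the rest of the ReLU code resides. To fix this, the following code creates the threshold variable within the relu() function upon the first call, then reuses it in subsequent calls. Now the relu() function does not have to worry about name scopes or variable sharing: it just calls get_variable(), which will create or reuse the threshold variable (it does not need to know which is the case). The rest of the code calls relu() five times, making sure to set reuse=False on the first call, and reuse=True for the other calls.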

def relu(X):
    threshold = tf.get_variable("threshold", shape=(),
                                initializer=tf.constant_initializer(0.0))
    w_shape = (int(X.get_shape()[1]), 1)                        # not shown in the book
    w = tf.Variable(tf.random_normal(w_shape), name="weights")  # not shown
    b = tf.Variable(0.0, name="bias")                           # not shown
    z = tf.add(tf.matmul(X, w), b, name="z")                    # not shown
    return tf.maximum(z, threshold, name="max")

X = tf.placeholder(tf.float32, shape=(None, n_features), name="X")
relus = []
for relu_index in range(5):
    with tf.variable_scope("relu", reuse=(relu_index >= 1)) as scope:
        relus.append(relu(X))
output = tf.add_n(relus, name="output")

The resulting graph is slightly different than before, since the shared variable lives within the first ReLU (see Figure 9-9).