Sections before 5-8 use TensorFlow 2.0; section 5-8 and later use TensorFlow 1.0.
- What is cross-entropy?
Cross-entropy is similar to information entropy: the smaller the value, the better. With p(x) as the label distribution and q(x) as the predicted distribution, cross-entropy is H(p, q) = -Σ p(x)·log q(x). (The information entropy used in decision trees is H(p) = -Σ p(x)·log p(x), where p(x) is the proportion of each class.) In fact, the loss function of logistic regression is exactly cross-entropy.
The loss used for the digit-classification network in Andrew Ng's machine-learning course is loss = -(ry.*log(a3) + (1 - ry).*log(1 - a3)), where ry is the label converted to one-hot and
a3 is the probability obtained by applying sigmoid to the last layer's logits. However, according to this blog post, https://blog.csdn.net/tsyccnh/article/details/79163834, for a single-label problem where each image contains exactly one digit (clearly the case in Andrew Ng's example), the logits should first be converted to probabilities with softmax and then fed into cross-entropy. For a multi-label problem (several digits to recognize in one image), the logits are first converted to probabilities with sigmoid,
then a single-node cross-entropy is computed for each output node, e.g. loss_unit = -(y·log q + (1 - y)·log(1 - q)), and the per-sample loss = loss_{unit1} + loss_{unit2} + … , which is what Andrew Ng's handwritten-digit network does. - Why must softmax be paired with cross-entropy? Why not pair softmax with MSE?
See https://blog.csdn.net/u010365819/article/details/87937183; roughly, the loss function obtained by combining softmax with MSE is non-convex, so gradient descent is not guaranteed to reach the global minimum.
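The two recipes above (softmax plus cross-entropy for single-label, sigmoid plus per-node binary cross-entropy for multi-label) can be checked with a small numpy sketch; the function and variable names here are my own, not from the course:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())          # subtract the max for numerical stability
    return e / e.sum()

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# single-label case: softmax over the logits, then cross-entropy with the one-hot label
logits = np.array([2.0, 1.0, 0.1])
label_one_hot = np.array([1.0, 0.0, 0.0])
prob = softmax(logits)
single_label_loss = -np.sum(label_one_hot * np.log(prob))

# multi-label case: sigmoid per node, a binary cross-entropy per node, then a sum
multi_labels = np.array([1.0, 0.0, 1.0])   # several classes can be "on" at once
probs = sigmoid(logits)
per_node = -(multi_labels * np.log(probs) + (1 - multi_labels) * np.log(1 - probs))
multi_label_loss = per_node.sum()          # loss_unit1 + loss_unit2 + ...

print(single_label_loss, multi_label_loss)
```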
5-3 feature_columns
import matplotlib as mpl
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np
import pandas as pd
import os
import sys
import time
import sklearn
from tensorflow import keras
import tensorflow as tf
print(tf.__version__)
print(sys.version_info)
for module in mpl, np, pd, sklearn, tf, keras:
    print(module.__name__, module.__version__)
5.3.1 Load the data
# Titanic dataset fields = {'survived': 'target: whether the passenger survived', 'sex', 'age', 'n_siblings_spouses': 'number of siblings and spouses aboard', 'parch': 'number of parents and children aboard', 'fare': 'ticket fare', 'class': 'cabin class (first/second/third)', 'deck': 'deck location', 'embark_town': 'port of embarkation', 'alone': 'whether traveling alone'}
train_df = pd.read_csv('train.csv')
eval_df = pd.read_csv('eval.csv')
y_train = train_df.pop('survived')
y_eval = eval_df.pop('survived')
train_df.head(3)
5.3.2 Split features into categorical and numeric groups for the later one-hot mapping
categorical_columns = ['sex', 'parch', 'n_siblings_spouses', 'class', 'deck', 'embark_town', 'alone'] # categorical features
numeric_columns = ['age', 'fare'] # numeric features
feature_columns = []
for categorical_column in categorical_columns:
    vocab = train_df[categorical_column].unique()
    print(categorical_column, vocab)
    feature_columns.append(
        tf.feature_column.indicator_column(
            tf.feature_column.categorical_column_with_vocabulary_list(categorical_column, vocab)))
for numeric_column in numeric_columns:
    feature_columns.append(
        tf.feature_column.numeric_column(numeric_column, dtype=tf.float32))
5.3.3 Build the dataset
def make_dataset(data_df, label_df, epochs=10, shuffle=True, batch_size=32):
    dataset = tf.data.Dataset.from_tensor_slices((data_df.to_dict('list'), label_df))
    if shuffle:
        dataset = dataset.shuffle(10000)
    dataset = dataset.repeat(epochs).batch(batch_size)
    return dataset
train_dataset = make_dataset(train_df, y_train, batch_size=32)
5.3.4 One-hot mapping with DenseFeatures
for x, y in train_dataset.take(1):
    print(keras.layers.DenseFeatures(feature_columns)(x).numpy())
Interpreting the sample output above:
- 1
# 13.8625 is the fare feature; in 5.3.2 it went straight through tf.feature_column.numeric_column, so no one-hot mapping is applied; only one record in the raw data has this value
train_df[train_df['fare'] == 13.8625]
- 2
# use pprint to see the feature order: ['age','alone','class','deck','embark_town','fare','n_siblings_spouses','parch','sex']
import pprint
for x, y in train_dataset.take(1):
    pprint.pprint(x)
5-4 keras_to_estimator
model = keras.models.Sequential([
    keras.layers.DenseFeatures(feature_columns),
    keras.layers.Dense(100, activation='relu'),
    keras.layers.Dense(100, activation='relu'),
    keras.layers.Dense(2, activation='softmax')
])
model.compile(loss='sparse_categorical_crossentropy',
              optimizer=keras.optimizers.SGD(lr=0.01),
              metrics=['accuracy'])
train_dataset = make_dataset(train_df, y_train, epochs=100) # use sparse_categorical_crossentropy when the labels are integer indices (it converts them to one-hot internally); use categorical_crossentropy when the labels are already one-hot
eval_dataset = make_dataset(eval_df, y_eval, epochs=1, shuffle=False)
history = model.fit(train_dataset, validation_data=eval_dataset,
                    steps_per_epoch=len(train_df) // 32,
                    validation_steps=len(eval_df) // 32,
                    epochs=100)
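The distinction in the comment above (integer labels vs one-hot labels) just means the two losses compute the same quantity from different label encodings. A quick numpy check of that equivalence (my own sketch, not Keras code):

```python
import numpy as np

probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])      # model output after softmax
sparse_labels = np.array([0, 1])          # integer class indices
one_hot = np.eye(3)[sparse_labels]        # the same labels, one-hot encoded

# sparse_categorical_crossentropy: index the probabilities with the integer label
sparse_ce = -np.log(probs[np.arange(len(sparse_labels)), sparse_labels])
# categorical_crossentropy: dot each one-hot row with the log-probabilities
categorical_ce = -np.sum(one_hot * np.log(probs), axis=1)

print(sparse_ce, categorical_ce)          # identical values
```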
Using an estimator
estimator = keras.estimator.model_to_estimator(model)
estimator.train(input_fn = lambda: make_dataset(train_df, y_train, epochs=100)) # this version has a bug, apparently an incompatibility with datasets built from feature columns
# the input_fn function must return a (features, label) tuple and must take no arguments
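The "no arguments" constraint is why a lambda appears above: any parameterized dataset builder has to be wrapped into a zero-argument callable before the estimator can call it. A plain-Python sketch of the pattern (the names here are illustrative, and build_dataset is a stand-in for make_dataset):

```python
from functools import partial

def build_dataset(data, epochs=10, batch_size=32):
    # stand-in for make_dataset; a real one would return a tf.data.Dataset
    return {"data": data, "epochs": epochs, "batch_size": batch_size}

# two equivalent ways to get a zero-argument input_fn:
input_fn_lambda = lambda: build_dataset("train.csv", epochs=100)
input_fn_partial = partial(build_dataset, "train.csv", epochs=100)

# the estimator can now call input_fn() with no arguments
print(input_fn_lambda() == input_fn_partial())
```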
5-5 Predefined estimators
5.5.1 BaselineClassifier: a classifier that amounts to random guessing based on the label distribution
output_dir = 'baseline_model'
if not os.path.exists(output_dir):
    os.mkdir(output_dir)
baseline_estimator = tf.compat.v1.estimator.BaselineClassifier(model_dir=output_dir, n_classes=2)
baseline_estimator.train(input_fn = lambda: make_dataset(train_df, y_train, epochs=100))
baseline_estimator.evaluate(input_fn = lambda: make_dataset(eval_df, y_eval, epochs=1, shuffle=False, batch_size=20))
5.5.2 LinearClassifier
linear_output_dir = 'linear_model'
if not os.path.exists(linear_output_dir):
    os.mkdir(linear_output_dir)
linear_estimator = tf.estimator.LinearClassifier(model_dir=linear_output_dir,
                                                 n_classes=2,
                                                 feature_columns=feature_columns)
linear_estimator.train(input_fn = lambda: make_dataset(train_df, y_train, epochs=100))
linear_estimator.evaluate(input_fn = lambda: make_dataset(eval_df, y_eval, epochs=1, shuffle=False))
5.5.3 DNNClassifier
dnn_output_dir = 'dnn_model'
if not os.path.exists(dnn_output_dir):
    os.mkdir(dnn_output_dir)
dnn_estimator = tf.estimator.DNNClassifier(model_dir=dnn_output_dir,
                                           n_classes=2,
                                           feature_columns=feature_columns,
                                           hidden_units=[128, 128],
                                           activation_fn=tf.nn.relu,
                                           optimizer='Adam')
dnn_estimator.train(input_fn = lambda: make_dataset(train_df, y_train, epochs=100))
dnn_estimator.evaluate(input_fn = lambda: make_dataset(eval_df, y_eval, epochs=1, shuffle=False))
5-6 Crossed features in practice
# split features into categorical and numeric groups for the later mapping
categorical_columns = ['sex', 'parch', 'n_siblings_spouses', 'class', 'deck', 'embark_town', 'alone'] # categorical features
numeric_columns = ['age', 'fare'] # numeric features
feature_columns = []
for categorical_column in categorical_columns:
    vocab = train_df[categorical_column].unique()
    print(categorical_column, vocab)
    feature_columns.append(
        tf.feature_column.indicator_column(
            tf.feature_column.categorical_column_with_vocabulary_list(categorical_column, vocab)))
for numeric_column in numeric_columns:
    feature_columns.append(
        tf.feature_column.numeric_column(numeric_column, dtype=tf.float32))
# add a crossed feature
# age: [1,2,3], gender: ['male', 'female']
# generated crossed feature: [(1,'male'), (1,'female'), ..., (3,'female')]
# purpose of hash_bucket_size: the number of crossed values can be huge; if two raw features each have
# size 100, the cross has size 10000, which is too large as input. It is reduced via
# hash(10000 values) % 100, so only 100 crossed buckets are actually used
feature_columns.append(
    tf.feature_column.indicator_column(
        tf.feature_column.crossed_column(['age', 'sex'], hash_bucket_size=100)))
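The hash-bucket reduction described in the comment can be imitated in plain Python. Note this toy bucket function is my own stand-in: TensorFlow actually uses a fingerprint hash internally, not a character sum.

```python
# cross two small vocabularies, then fold the cross into a few hash buckets
ages = [1, 2, 3]
sexes = ['male', 'female']
crossed = [(a, s) for a in ages for s in sexes]   # 3 * 2 = 6 crossed values

hash_bucket_size = 4
def bucket(value, size=hash_bucket_size):
    # deterministic stand-in for a hash function
    key = "%s_%s" % value                          # e.g. "1_male"
    return sum(ord(c) for c in key) % size

buckets = [bucket(v) for v in crossed]
print(crossed)
print(buckets)   # 6 crossed values mapped into at most 4 buckets
```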
Then train with BaselineClassifier, LinearClassifier, and DNNClassifier exactly as in 5.5.
Why use crossed features: in the wide & deep model, crossing categorical features lets the model memorize an individual in finer detail.
Conclusion: after adding the crossed feature, the DNN model's accuracy actually dropped, the baseline was unchanged, and the linear model improved, which shows that crossed features
help different models to different degrees. To use crossed features with a DNN you need the wide & deep model, because simply adding them only
lowers accuracy.
5-8 Building a computation graph with TF 1.0
fashion_mnist = keras.datasets.fashion_mnist
(x_train_all, y_train_all),(x_test, y_test) = fashion_mnist.load_data()
x_valid, x_train = x_train_all[:5000], x_train_all[5000:]
y_valid, y_train = y_train_all[:5000], y_train_all[5000:]
# standardize the data
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
x_train_scaled = scaler.fit_transform(
    x_train.astype(np.float32).reshape(-1, 1)).reshape(-1, 28 * 28)
x_valid_scaled = scaler.transform(
    x_valid.astype(np.float32).reshape(-1, 1)).reshape(-1, 28 * 28)
x_test_scaled = scaler.transform(
    x_test.astype(np.float32).reshape(-1, 1)).reshape(-1, 28 * 28)
x_train_scaled.shape, y_train.shape
((55000, 784), (55000,))
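Note that reshape(-1, 1) before fit_transform makes StandardScaler compute one global mean and std over all pixels rather than per-column statistics. A numpy equivalent of that trick (my own sketch, on random stand-in data):

```python
import numpy as np

# stand-in image batch: 100 grayscale 28x28 images
x = np.random.RandomState(0).randint(0, 256, size=(100, 28, 28)).astype(np.float32)

# what scaler.fit_transform(x.reshape(-1, 1)).reshape(-1, 784) effectively does:
mu, sigma = x.mean(), x.std()            # a single mean/std over every pixel
x_scaled = ((x - mu) / sigma).reshape(-1, 28 * 28)

print(x_scaled.shape)                     # (100, 784)
```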
5.8.1 Building the network with x, y placeholders
hidden_units = [100, 100]
class_num = 10
x = tf.placeholder(tf.float32, [None, 28 * 28]) # None stands for the batch size, so any value can be fed
y = tf.placeholder(tf.int64, [None]) # None reserves space for the batch size, which is unknown here
# build the network layers
# from the input layer through the hidden layers
input_for_next_layer = x
for hidden_unit in hidden_units:
    input_for_next_layer = tf.layers.dense(input_for_next_layer, hidden_unit, activation=tf.nn.relu)
# logits: the last layer's output, before any activation
logits = tf.layers.dense(input_for_next_layer, class_num)
# loss: 1. softmax converts logits to probabilities; 2. labels -> one-hot; 3. compute the cross-entropy loss
loss = tf.losses.sparse_softmax_cross_entropy(labels=y, logits=logits) # converts labels to one-hot internally
# compute accuracy
prediction = tf.argmax(logits, axis=1) # index of the max value in each row; for a 1-D array like [1,2,3] only axis=0 is valid,
# since a 1-D array is treated as a single column; logits here is 2-D
correct_prediction = tf.equal(prediction, y)
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float64))
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss) # 1e-3 is the learning rate
init = tf.global_variables_initializer()
batch_size = 20
epochs = 10
train_steps_for_epoch = len(x_train_scaled) // batch_size
def eval_with_sess(sess, x, y, accuracy, images, labels, batch_size):
    eval_steps = len(images) // batch_size
    eval_accuracies = []
    for step in range(eval_steps):
        batch_data = images[step*batch_size: (step+1)*batch_size]
        batch_label = labels[step*batch_size: (step+1)*batch_size]
        accuracy_val = sess.run(accuracy, feed_dict={
            x: batch_data,
            y: batch_label
        })
        eval_accuracies.append(accuracy_val)
    return np.mean(eval_accuracies)
with tf.Session() as sess:
    sess.run(init)
    for epoch in range(epochs):
        for step in range(train_steps_for_epoch):
            batch_data = x_train_scaled[step*batch_size: (step+1)*batch_size]
            batch_label = y_train[step*batch_size: (step+1)*batch_size]
            loss_val, accuracy_val, _ = sess.run(
                [loss, accuracy, train_op], feed_dict={
                    x: batch_data,
                    y: batch_label
                })
            print("\r [Train] epoch: %d, step: %d , loss: %3.5f , accuracy %3.5f" %
                  (epoch, step, loss_val, accuracy_val), end="") # on mac the \r may conflict with the newline character
        valid_accuracy = eval_with_sess(sess, x, y, accuracy, x_valid_scaled, y_valid, batch_size)
        print("\t[Valid] acc : %2.2f " % (valid_accuracy))
5-9 tf.data.Dataset: building the network without x, y placeholders
# avoid dtype errors later
y_train = np.asarray(y_train, dtype=np.int64)
y_valid = np.asarray(y_valid, dtype=np.int64)
y_test = np.asarray(y_test, dtype=np.int64)
def make_dataset(images, labels, epochs, batch_size, shuffle=True):
    dataset = tf.data.Dataset.from_tensor_slices((images, labels))
    if shuffle:
        dataset = dataset.shuffle(10000)
    dataset = dataset.repeat(epochs).batch(batch_size)
    return dataset
batch_size = 20
epochs = 10
hidden_units = [100, 100]
class_num = 10
5.9.1 Building the network with make_one_shot_iterator: the training and validation sets cannot be fed in the session
# make_one_shot_iterator
# 1. initializes automatically 2. cannot be re-initialized; make_initializable_iterator can be re-initialized to load training/validation data
dataset = make_dataset(x_train_scaled, y_train, epochs, batch_size)
dataset_iter = dataset.make_one_shot_iterator()
x, y = dataset_iter.get_next()
# x = tf.placeholder(tf.float32, [None, 28 * 28]) # x, y are already produced by the iterator
# y = tf.placeholder(tf.int64, [None])
input_for_next_layer = x
for hidden_unit in hidden_units:
    input_for_next_layer = tf.layers.dense(input_for_next_layer, hidden_unit, activation=tf.nn.relu)
logits = tf.layers.dense(input_for_next_layer, class_num)
loss = tf.losses.sparse_softmax_cross_entropy(labels=y, logits=logits)
prediction = tf.argmax(logits, axis=1)
correct_prediction = tf.equal(prediction, y)
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float64))
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss) # 1e-3 is the learning rate
init = tf.global_variables_initializer()
train_steps_per_epoch = len(x_train_scaled) // batch_size
with tf.Session() as sess:
    sess.run(init)
    for epoch in range(epochs):
        for step in range(train_steps_per_epoch):
            loss_val, accuracy_val, _ = sess.run([loss, accuracy, train_op])
            print("\r [Train] epoch: %d, step: %d , loss: %3.5f , accuracy %3.5f" %
                  (epoch, step, loss_val, accuracy_val), end="") # on mac the \r may conflict with the newline character
        print()
5.9.2 make_initializable_iterator: add placeholders in make_dataset so the training and validation sets can be fed in the session
epochs = 10
batch_size = 128
images_placeholder = tf.placeholder(tf.float32, [None, 28 * 28])
labels_placeholder = tf.placeholder(tf.int64, (None,))
dataset = make_dataset(images_placeholder, labels_placeholder, epochs=10, batch_size=batch_size)
dataset_iter = dataset.make_initializable_iterator()
x, y = dataset_iter.get_next()
hidden_units = [100, 100]
class_num = 10
input_for_next_layer = x
for hidden_unit in hidden_units:
    input_for_next_layer = tf.layers.dense(input_for_next_layer, hidden_unit, activation=tf.nn.relu)
logits = tf.layers.dense(input_for_next_layer, class_num)
loss = tf.losses.sparse_softmax_cross_entropy(labels=y, logits=logits)
prediction = tf.argmax(logits, axis=1)
correct_prediction = tf.equal(prediction, y)
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float64))
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss) # 1e-3 is the learning rate
init = tf.global_variables_initializer()
train_steps_per_epoch = len(x_train_scaled) // batch_size
valid_steps = len(x_valid_scaled) // batch_size
def eval_with_sess(sess, images, labels):
    sess.run(dataset_iter.initializer, feed_dict={
        images_placeholder: images,
        labels_placeholder: labels,
    })
    return np.mean([sess.run(accuracy) for step in range(valid_steps)])
with tf.Session() as sess:
    sess.run(init)
    for epoch in range(epochs):
        sess.run(dataset_iter.initializer, feed_dict={
            images_placeholder: x_train_scaled,
            labels_placeholder: y_train
        })
        for step in range(train_steps_per_epoch):
            loss_val, accuracy_val, _ = sess.run([loss, accuracy, train_op])
            print("\r [Train] epoch: %d, step: %d , loss: %3.5f , accuracy %3.5f" %
                  (epoch, step, loss_val, accuracy_val), end="") # on mac the \r may conflict with the newline character
        valid_accuracy = eval_with_sess(sess, x_valid_scaled, y_valid)
        print("\t valid acc: %3.5f" % valid_accuracy)
5-11 Customized estimator
train_df = pd.read_csv('train.csv')
eval_df = pd.read_csv('eval.csv')
y_train = train_df.pop('survived')
y_eval = eval_df.pop('survived')
train_df.head(3)
# split features into categorical and numeric groups for the later mapping
categorical_columns = ['sex', 'parch', 'n_siblings_spouses', 'class', 'deck', 'embark_town', 'alone'] # categorical features
numeric_columns = ['age', 'fare'] # numeric features
feature_columns = []
for categorical_column in categorical_columns:
    vocab = train_df[categorical_column].unique()
    print(categorical_column, vocab)
    feature_columns.append(
        tf.feature_column.indicator_column(
            tf.feature_column.categorical_column_with_vocabulary_list(categorical_column, vocab)))
for numeric_column in numeric_columns:
    feature_columns.append(
        tf.feature_column.numeric_column(numeric_column, dtype=tf.float32))
# build the dataset
def make_dataset(data_df, label_df, epochs=10, shuffle=True, batch_size=32):
    dataset = tf.data.Dataset.from_tensor_slices((data_df.to_dict('list'), label_df))
    if shuffle:
        dataset = dataset.shuffle(10000)
    dataset = dataset.repeat(epochs).batch(batch_size)
    return dataset.make_one_shot_iterator().get_next()
def model_fn(features, labels, mode, params):
    # model run states: PREDICT, EVAL, TRAIN
    input_for_next_layer = tf.feature_column.input_layer(
        features, params['feature_columns'])
    for n_unit in params['hidden_units']:
        input_for_next_layer = tf.layers.dense(input_for_next_layer, units=n_unit, activation=tf.nn.relu)
    logits = tf.layers.dense(input_for_next_layer, params['n_classes'], activation=None)
    predicted_classes = tf.argmax(logits, 1)
    if mode == tf.estimator.ModeKeys.PREDICT:
        predictions = {
            "class_ids": predicted_classes[:, tf.newaxis],
            "probabilities": tf.nn.softmax(logits),
            "logits": logits
        }
        return tf.estimator.EstimatorSpec(mode, predictions=predictions)
    loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
    accuracy = tf.metrics.accuracy(labels=labels, predictions=predicted_classes, name='acc_op') # this accuracy accumulates across batches, so there is no need to sum and average by hand
    metrics = {"accuracy": accuracy}
    if mode == tf.estimator.ModeKeys.EVAL:
        return tf.estimator.EstimatorSpec(mode, loss=loss, eval_metric_ops=metrics)
    optimizer = tf.train.AdamOptimizer() # adapts the learning rate during gradient descent, so a large gradient does not produce an overly large step; behaves well in that it rarely diverges
    train_op = optimizer.minimize(loss, global_step=tf.train.get_global_step())
    if mode == tf.estimator.ModeKeys.TRAIN:
        return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)
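The accumulation behaviour of tf.metrics.accuracy noted in the comment above can be mimicked with a running correct/total pair. This is a plain-Python sketch of the idea, not the TF implementation:

```python
class StreamingAccuracy:
    """Keeps a running correct/total count, like tf.metrics.accuracy does."""
    def __init__(self):
        self.correct = 0
        self.total = 0

    def update(self, labels, predictions):
        # add this batch's hits to the running totals
        self.correct += sum(int(l == p) for l, p in zip(labels, predictions))
        self.total += len(labels)
        # return accuracy over everything seen so far, not just this batch
        return self.correct / self.total

acc = StreamingAccuracy()
print(acc.update([1, 0, 1], [1, 0, 0]))   # batch 1: 2/3 correct
print(acc.update([1, 1], [1, 1]))         # cumulative: 4/5 correct
```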
output_dir = "tf_1.0_customized_estimator"
if not os.path.isdir(output_dir):
    os.mkdir(output_dir)
estimator = tf.estimator.Estimator(
    model_fn=model_fn,
    model_dir=output_dir,
    params={
        "feature_columns": feature_columns,
        "hidden_units": [100, 100],
        "n_classes": 2
    }
)
estimator.train(input_fn = lambda: make_dataset(train_df, y_train, epochs=100))
estimator.evaluate(lambda : make_dataset(eval_df, y_eval, epochs=1))
5-12 Differences between TF 1.0 and TF 2.0
Static vs dynamic graphs
- tf1.0: Session, feed_dict, and placeholder are removed in 2.0
- tf1.0: make_one_shot_iterator / make_initializable_iterator are removed in 2.0
- tf2.0: eager mode, @tf.function, AutoGraph
- tf.function and AutoGraph
  - better performance
  - can be exported and imported as a SavedModel (e.g. for/while -> tf.while_loop, if -> tf.cond, for _ in dataset -> dataset.reduce)
API changes
- tf has about 2000 APIs, 500 of them in the root namespace
- some namespaces were created but do not contain all the relevant APIs (e.g. tf.round is not under tf.math)
- some APIs live in the root namespace but are rarely used, e.g. tf.zeta
- some are frequently used yet not in the root namespace, e.g. tf.manip
- some namespaces are nested too deeply
  - tf.saved_model.signature_constants.CLASSIFY_INPUTS -> tf.saved_model.CLASSIFY_INPUTS
- most TF 2.0 APIs are placed under tf.keras