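When using tf.estimator, calling the Estimator's evaluate method makes model_fn receive mode = ModeKeys.EVAL. In this case the model function must return a tf.estimator.EstimatorSpec containing the model's loss and, optionally, one or more metrics. Although returning metrics is optional, most custom Estimators return at least one. TensorFlow provides the tf.metrics module for computing common metrics.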
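Some commonly used metrics
Some of these may apply only to binary classification.
The documentation states that labels and predictions are both cast to bool, so these metrics only cover binary classification. One-hot encoding the examples might make them work for multiple classes, but I am not sure about this. [Class-wise precision and recall for multi-class classification in TensorFlow?]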
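To see why casting to bool restricts these metrics to the binary case, here is a minimal pure-Python sketch. It is not TensorFlow's implementation, and the helper name binary_precision_recall is made up for illustration. It assumes, as the documentation cited above says, that labels and predictions are cast to bool before counting, so every non-zero class id collapses into the same "positive".

```python
def binary_precision_recall(labels, predictions):
    """Precision and recall after casting both inputs to bool (hypothetical helper)."""
    y_true = [bool(v) for v in labels]
    y_pred = [bool(v) for v in predictions]
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Class ids for five examples and four classes (0..3).
label_ids = [3, 2, 0, 1, 1]
pred_ids = [3, 1, 0, 0, 0]

precision, recall = binary_precision_recall(label_ids, pred_ids)
print(precision, recall)  # 1.0 0.5
```

The second example is predicted as class 1 while its label is class 2, yet it counts as a true positive because both ids are non-zero. Only two of the five class ids are actually right, so a precision of 1.0 here says nothing about multi-class quality.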
accuracy(...): Calculates how often predictions matches labels.
The accuracy function creates two local variables, total and count, that are used to compute the frequency with which predictions matches labels. This frequency is ultimately returned as accuracy: an idempotent operation that simply divides total by count.
auc(...): Computes the approximate AUC via a Riemann sum.
average_precision_at_k(...): Computes average precision@k of predictions with respect to sparse labels.
precision(...): Computes the precision of the predictions with respect to the labels.
precision_at_k(...): Computes precision@k of the predictions with respect to sparse labels.
recall(...): Computes the recall of the predictions with respect to the labels.
recall_at_k(...): Computes recall@k of the predictions with respect to sparse labels.
A note on accuracy: tf.metrics.accuracy compares the predictions against the true values, i.e. the labels supplied by the input function, and it requires labels and predictions to have the same shape.
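[Evaluation]
Initialization
These functions all create local variables, so when initializing them directly you need sess.run(tf.local_variables_initializer()) rather than tf.global_variables_initializer(). Skipping the initialization can fail with: Attempting to use uninitialized value total_confusion_matrix.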
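Arguments
1 If the output is a sequence of labels (as in an NER model), a mask is generally needed. [Tensorflow: tensor transformations]
2 For classification models:
2.1 When computing precision and recall, pred_ids must be in one-hot form, with the same layout as labels such as
labels = [[0, 1, 0],
          [1, 0, 0],
          [0, 0, 1]],
[tensorflow – How to use tf.metrics.accuracy correctly?]
note:
1 The pred_ids being compared must not be raw logits with negative values, otherwise an error is raised: [`predictions` contains negative values] # [Condition x >= 0 did not hold element-wise:] [x (Reshape_2:0) = ] [0 -6 3...].
2 If you insist on a non-one-hot form, watch the argmax axis. If the axis is omitted or set to 0, an input of shape (batch_size, num_labels) yields an output of shape (num_labels,) instead of the intended (batch_size,). Generally no error is raised when num_labels > batch_size, while num_labels < batch_size raises "(batch_size, num_labels) tf_metircs [`labels` out of bound] [Condition x < y did not hold element-wise:]". Either way the result is wrong.
2.2 Computing acc and auc (I am not clear on how the latter works) needs no such conversion; the inputs can be passed in directly.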
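The axis pitfall from note 2 can be illustrated without TensorFlow. The sketch below uses plain Python lists and two hypothetical helpers, argmax_axis1 and argmax_axis0, to mimic tf.argmax(x, 1) and tf.argmax(x, 0) on a (batch_size, num_labels) input. The scores are invented for illustration.

```python
def argmax_axis1(rows):
    """Index of the largest value in each row: one class id per example."""
    return [max(range(len(row)), key=row.__getitem__) for row in rows]

def argmax_axis0(rows):
    """Index of the largest value in each column: one row index per class."""
    columns = list(zip(*rows))
    return [max(range(len(col)), key=col.__getitem__) for col in columns]

# Scores with shape (batch_size=2, num_labels=3); values are made up.
scores = [[0.1, 0.7, 0.2],
          [0.8, 0.1, 0.1]]

per_example = argmax_axis1(scores)  # length batch_size: the predicted class ids
per_class = argmax_axis0(scores)    # length num_labels: row indices, not class ids
print(per_example)  # [1, 0]
print(per_class)    # [1, 0, 0]
```

The axis-0 result has one entry per class rather than one per example, and its values are row indices, so feeding it to a metric as predictions is wrong even when no shape error is raised.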
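Testing on multi-class classification
When computing precision and recall, pred_ids must be in one-hot form, like
labels = [[0, 1, 0],
          [1, 0, 0],
          [0, 0, 1]],
Extensive testing shows that the built-in computation is in effect a micro average, i.e. precision = recall = acc. On one-hot inputs, the built-in tf.metrics.precision(labels=labels, predictions=pred_ids) therefore gives the same value as the multi-class metrics described below with average='micro', and the same value as tf.metrics.accuracy(labels=tf.argmax(labels, 1), predictions=tf.argmax(pred_ids, 1)).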
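The micro-average observation can be checked by hand. The sketch below is plain Python, not TensorFlow, and reuses the one-hot data from Example 3 of this article. It counts true positives, false positives and false negatives element-wise over the one-hot matrices and compares the result with accuracy on the argmax class ids. It assumes every row contains exactly one 1 in both labels and predictions.

```python
# One-hot labels and predictions for five examples and four classes.
labels = [[0, 0, 0, 1], [0, 0, 1, 0], [1, 0, 0, 0], [0, 1, 0, 0], [0, 1, 0, 0]]
preds = [[0, 0, 0, 1], [0, 1, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]

# Flatten and count element-wise, treating every 1 as a "positive".
flat = [(t, p) for row_t, row_p in zip(labels, preds) for t, p in zip(row_t, row_p)]
tp = sum(1 for t, p in flat if t == 1 and p == 1)
fp = sum(1 for t, p in flat if t == 0 and p == 1)
fn = sum(1 for t, p in flat if t == 1 and p == 0)
precision = tp / (tp + fp)
recall = tp / (tp + fn)

# Accuracy on class ids: argmax of each row, then compare.
def argmax(row):
    return max(range(len(row)), key=row.__getitem__)

correct = sum(argmax(t) == argmax(p) for t, p in zip(labels, preds))
accuracy = correct / len(labels)

print(tp, fp, fn)                   # 2 3 3
print(precision, recall, accuracy)  # 0.4 0.4 0.4
```

Each misclassified example contributes exactly one false positive (the wrongly predicted class) and one false negative (the missed true class), so TP + FP and TP + FN both equal the number of examples, which is why all three values coincide.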
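Return values
Taking the return values of accuracy as an example:
accuracy: A Tensor representing the accuracy, the value of total divided by count. Evaluating accuracy does not update the metric with new inputs; it only returns the value computed from the two local variables (Example 1 shows what this means in practice).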
update_op: An operation that increments the total and count variables appropriately and whose value matches accuracy.
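The split between the read-only accuracy value and the state-changing update_op can be mimicked in a few lines of plain Python. The class below is a toy stand-in, not TensorFlow code; StreamingAccuracy, value and update are hypothetical names chosen for this sketch, and total and count play the role of the two local variables.

```python
class StreamingAccuracy:
    """Toy stand-in for the (accuracy, update_op) pair; not TensorFlow code."""

    def __init__(self):
        # Play the role of the two local variables after initialization.
        self.total = 0.0
        self.count = 0.0

    def value(self):
        """Like evaluating `accuracy`: read-only, returns total / count."""
        return self.total / self.count if self.count else 0.0

    def update(self, labels, predictions):
        """Like running `update_op`: fold in one batch, return the new value."""
        self.total += sum(1.0 for l, p in zip(labels, predictions) if l == p)
        self.count += len(labels)
        return self.value()

metric = StreamingAccuracy()
before = metric.value()                      # nothing accumulated yet
first = metric.update([3, 1, 5], [3, 2, 5])  # batch 1: 2 of 3 correct
again = metric.value()                       # same value, no state change
second = metric.update([0, 1], [0, 1])       # batch 2: 2 of 2 correct
print(before, first, again, second)
```

Reading the value repeatedly never changes it, while each update folds one more batch into the running totals, so after the second batch the result covers all five examples rather than only the latest two.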
Multi-class metrics for Tensorflow: tf_metrics
precision(labels, predictions, num_classes, pos_indices=None, weights=None, average='micro'):
Arguments:
labels : Tensor of tf.int32 or tf.int64
    The true labels. The input is a non-one-hot list of labels with shape=(batch,).
predictions : Tensor of tf.int32 or tf.int64
    The predictions, same shape as labels
num_classes : int
    The number of classes
pos_indices : list of int, optional
    The indices of the positive classes, default is all
weights : Tensor of tf.int32, optional
    Mask, must be of compatible shape with labels
average : str, optional
    'micro': counts the total number of true positives, false
        positives, and false negatives for the classes in
        `pos_indices` and infer the metric from it.
    'macro': will compute the metric separately for each class in
        `pos_indices` and average. Will not account for class
        imbalance.
    'weighted': will compute the metric separately for each class in
        `pos_indices` and perform a weighted average by the total
        number of true labels for each class.
recall(labels, predictions, num_classes, pos_indices=None, weights=None, average='micro')
f1(labels, predictions, num_classes, pos_indices=None, weights=None, average='micro')
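The three averaging modes are easier to compare side by side on a small input. The following is a plain-Python sketch of what 'micro', 'macro' and 'weighted' mean according to the descriptions above. It is not the tf_metrics implementation, the helper prf_by_average is hypothetical, and it leaves out the pos_indices and weights arguments (all classes are treated as positive classes). The class ids are the argmax values from Example 3.

```python
def prf_by_average(labels, predictions, num_classes, average="micro"):
    """Precision, recall and F1 over all classes (illustrative helper, not tf_metrics)."""
    def safe_div(a, b):
        return a / b if b else 0.0

    def prf(tp, fp, fn):
        p, r = safe_div(tp, tp + fp), safe_div(tp, tp + fn)
        return p, r, safe_div(2 * p * r, p + r)

    # Per-class true positives, false positives and false negatives.
    tp = [0] * num_classes
    fp = [0] * num_classes
    fn = [0] * num_classes
    for y, y_hat in zip(labels, predictions):
        if y == y_hat:
            tp[y] += 1
        else:
            fp[y_hat] += 1
            fn[y] += 1

    if average == "micro":
        # Pool the counts over all classes, then compute the metric once.
        return prf(sum(tp), sum(fp), sum(fn))

    per_class = [prf(tp[c], fp[c], fn[c]) for c in range(num_classes)]
    if average == "macro":
        weights = [1.0] * num_classes                          # plain mean over classes
    elif average == "weighted":
        weights = [tp[c] + fn[c] for c in range(num_classes)]  # true-label counts
    else:
        raise ValueError("unknown average: %r" % average)
    norm = sum(weights)
    return tuple(safe_div(sum(w * m[i] for w, m in zip(weights, per_class)), norm)
                 for i in range(3))

labels = [3, 2, 0, 1, 1]
predictions = [3, 1, 0, 0, 0]
micro = prf_by_average(labels, predictions, 4, "micro")
macro = prf_by_average(labels, predictions, 4, "macro")
weighted = prf_by_average(labels, predictions, 4, "weighted")
print([round(v, 4) for v in micro])
print([round(v, 4) for v in macro])
print([round(v, 4) for v in weighted])
```

With these inputs the macro values come out as roughly 0.3333, 0.5 and 0.375, which agrees with the pre_op1_, rec_op1_ and f1_op1_ values printed in Example 3, while the micro values all equal the plain accuracy of 0.4.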
If the inputs are in one-hot form, convert them to class indices first:
acc, acc_op = tf.metrics.accuracy(labels=tf.argmax(labels, 1), predictions=tf.argmax(logits, 1))
Examples
Example 1
import tensorflow as tf

label_ids = tf.constant([[3, 1, 5]])
pred_ids = tf.constant([[3, 2, 5]])
acc, acc_op = tf.metrics.accuracy(label_ids, pred_ids)
stream_vars = [i for i in tf.local_variables()]
print(stream_vars)
with tf.Session() as sess:
    sess.run(tf.local_variables_initializer())
    print('[total, count]:', sess.run(stream_vars))
    print(acc.eval())     # reads only the two local variables (not updated yet, so 0)
    print(acc_op.eval())  # updates total and count, then returns the new accuracy
    print('[total, count]:', sess.run(stream_vars))
    print(acc.eval())     # reads only the two local variables (now updated, non-zero)
[<tf.Variable 'accuracy/total:0' shape=() dtype=float32_ref>, <tf.Variable 'accuracy/count:0' shape=() dtype=float32_ref>]
[total, count]: [0.0, 0.0]
0.0
0.6666667
[total, count]: [2.0, 3.0]
0.6666667
[tensorflow – How to use tf.metrics.accuracy correctly?]
[A closer look at the tf.metrics ops in TensorFlow]
Example 2
# Compute evaluation metrics.
acc, acc_op = tf.metrics.accuracy(labels=tf.argmax(labels, 1), predictions=tf.argmax(logits,1))
Example 3: multi-class classification
import tensorflow as tf
import tf_metrics  # the multi-class metrics package described above

label_ids = tf.constant([[0, 0, 0, 1],
                         [0, 0, 1, 0],
                         [1, 0, 0, 0],
                         [0, 1, 0, 0],
                         [0, 1, 0, 0]])
pred_ids = tf.constant([[0, 0, 0, 1],
                        [0, 1, 0, 0],
                        [1, 0, 0, 0],
                        [1, 0, 0, 0],
                        [1, 0, 0, 0]])
num_labels = label_ids.shape[1]
label_arg_ids = tf.argmax(label_ids, 1)
pred_arg_ids = tf.argmax(pred_ids, 1)
# _, tp_op = tf.metrics.true_positives(label_ids, pred_ids)
# _, fp_op = tf.metrics.false_positives(label_ids, pred_ids)
# Note: acc_op is built with tf.metrics.precision on the one-hot inputs, so it equals pre_op.
_, acc_op = tf.metrics.precision(label_ids, pred_ids)
_, acc_op1 = tf.metrics.accuracy(label_arg_ids, pred_arg_ids)
_, pre_op = tf.metrics.precision(label_ids, pred_ids)
# _, pre_op1 = tf.metrics.precision(label_arg_ids, pred_arg_ids)
_, rec_op = tf.metrics.recall(label_ids, pred_ids)
# _, rec_op1 = tf.metrics.recall(label_arg_ids, pred_arg_ids)
# _, pre_op_ = tf_metrics.precision(label_ids, pred_ids, num_labels)
_, pre_op1_ = tf_metrics.precision(label_arg_ids, pred_arg_ids, num_labels, average='macro')
# _, rec_op_ = tf_metrics.recall(label_ids, pred_ids, num_labels)
_, rec_op1_ = tf_metrics.recall(label_arg_ids, pred_arg_ids, num_labels, average='macro')
_, f1_op1_ = tf_metrics.f1(label_arg_ids, pred_arg_ids, num_labels, average='macro')
stream_vars = [i for i in tf.local_variables()]
print(stream_vars)
with tf.Session() as sess:
    sess.run(tf.local_variables_initializer())
    print(label_arg_ids.eval())
    print(pred_arg_ids.eval())
    # print(tp_op.eval())  # 2
    # print(fp_op.eval())  # 3
    print('acc_op:', acc_op.eval())
    print('acc_op1:', acc_op1.eval())
    print('pre_op:', pre_op.eval())
    # print('pre_op1:', pre_op1.eval())  # 1.0
    print('rec_op:', rec_op.eval())
    # print('rec_op1:', rec_op1.eval())  # 0.5
    # print(pre_op_.eval())  # 0.7
    print('pre_op1_:', pre_op1_.eval())
    # print(rec_op_.eval())  # 0.7
    print('rec_op1_:', rec_op1_.eval())
    print('f1_op1_:', f1_op1_.eval())
[3 2 0 1 1]
[3 1 0 0 0]
2.0
3.0
acc_op: 0.4
acc_op1: 0.4
pre_op: 0.4
pre_op1: 1.0
rec_op: 0.4
rec_op1: 0.5
0.7
pre_op1_: 0.33333334
0.7
rec_op1_: 0.5
f1_op1_: 0.375
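Other approaches and examples
Computing accuracy from softmax outputs
import os
# Set before importing TensorFlow so the C++ log level takes effect.
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
import tensorflow as tf

def evaluation(sess, outputs, labels):
    # With k=1, in_top_k is True where the highest-scoring class equals the label.
    correct = tf.nn.in_top_k(outputs, labels, 1)
    print(sess.run(correct))
    # Number of correct predictions in the batch.
    return tf.reduce_sum(tf.cast(correct, tf.int32))

with tf.Graph().as_default():
    sess = tf.Session()
    sess.run(tf.global_variables_initializer())
    a = evaluation(sess, [[0.8, 0.1, 0.1], [0.2, 0.6, 0.2], [0.7, 0.1, 0.2]], [0, 1, 2])
    print(sess.run(a))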
from: -柚子皮-