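TensorFlow: Evaluating Classification Models

        When you use tf.estimator and call an Estimator's evaluate method, model_fn receives mode = ModeKeys.EVAL. In that case the model function must return a tf.estimator.EstimatorSpec that contains the model's loss and, optionally, one or more metrics. Returning metrics is optional, but most custom Estimators return at least one. TensorFlow provides the tf.metrics module for computing common metrics.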

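Some commonly used metrics

These may apply only to binary classification

        The documentation says that labels and predictions are both cast to bool (for tf.metrics.precision and tf.metrics.recall, for example), so these metrics only address binary classification. One-hot encoding the examples might make them work for more classes, but I am not sure about that. [Per-class precision and recall for multi-class classification in TensorFlow?]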

accuracy(...): Calculates how often predictions match labels.

        The accuracy function creates two local variables, total and count, that are used to compute the frequency with which predictions match labels. This frequency is ultimately returned as accuracy: an idempotent operation that simply divides total by count. tf.metrics.accuracy compares the predictions with the true values, that is, with the labels supplied by the input function, and it requires labels and predictions to have the same shape.

auc(...): Computes the approximate AUC via a Riemann sum.

average_precision_at_k(...): Computes average precision@k of predictions with respect to sparse labels.

precision(...): Computes the precision of the predictions with respect to the labels.

precision_at_k(...): Computes precision@k of the predictions with respect to sparse labels.

recall(...): Computes the recall of the predictions with respect to the labels.

recall_at_k(...): Computes recall@k of the predictions with respect to sparse labels.

[Module: tf.metrics]

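[Evaluation]

Initialization

        These functions all create local variables, so initializing them directly requires sess.run(tf.local_variables_initializer()) rather than tf.global_variables_initializer(). Skipping the initialization can fail with an error such as: Attempting to use uninitialized value total_confusion_matrix.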

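Arguments

1 If the model outputs a sequence of labels (as an NER model does), a mask is generally needed. [TensorFlow: tensor transformations]

2 For classification models:

2.1 When computing precision and recall, pred_ids must be in one-hot form, in the same layout as these labels:

labels = [[0, 1, 0],
          [1, 0, 0],
          [0, 0, 1]]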

[tensorflow - How to use tf.metrics.accuracy correctly?]

[TensorFlow pitfalls: tf.metrics]

note: 

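        1 The pred_ids being compared must of course not be logits that contain negative values, otherwise an error is raised: [`predictions` contains negative values] # [Condition x >= 0 did not hold element-wise:] [x (Reshape_2:0) = ] [0 -6 3...].

        2 If you insist on the non-one-hot form, watch the argmax axis. If the axis is omitted or set to 0, an input of shape (batch_size, num_labels) produces an output of shape (num_labels,) instead of the expected (batch_size,). Generally no error is raised when num_labels > batch_size, while num_labels < batch_size fails with “(batch_size, num_labels) tf_metircs [`labels` out of bound] [Condition x < y did not hold element-wise:]”. Either way the result is wrong.

2.2 Computing acc and auc (I am not clear on how the latter works) needs no such conversion; the inputs can be passed in directly.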
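The axis pitfall in note 2 can be illustrated without TensorFlow. The sketch below is plain Python; argmax_rows and argmax_cols are hypothetical helper names standing in for tf.argmax(x, 1) and tf.argmax(x, 0), and ties are assumed to resolve to the first maximum. It only shows how the shape and the values of the result change with the axis.

```python
# Plain-Python stand-ins for tf.argmax(x, 1) and tf.argmax(x, 0).
# argmax_rows and argmax_cols are hypothetical helpers for illustration only.

def argmax_rows(matrix):
    """One class id per example: length batch_size (like axis=1)."""
    return [max(range(len(row)), key=row.__getitem__) for row in matrix]


def argmax_cols(matrix):
    """One example index per class: length num_labels (like axis=0)."""
    columns = list(zip(*matrix))
    return [max(range(len(col)), key=col.__getitem__) for col in columns]


# batch_size = 2, num_labels = 3
scores = [[0.1, 0.7, 0.2],
          [0.6, 0.3, 0.1]]

per_example = argmax_rows(scores)  # intended result, shape (batch_size,)
per_class = argmax_cols(scores)    # the mistake, shape (num_labels,)

print(per_example)  # [1, 0]
print(per_class)    # [1, 0, 0]
```

The second list holds example indices rather than class ids, so comparing it with the labels is meaningless even when the shapes happen to line up.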

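Testing multi-class classification

When computing precision and recall, pred_ids must be in one-hot form, in the same layout as these labels:

labels = [[0, 1, 0],
          [1, 0, 0],
          [0, 0, 1]]

Extensive testing shows that on one-hot inputs the built-in precision and recall amount to a micro average, i.e. precision = recall = acc. In other words, the built-in tf.metrics.precision(labels=labels,  predictions=pred_ids) gives the same value as the multi-class metrics described below with average='micro', and the same value as tf.metrics.accuracy(labels=tf.argmax(labels, 1),  predictions=tf.argmax(pred_ids,1)).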
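The identity precision = recall = acc for one-hot inputs can be checked by hand. The following plain-Python sketch (not TensorFlow; binary_counts is a hypothetical helper) counts true positives, false positives and false negatives cell by cell, which is how a metric that casts its inputs to bool sees one-hot matrices, and compares the result with the accuracy over class ids. It uses the same matrices as the multi-class example later in this article.

```python
def binary_counts(labels, preds):
    """Cell-wise TP/FP/FN over one-hot matrices; any non-zero cell is True."""
    tp = fp = fn = 0
    for label_row, pred_row in zip(labels, preds):
        for y, p in zip(label_row, pred_row):
            tp += int(bool(y) and bool(p))
            fp += int(not y and bool(p))
            fn += int(bool(y) and not p)
    return tp, fp, fn


labels = [[0, 0, 0, 1], [0, 0, 1, 0], [1, 0, 0, 0], [0, 1, 0, 0], [0, 1, 0, 0]]
preds = [[0, 0, 0, 1], [0, 1, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]

tp, fp, fn = binary_counts(labels, preds)
precision = tp / (tp + fp)
recall = tp / (tp + fn)

# Accuracy over class ids (position of the 1 in each row).
label_ids = [row.index(1) for row in labels]
pred_ids = [row.index(1) for row in preds]
accuracy = sum(y == p for y, p in zip(label_ids, pred_ids)) / len(label_ids)

print(tp, fp, fn)                   # 2 3 3
print(precision, recall, accuracy)  # 0.4 0.4 0.4
```

Every misclassified row contributes exactly one false positive and one false negative, so TP + FP and TP + FN both equal the number of examples, which is why the three numbers coincide.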

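Return values

Taking the return values of accuracy as an example:

        accuracy: A Tensor representing the accuracy, the value of total divided by count. Evaluating this tensor does not update the metric with new inputs; it only returns the value computed from the two local variables. (Example 1 makes this concrete.)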
        update_op: An operation that increments the total and count variables appropriately and whose value matches accuracy.
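The split between the value tensor and update_op can be mimicked in plain Python. The class below is a hypothetical stand-in (StreamingAccuracy is not a TensorFlow API): value() only reads total and count, while update() accumulates a batch and then returns the new value. Returning 0.0 while count is still 0 is an assumption made here to mirror the 0.0 printed in Example 1.

```python
class StreamingAccuracy:
    """Hypothetical plain-Python analogue of the (accuracy, update_op) pair."""

    def __init__(self):
        # Counterpart of the two local variables right after initialization.
        self.total = 0.0
        self.count = 0.0

    def value(self):
        # Like evaluating `accuracy`: reads the state and never changes it.
        return self.total / self.count if self.count else 0.0

    def update(self, labels, predictions):
        # Like running `update_op`: accumulate this batch, then report the value.
        self.total += sum(float(y == p) for y, p in zip(labels, predictions))
        self.count += float(len(labels))
        return self.value()


metric = StreamingAccuracy()
before = metric.value()                      # nothing accumulated yet
first = metric.update([3, 1, 5], [3, 2, 5])  # 2 of 3 correct in this batch
second = metric.update([1, 1], [1, 1])       # running total is now 4 of 5
print(before, first, second, metric.value())
```

Calling value() repeatedly keeps returning the same number, whereas every update() call folds another batch into the running totals. This is why an evaluation loop runs the update op once per batch and reads the value at the end.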

Multi-class metrics for Tensorflow: tf_metrics

precision(labels, predictions, num_classes, pos_indices=None, weights=None, average='micro'):
Arguments:
    labels : Tensor of tf.int32 or tf.int64
        The true labels, given as non-one-hot class ids with shape=(batch,).
    predictions : Tensor of tf.int32 or tf.int64
        The predictions, same shape as labels
    num_classes : int
        The number of classes
    pos_indices : list of int, optional
        The indices of the positive classes, default is all
    weights : Tensor of tf.int32, optional
        Mask, must be of compatible shape with labels
    average : str, optional
        'micro': counts the total number of true positives, false
            positives, and false negatives for the classes in
            `pos_indices` and infer the metric from it.
        'macro': will compute the metric separately for each class in
            `pos_indices` and average. Will not account for class
            imbalance.
        'weighted': will compute the metric separately for each class in
            `pos_indices` and perform a weighted average by the total
            number of true labels for each class.

recall(labels, predictions, num_classes, pos_indices=None, weights=None, average='micro')
f1(labels, predictions, num_classes, pos_indices=None, weights=None, average='micro')
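To make the three average modes concrete, the sketch below recomputes precision in plain Python for a small batch of class ids. It illustrates the definitions quoted above and is not the tf_metrics implementation; per_class_counts, safe_div and averaged_precision are hypothetical names, all classes are treated as positive, and a division by zero is assumed to yield 0.

```python
def per_class_counts(labels, preds, num_classes):
    """Return one (tp, fp, n_true) tuple per class id."""
    counts = []
    for c in range(num_classes):
        tp = sum(1 for y, p in zip(labels, preds) if p == c and y == c)
        fp = sum(1 for y, p in zip(labels, preds) if p == c and y != c)
        n_true = sum(1 for y in labels if y == c)
        counts.append((tp, fp, n_true))
    return counts


def safe_div(num, den):
    return num / den if den else 0.0


def averaged_precision(labels, preds, num_classes, average='micro'):
    counts = per_class_counts(labels, preds, num_classes)
    if average == 'micro':
        # Pool the counts of all classes, then divide once.
        tp = sum(c[0] for c in counts)
        fp = sum(c[1] for c in counts)
        return safe_div(tp, tp + fp)
    per_class = [safe_div(tp, tp + fp) for tp, fp, _ in counts]
    if average == 'macro':
        # Unweighted mean: every class counts equally.
        return sum(per_class) / num_classes
    if average == 'weighted':
        # Mean weighted by the number of true labels of each class.
        n_total = sum(c[2] for c in counts)
        return safe_div(sum(p * c[2] for p, c in zip(per_class, counts)), n_total)
    raise ValueError('unknown average: %r' % average)


labels = [3, 2, 0, 1, 1]
preds = [3, 1, 0, 0, 0]
for mode in ('micro', 'macro', 'weighted'):
    print(mode, round(averaged_precision(labels, preds, 4, mode), 4))
```

On this batch the sketch prints 0.4 for micro, 0.3333 for macro and 0.2667 for weighted. The macro value agrees with the pre_op1_ figure (0.33333334) reported in the multi-class example of this article.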

If the inputs are one-hot, convert them to class ids first

pre, pre_op = tf_metrics.precision(labels=tf.argmax(labels, 1),  predictions=tf.argmax(logits,1), num_classes=num_labels)

Examples

Example 1

import tensorflow as tf  # TensorFlow 1.x API (tf.Session, tf.metrics)

label_ids = tf.constant([[3, 1, 5]])
pred_ids = tf.constant([[3, 2, 5]])
acc, acc_op = tf.metrics.accuracy(label_ids, pred_ids)
stream_vars = [i for i in tf.local_variables()]
print(stream_vars)

with tf.Session() as sess:
    sess.run(tf.local_variables_initializer())
    print('[total, count]:', sess.run(stream_vars))
    print(acc.eval())  # only reads the two local variables (not updated yet, so 0)
    print(acc_op.eval())  # updates total and count, then returns the new value
    print('[total, count]:', sess.run(stream_vars))
    print(acc.eval())  # only reads the two local variables (now updated, non-zero)

Output:

[<tf.Variable 'accuracy/total:0' shape=() dtype=float32_ref>, <tf.Variable 'accuracy/count:0' shape=() dtype=float32_ref>]
[total, count]: [0.0, 0.0]
0.0
0.6666667
[total, count]: [2.0, 3.0]
0.6666667

[tensorflow - How to use tf.metrics.accuracy correctly?]

[A deep dive into the tf.metrics operators in TensorFlow]

[TensorFlow pitfalls: tf.metrics]

Example 2

# Compute evaluation metrics.
acc, acc_op = tf.metrics.accuracy(labels=tf.argmax(labels, 1),  predictions=tf.argmax(logits,1))

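Example 3: multi-class classification

import tensorflow as tf  # TensorFlow 1.x API
import tf_metrics  # third-party multi-class metrics package described above

label_ids = tf.constant([[0, 0, 0, 1],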
                         [0, 0, 1, 0],
                         [1, 0, 0, 0],
                         [0, 1, 0, 0],
                         [0, 1, 0, 0]])
pred_ids = tf.constant([[0, 0, 0, 1],
                        [0, 1, 0, 0],
                        [1, 0, 0, 0],
                        [1, 0, 0, 0],
                        [1, 0, 0, 0]])
num_labels = label_ids.shape[1]
label_arg_ids = tf.argmax(label_ids, 1)
pred_arg_ids = tf.argmax(pred_ids, 1)
# _, tp_op = tf.metrics.true_positives(label_ids, pred_ids)
# _, fp_op = tf.metrics.false_positives(label_ids, pred_ids)

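# Note: despite its name, acc_op is the built-in precision on the one-hot inputs;
# it is printed next to acc_op1 to show that it equals the argmax accuracy.
_, acc_op = tf.metrics.precision(label_ids, pred_ids)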
_, acc_op1 = tf.metrics.accuracy(label_arg_ids, pred_arg_ids)
_, pre_op = tf.metrics.precision(label_ids, pred_ids)
# _, pre_op1 = tf.metrics.precision(label_arg_ids, pred_arg_ids)
_, rec_op = tf.metrics.recall(label_ids, pred_ids)
# _, rec_op1 = tf.metrics.recall(label_arg_ids, pred_arg_ids)

# _, pre_op_ = tf_metrics.precision(label_ids, pred_ids, num_labels)
_, pre_op1_ = tf_metrics.precision(label_arg_ids, pred_arg_ids, num_labels, average='macro')
# _, rec_op_ = tf_metrics.recall(label_ids, pred_ids, num_labels)
_, rec_op1_ = tf_metrics.recall(label_arg_ids, pred_arg_ids, num_labels, average='macro')
_, f1_op1_ = tf_metrics.f1(label_arg_ids, pred_arg_ids, num_labels, average='macro')

stream_vars = [i for i in tf.local_variables()]
print(stream_vars)

with tf.Session() as sess:
    sess.run(tf.local_variables_initializer())
    print(label_arg_ids.eval())
    print(pred_arg_ids.eval())
    # print(tp_op.eval())  # 2
    # print(fp_op.eval())  # 3

    print('acc_op:', acc_op.eval())
    print('acc_op1:', acc_op1.eval())
    print('pre_op:', pre_op.eval())
    # print('pre_op1:', pre_op1.eval())  # 1.0
    print('rec_op:', rec_op.eval())
    # print('rec_op1:', rec_op1.eval())  # 0.5

    # print(pre_op_.eval()) # 0.7
    print('pre_op1_:', pre_op1_.eval())
    # print(rec_op_.eval()) # 0.7
    print('rec_op1_:', rec_op1_.eval())
    print('f1_op1_:', f1_op1_.eval())

Output (apparently captured with the commented-out prints enabled):

[3 2 0 1 1]
[3 1 0 0 0]
2.0
3.0
acc_op: 0.4
acc_op1: 0.4
pre_op: 0.4
pre_op1: 1.0
rec_op: 0.4
rec_op1: 0.5
0.7
pre_op1_: 0.33333334
0.7
rec_op1_: 0.5
f1_op1_: 0.375

-柚子皮-

 

 

Other methods and examples

Computing the accuracy of softmax outputs

import tensorflow as tf
import os

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'


def evaluation(sess, outputs, labels):
    # With k=1, in_top_k is True for each row whose label is the top-scoring class.
    correct = tf.nn.in_top_k(outputs, labels, 1)
    print(sess.run(correct))
    # Return a tensor holding the number of correct predictions.
    return tf.reduce_sum(tf.cast(correct, tf.int32))


with tf.Graph().as_default():
    with tf.Session() as sess:
        # No variables are created here, so no initializer has to run.
        a = evaluation(sess, [[0.8, 0.1, 0.1], [0.2, 0.6, 0.2], [0.7, 0.1, 0.2]], [0, 1, 2])
        print(sess.run(a))
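What the top-k check does can be sketched in plain Python. in_top_k_sketch below is a hypothetical helper, not the TensorFlow op; it assumes a label counts as being in the top k when fewer than k scores in its row are strictly larger, so tied scores at the boundary count as hits.

```python
def in_top_k_sketch(outputs, labels, k=1):
    """For each row, True if the labelled class scores within the top k."""
    hits = []
    for scores, label in zip(outputs, labels):
        larger = sum(1 for s in scores if s > scores[label])
        hits.append(larger < k)
    return hits


outputs = [[0.8, 0.1, 0.1], [0.2, 0.6, 0.2], [0.7, 0.1, 0.2]]
labels = [0, 1, 2]

correct = in_top_k_sketch(outputs, labels, k=1)
num_correct = sum(correct)
print(correct)                    # [True, True, False]
print(num_correct / len(labels))  # fraction of correct top-1 predictions
```

The function above returns the number of correct predictions; dividing that count by the batch size, as in the last line of the sketch, turns it into an accuracy.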

from: -柚子皮-
