Classifier Evaluation Metrics: The Confusion Matrix

In machine learning we often need to evaluate the performance of a learning algorithm, which naturally calls for some evaluation criteria.

Following the description at http://www2.cs.uregina.ca/~dbd/cs831/notes/confusion_matrix/confusion_matrix.html, one of the simplest such tools is the confusion matrix. It is defined as follows:

The entries in the confusion matrix have the following meaning in the context of our study:

  • a is the number of correct predictions that an instance is negative (actual negatives correctly classified as negative)
  • b is the number of incorrect predictions that an instance is positive (actual negatives misclassified as positive)
  • c is the number of incorrect predictions that an instance is negative (actual positives misclassified as negative)
  • d is the number of correct predictions that an instance is positive (actual positives correctly classified as positive)
                        Predicted
                   Negative   Positive
  Actual Negative      a          b
         Positive      c          d
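
For concreteness, here is a minimal Python sketch (not from the original notes; the function name confusion_cells is my own) of how the four cells a, b, c, d could be counted from actual and predicted binary labels, assuming 1 marks a positive and 0 a negative:

    # Count the four confusion-matrix cells from actual and predicted labels.
    # Assumes binary labels: 1 = positive, 0 = negative.
    def confusion_cells(actual, predicted):
        a = sum(1 for y, p in zip(actual, predicted) if y == 0 and p == 0)  # correct negative
        b = sum(1 for y, p in zip(actual, predicted) if y == 0 and p == 1)  # negative predicted positive
        c = sum(1 for y, p in zip(actual, predicted) if y == 1 and p == 0)  # positive predicted negative
        d = sum(1 for y, p in zip(actual, predicted) if y == 1 and p == 1)  # correct positive
        return a, b, c, d

    # Two negatives classified correctly, one positive missed, one positive found.
    print(confusion_cells([0, 0, 1, 1], [0, 0, 0, 1]))  # -> (2, 0, 1, 1)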

Several evaluation metrics can be derived from this confusion matrix:

Several standard terms have been defined for the 2-class matrix:

  • The accuracy (AC) is the proportion of the total number of predictions that were correct, i.e., the probability that positive and negative samples alike are classified correctly. It is determined using the equation:

$AC = \frac{a + d}{a + b + c + d}$  [1]

  • The recall or true positive rate (TP) is the proportion of positive cases that were correctly identified (the probability that a positive sample is detected), as calculated using the equation:

$TP = \frac{d}{c + d}$  [2]

  • The false positive rate (FP), or false alarm rate, is the proportion of negative cases that were incorrectly classified as positive, as calculated using the equation:

$FP = \frac{b}{a + b}$  [3]

  • The true negative rate (TN) is defined as the proportion of negative cases that were classified correctly, as calculated using the equation:

$TN = \frac{a}{a + b}$  [4]

  • The false negative rate (FN), or miss rate, is the proportion of positive cases that were incorrectly classified as negative, as calculated using the equation:

$FN = \frac{c}{c + d}$  [5]

  • Finally, precision (P) is the proportion of the predicted positive cases that were correct, i.e., how reliable a positive prediction is, as calculated using the equation:

$P = \frac{d}{b + d}$  [6]

The accuracy determined using equation 1 may not be an adequate performance measure when the number of negative cases is much greater than the number of positive cases (Kubat et al., 1998). Suppose there are 1000 cases, 995 of which are negative cases and 5 of which are positive cases. If the system classifies them all as negative, the accuracy would be 99.5%, even though the classifier missed all positive cases.

In other words, accuracy can be a poor way to evaluate a classifier, especially when negative samples make up a large share of the data.
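
As a quick numerical check of that 1000-case example (a sketch using the cell definitions above):

    # A classifier that predicts "negative" for all 1000 cases:
    # the 995 actual negatives are "correct", the 5 actual positives are all missed.
    a, b, c, d = 995, 0, 5, 0
    accuracy = (a + d) / (a + b + c + d)  # [1]
    recall   = d / (c + d)                # [2]
    print(accuracy)  # 0.995 -> looks excellent
    print(recall)    # 0.0   -> every positive case was missed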

 Other performance measures account for this by including TP in a product: for example, geometric mean (g-mean) (Kubat et al., 1998), as defined in equations 7 and 8, and F-Measure (Lewis and Gale, 1994), as defined in equation 9.

$\text{g-mean}_1 = \sqrt{TP \times P}$  [7]

$\text{g-mean}_2 = \sqrt{TP \times TN}$  [8]

$F = \frac{(\beta^2 + 1) \times P \times TP}{\beta^2 \times P + TP}$  [9]

In equation 9, β ranges from 0 to infinity and controls the relative weight assigned to TP and P. Any classifier evaluated using equations 7, 8, or 9 will have a measure value of 0 if all positive cases are classified incorrectly.
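
A small sketch of equations [7] through [9] (names are my own; TP, P and TN here are the rates from equations [2], [6] and [4], and the parameter beta plays the role of β):

    import math

    def g_mean1(TP, P):
        return math.sqrt(TP * P)   # [7]

    def g_mean2(TP, TN):
        return math.sqrt(TP * TN)  # [8]

    def f_measure(TP, P, beta=1.0):
        # [9]: beta controls the weight given to TP versus P.
        return ((beta ** 2 + 1) * P * TP) / (beta ** 2 * P + TP)

    # With the rates from the earlier example (TP = 0.875, P ~ 0.778, TN ~ 0.833):
    print(g_mean1(0.875, 0.778))    # ~0.825
    print(g_mean2(0.875, 0.833))    # ~0.854
    print(f_measure(0.875, 0.778))  # ~0.824
    # If all positives are misclassified (TP = 0), each measure evaluates to 0.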

