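Model Performance Metrics
We need to compare two classification models, M1 and M2. Their outputs on a test set of 10 binary-labelled (+ or -) samples are shown in the table below. Assume we care more about whether positive samples are detected correctly.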
Instance | True Class | Score from M1 | Score from M2 |
1 | + | 0.73 | 0.61 |
2 | + | 0.69 | 0.03 |
3 | - | 0.44 | 0.68 |
4 | - | 0.55 | 0.31 |
5 | - | 0.67 | 0.45 |
6 | + | 0.47 | 0.09 |
7 | - | 0.08 | 0.38 |
8 | - | 0.15 | 0.05 |
9 | + | 0.45 | 0.01 |
10 | - | 0.35 | 0.04 |
(1) For classifier M1 with a threshold of 0.5, compute the accuracy, precision, recall (also called the true positive rate, TPR), false positive rate (FPR), and F-measure.
(2) For classifier M2 with a threshold of 0.5, compute the accuracy, precision, recall (TPR), FPR, and F-measure. Compare the results with those of M1 and analyze which classifier performs better on this test set.
(3) For classifier M1 with a threshold of 0.2, compute the accuracy, precision, recall (TPR), FPR, and F-measure. Discuss whether M1 gives better results at a threshold of 0.2 or at 0.5.
(4) Discuss whether a better threshold exists. If so, find the optimal threshold and explain why.
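For reference, the five metrics can be written in terms of the confusion-matrix counts TP, FP, TN, and FN. These are the standard definitions; the F-measure is assumed here to be the balanced F1 score (equal weight on precision and recall), which agrees with the values in the answers.

```latex
\begin{align*}
\text{accuracy} &= \frac{TP + TN}{TP + FP + TN + FN} \\
\text{precision} &= \frac{TP}{TP + FP} \\
\text{recall} = \text{TPR} &= \frac{TP}{TP + FN} \\
\text{FPR} &= \frac{FP}{FP + TN} \\
F &= \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}
   = \frac{2\,TP}{2\,TP + FP + FN}
\end{align*}
```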
Answer:
(1)
class | - | - | - | - | + | + | - | - | + | + |
M1 score | 0.08 | 0.15 | 0.35 | 0.44 | 0.45 | 0.47 | 0.55 | 0.67 | 0.69 | 0.73 |
TP | 2 |
FP | 2 |
TN | 4 |
FN | 2 |
accuracy | 0.6 |
precision | 0.5 |
TPR(recall) | 0.5 |
FPR | 1/3 |
F-measure | 0.5 |
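The counts and metrics above can be checked with a short script. This is a minimal sketch, not part of the original answer: the helper names `confusion_counts` and `metrics` are hypothetical, a sample is assumed to be predicted positive when its score is greater than or equal to the threshold, and the F-measure is assumed to be F1.

```python
from fractions import Fraction

# Labels and M1 scores copied from the data table (instances 1-10).
labels = ["+", "+", "-", "-", "-", "+", "-", "-", "+", "-"]
m1_scores = [0.73, 0.69, 0.44, 0.55, 0.67, 0.47, 0.08, 0.15, 0.45, 0.35]


def confusion_counts(labels, scores, threshold):
    """Count TP, FP, TN, FN when score >= threshold is predicted positive."""
    tp = fp = tn = fn = 0
    for label, score in zip(labels, scores):
        predicted_positive = score >= threshold
        if predicted_positive and label == "+":
            tp += 1
        elif predicted_positive:
            fp += 1
        elif label == "-":
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def metrics(tp, fp, tn, fn):
    """Return the five metrics as exact fractions (None where undefined)."""
    return {
        "accuracy": Fraction(tp + tn, tp + fp + tn + fn),
        "precision": Fraction(tp, tp + fp) if tp + fp else None,
        "recall": Fraction(tp, tp + fn) if tp + fn else None,
        "fpr": Fraction(fp, fp + tn) if fp + tn else None,
        # F1 written directly in terms of the counts.
        "f_measure": Fraction(2 * tp, 2 * tp + fp + fn) if 2 * tp + fp + fn else None,
    }


counts = confusion_counts(labels, m1_scores, 0.5)
result = metrics(*counts)
print("TP, FP, TN, FN =", counts)
for name, value in result.items():
    print(name, "=", value)
```

Exact fractions are used so that values such as 1/3 print the same way they appear in the tables; the same two helpers can be called with a different threshold or with the M2 scores.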
(2)
class | + | + | - | - | + | - | - | - | + | - |
M2 score | 0.01 | 0.03 | 0.04 | 0.05 | 0.09 | 0.31 | 0.38 | 0.45 | 0.61 | 0.68 |
TP | 1 |
FP | 1 |
TN | 5 |
FN | 3 |
accuracy | 0.6 |
precision | 0.5 |
TPR(recall) | 0.25 |
FPR | 1/6 |
F-measure | 1/3 |
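At a threshold of 0.5 both models have accuracy 0.6 and precision 0.5, but M1 has the higher TPR (0.5 vs. 0.25) and the higher F-measure (0.5 vs. 1/3), at the cost of a higher FPR (1/3 vs. 1/6). Since we care more about detecting positive samples, classifier M1 performs better on this test set.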
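The comparison between the two models at a threshold of 0.5 can be reproduced side by side. The sketch below is illustrative only: `evaluate` is a hypothetical helper, labels are encoded as 1 for "+" and 0 for "-", and "prefer the higher recall" is an assumed way of encoding the stated preference for detecting positives.

```python
# 1 marks a positive ("+") sample, 0 a negative ("-") one; order follows the data table.
labels = [1, 1, 0, 0, 0, 1, 0, 0, 1, 0]
scores = {
    "M1": [0.73, 0.69, 0.44, 0.55, 0.67, 0.47, 0.08, 0.15, 0.45, 0.35],
    "M2": [0.61, 0.03, 0.68, 0.31, 0.45, 0.09, 0.38, 0.05, 0.01, 0.04],
}


def evaluate(labels, model_scores, threshold=0.5):
    """Return accuracy, recall, FPR and F-measure for one model at one threshold."""
    preds = [int(s >= threshold) for s in model_scores]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    tn = len(labels) - tp - fp - fn
    return {
        "accuracy": (tp + tn) / len(labels),
        "recall": tp / (tp + fn),
        "fpr": fp / (fp + tn),
        "f_measure": 2 * tp / (2 * tp + fp + fn),
    }


report = {name: evaluate(labels, s) for name, s in scores.items()}
for name, row in report.items():
    print(name, {k: round(v, 3) for k, v in row.items()})

# Assumed decision rule: prefer the model with the higher recall (TPR).
better = max(report, key=lambda name: report[name]["recall"])
print("higher recall:", better)
```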
(3)
class | - | - | - | - | + | + | - | - | + | + |
M1 score | 0.08 | 0.15 | 0.35 | 0.44 | 0.45 | 0.47 | 0.55 | 0.67 | 0.69 | 0.73 |
TP | 4 |
FP | 4 |
TN | 2 |
FN | 0 |
accuracy | 0.6 |
precision | 0.5 |
TPR(recall) | 1 |
FPR | 2/3 |
F-measure | 2/3 |
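Accuracy (0.6) and precision (0.5) are the same at both thresholds. At a threshold of 0.2, however, the TPR rises from 0.5 to 1 and the F-measure from 0.5 to 2/3, while the FPR rises from 1/3 to 2/3. Since detecting positive samples matters more here, the threshold of 0.2 gives the better result.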
(4)
For model M1:
class | - | - | - | - | + | + | - | - | + | + | |
Threshold>= | 0.08 | 0.15 | 0.35 | 0.44 | 0.45 | 0.47 | 0.55 | 0.67 | 0.69 | 0.73 | 1.0 |
TP | 4 | 4 | 4 | 4 | 4 | 3 | 2 | 2 | 2 | 1 | 0 |
FP | 6 | 5 | 4 | 3 | 2 | 2 | 2 | 1 | 0 | 0 | 0 |
TN | 0 | 1 | 2 | 3 | 4 | 4 | 4 | 5 | 6 | 6 | 6 |
FN | 0 | 0 | 0 | 0 | 0 | 1 | 2 | 2 | 2 | 3 | 4 |
accuracy | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 0.7 | 0.6 | 0.7 | 0.8 | 0.7 | 0.6 |
precision | 0.4 | 4/9 | 0.5 | 4/7 | 2/3 | 0.6 | 0.5 | 2/3 | 1 | 1 | n/a |
TPR(recall) | 1 | 1 | 1 | 1 | 1 | 0.75 | 0.5 | 0.5 | 0.5 | 0.25 | 0 |
FPR | 1 | 5/6 | 2/3 | 0.5 | 1/3 | 1/3 | 1/3 | 1/6 | 0 | 0 | 0 |
F-measure | 4/7 | 8/13 | 2/3 | 8/11 | 0.8 | 2/3 | 0.5 | 4/7 | 2/3 | 0.4 | 0 |
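The optimal threshold is 0.45, where accuracy = 0.8, TPR = 1, and FPR = 1/3. A threshold of 0.69 reaches the same accuracy with FPR = 0, but its TPR is only 0.5; since we care more about detecting positive samples, 0.45 is preferred.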
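The threshold search for M1 can be sketched as a sweep over every observed score. The selection rule used below (highest accuracy, ties broken by the higher TPR) is an assumption that matches the reasoning in this answer rather than a rule stated in the problem, and `sweep` is a hypothetical helper name.

```python
from fractions import Fraction

# Labels and M1 scores copied from the data table (instances 1-10).
labels = ["+", "+", "-", "-", "-", "+", "-", "-", "+", "-"]
m1_scores = [0.73, 0.69, 0.44, 0.55, 0.67, 0.47, 0.08, 0.15, 0.45, 0.35]


def sweep(labels, scores, extra_thresholds=()):
    """Evaluate every observed score (plus any extras) as a threshold."""
    positives = labels.count("+")
    negatives = labels.count("-")
    rows = []
    for t in sorted(set(scores) | set(extra_thresholds)):
        # A sample is predicted positive when its score is >= t.
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == "+")
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == "-")
        tn = negatives - fp
        rows.append({
            "threshold": t,
            "accuracy": Fraction(tp + tn, len(labels)),
            "tpr": Fraction(tp, positives),
            "fpr": Fraction(fp, negatives),
        })
    return rows


# 1.0 is included so that "predict nothing as positive" is also a candidate.
rows = sweep(labels, m1_scores, extra_thresholds=[1.0])
for row in rows:
    print(row["threshold"], row["accuracy"], row["tpr"], row["fpr"])

# Assumed selection rule: highest accuracy, ties broken by higher TPR.
best = max(rows, key=lambda r: (r["accuracy"], r["tpr"]))
print("best threshold:", best["threshold"])
```

Under a different criterion (for example, minimizing FPR subject to a minimum accuracy) the sweep would stay the same and only the `key` function would change.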
For model M2:
class | + | + | - | - | + | - | - | - | + | - |
Threshold>= | 0.01 | 0.03 | 0.04 | 0.05 | 0.09 | 0.31 | 0.38 | 0.45 | 0.61 | 0.68 |
TP | 4 | 3 | 2 | 2 | 2 | 1 | 1 | 1 | 1 | 0 |
FP | 6 | 6 | 6 | 5 | 4 | 4 | 3 | 2 | 1 | 1 |
TN | 0 | 0 | 0 | 1 | 2 | 2 | 3 | 4 | 5 | 5 |
FN | 0 | 1 | 2 | 2 | 2 | 3 | 3 | 3 | 3 | 4 |
accuracy | 0.4 | 0.3 | 0.2 | 0.3 | 0.4 | 0.3 | 0.4 | 0.5 | 0.6 | 0.5 |
precision | 0.4 | 1/3 | 0.25 | 2/7 | 1/3 | 0.2 | 0.25 | 1/3 | 0.5 | 0 |
TPR(recall) | 1 | 0.75 | 0.5 | 0.5 | 0.5 | 0.25 | 0.25 | 0.25 | 0.25 | 0 |
FPR | 1 | 1 | 1 | 5/6 | 2/3 | 2/3 | 0.5 | 1/3 | 1/6 | 1/6 |
F-measure | 4/7 | 6/13 | 1/3 | 4/11 | 0.4 | 2/9 | 0.25 | 2/7 | 1/3 | 0 |
The optimal threshold is 0.61, which gives the highest accuracy in the table: accuracy = 0.6, TPR = 0.25, and FPR = 1/6.
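One detail worth checking for M2 is whether any other threshold ties with 0.61 on accuracy. The sketch below adds a candidate of 1.0 (an assumption: it is not a column in the M2 table, and at that value no sample is predicted positive) and then breaks any tie by the number of true positives, which is one assumed way to encode the preference for detecting positives. `correct_and_tp` is a hypothetical helper name.

```python
# Labels and M2 scores copied from the data table (instances 1-10).
labels = ["+", "+", "-", "-", "-", "+", "-", "-", "+", "-"]
m2_scores = [0.61, 0.03, 0.68, 0.31, 0.45, 0.09, 0.38, 0.05, 0.01, 0.04]

# Candidate thresholds: every observed score, plus 1.0 (assumed extra
# candidate at which no sample is predicted positive).
candidates = sorted(set(m2_scores)) + [1.0]


def correct_and_tp(threshold):
    """Return (number of correct predictions, number of true positives)."""
    tp = sum(1 for y, s in zip(labels, m2_scores) if s >= threshold and y == "+")
    tn = sum(1 for y, s in zip(labels, m2_scores) if s < threshold and y == "-")
    return tp + tn, tp


results = {t: correct_and_tp(t) for t in candidates}
max_correct = max(correct for correct, _ in results.values())
tied = [t for t in candidates if results[t][0] == max_correct]
print("thresholds with the highest accuracy:", tied)

# Break the tie by true positives, since positives matter more here.
best = max(tied, key=lambda t: results[t][1])
print("chosen threshold:", best)
```

Predicting every sample as negative also classifies 6 of 10 samples correctly, but it detects no positives, so the tie-break still selects 0.61.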