Confusion Matrix
1. Prerequisite concepts
Abbreviation | Full term | Meaning |
---|---|---|
TP | True Positive | Prediction correct; the model predicted "Positive" |
FN | False Negative | Prediction wrong; the model predicted "Negative" |
FP | False Positive | Prediction wrong; the model predicted "Positive" |
TN | True Negative | Prediction correct; the model predicted "Negative" |
2. Metric definitions
Metric | Definition | Notes |
---|---|---|
Accuracy | Of all samples, the proportion the model classified correctly | (TP + TN) / (TP + TN + FP + FN) |
Precision | Of the samples the model labeled positive, the proportion that are actually positive | TP / (TP + FP) |
Recall / Sensitivity | Of the samples that are actually positive, the proportion the model managed to find | TP / (TP + FN) |
F1-Score | Harmonic mean of Precision and Recall; ranges from 0 to 1, where 1 is best and 0 is worst | 2 · P · R / (P + R) |
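The definitions above can be checked numerically. The sketch below uses a tiny hand-made label vector (illustrative data only, not from the original text) and sklearn's metric functions:

```python
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score, confusion_matrix)

# Toy ground truth and predictions for a binary task (illustrative only)
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

# For labels (0, 1), confusion_matrix returns [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

print(tn, fp, fn, tp)                   # 3 1 1 3
print(accuracy_score(y_true, y_pred))   # (TP + TN) / total = 6/8 = 0.75
print(precision_score(y_true, y_pred))  # TP / (TP + FP) = 3/4 = 0.75
print(recall_score(y_true, y_pred))     # TP / (TP + FN) = 3/4 = 0.75
print(f1_score(y_true, y_pred))         # 2PR / (P + R) = 0.75
```

On this toy data precision and recall happen to coincide; in general they trade off against each other, which is why F1 combines them.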
ROC Curve
Receiver Operating Characteristic curve, also called the sensitivity curve.
The more the ROC curve bows toward the upper-left corner, the better the model performs;
AUC: the area of the shaded region under the ROC curve, so it is not covered separately here;
Note: metrics whose names end in `_score` are better the larger the value; metrics ending in `_error` or `_loss` are better the smaller the value.
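To illustrate this sign convention, here is a minimal sketch (synthetic data; the exact numbers are not meaningful) using the `'neg_mean_squared_error'` scoring string:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.RandomState(0)
X = rng.rand(50, 3)
y = X @ np.array([1.0, 2.0, 3.0]) + rng.normal(scale=0.1, size=50)

# 'neg_mean_squared_error' returns the *negated* MSE, so that, like
# every *_score metric, larger values always mean a better model.
scores = cross_val_score(LinearRegression(), X, y, cv=5,
                         scoring='neg_mean_squared_error')
print(scores)          # all values are <= 0
print(-scores.mean())  # negate to recover the ordinary (positive) MSE
```

This is why the scorer names in the table below carry the `neg_` prefix while the underlying functions (e.g. `metrics.mean_squared_error`) do not.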
Sklearn Example
Adapted from an answer by oopcode on Stack Overflow (with modifications).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, auc, roc_auc_score

rng = np.random.RandomState(0)  # fixed seed so the run is reproducible
X = rng.rand(20, 2)
y = rng.randint(2, size=20)

lr = LogisticRegression()
lr.fit(X, y)

# Use the predicted probability of the positive class rather than hard
# labels, so roc_curve can sweep over decision thresholds.
scores = lr.predict_proba(X)[:, 1]
FP_rate, TP_rate, thresholds = roc_curve(y, scores)

# Both calls compute the same AUC
print(auc(FP_rate, TP_rate))
print(roc_auc_score(y, scores))
Appendix: sklearn evaluation metrics (see the official documentation)
Scoring string | Function | Notes |
---|---|---|
Classification | ||
‘accuracy’ | metrics.accuracy_score | |
‘balanced_accuracy’ | metrics.balanced_accuracy_score | |
‘average_precision’ | metrics.average_precision_score | |
‘brier_score_loss’ | metrics.brier_score_loss | |
‘f1’ | metrics.f1_score | for binary classification |
‘f1_micro’ | metrics.f1_score | |
‘f1_macro’ | metrics.f1_score | |
‘f1_weighted’ | metrics.f1_score | |
‘f1_samples’ | metrics.f1_score | |
‘precision’ etc. | metrics.precision_score | suffixes apply as with ‘f1’ |
‘recall’ etc. | metrics.recall_score | suffixes apply as with ‘f1’ |
‘jaccard’ etc. | metrics.jaccard_score | suffixes apply as with ‘f1’ |
‘neg_log_loss’ | metrics.log_loss | requires predict_proba support |
‘roc_auc’ | metrics.roc_auc_score | |
Clustering | ||
‘adjusted_mutual_info_score’ | metrics.adjusted_mutual_info_score | |
‘adjusted_rand_score’ | metrics.adjusted_rand_score | |
‘completeness_score’ | metrics.completeness_score | |
‘fowlkes_mallows_score’ | metrics.fowlkes_mallows_score | |
‘homogeneity_score’ | metrics.homogeneity_score | |
‘mutual_info_score’ | metrics.mutual_info_score | |
‘normalized_mutual_info_score’ | metrics.normalized_mutual_info_score | |
‘v_measure_score’ | metrics.v_measure_score | |
Regression | ||
‘explained_variance’ | metrics.explained_variance_score | |
‘r2’ | metrics.r2_score | |
‘max_error’ | metrics.max_error | |
‘neg_mean_absolute_error’ | metrics.mean_absolute_error | |
‘neg_mean_squared_error’ | metrics.mean_squared_error | |
‘neg_mean_squared_log_error’ | metrics.mean_squared_log_error | |
‘neg_median_absolute_error’ | metrics.median_absolute_error |
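Each string in the table can be passed wherever sklearn accepts a `scoring` parameter (e.g. `cross_val_score`, `GridSearchCV`). A small sketch, assuming synthetic data, using `metrics.get_scorer` to resolve a string to its callable scorer:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import get_scorer

rng = np.random.RandomState(0)
X = rng.rand(30, 2)
y = rng.randint(2, size=30)
clf = LogisticRegression().fit(X, y)

# get_scorer turns a scoring string into a callable scorer(estimator, X, y)
acc = get_scorer('accuracy')
print(acc(clf, X, y))  # same value as clf.score(X, y)
```

This makes it easy to verify which metric function a given scoring string maps to before using it in model selection.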