classification

Confusion matrix

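One dimension of the matrix holds the predicted values and the other holds the actual values. For binary classification:

The actual value: True or False
The predicted value: Positive or Negative

Predicted Positive, actual True: TP
Predicted Positive, actual False: FP
Predicted Negative, actual True: FN
Predicted Negative, actual False: TN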

The formulas related to the confusion matrix are as follows:

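The meaning of TP, TN, FP, FN:

TP (True Positive)
predicted positive and the prediction is true
TN (True Negative)
predicted negative and the prediction is true
FP (False Positive) (Type I error)
predicted positive and the prediction is false

FP is a type I error.
Type I error: the computed result falls in the rejection region of the null hypothesis although the null hypothesis is actually true, so the null hypothesis is wrongly rejected.
FN (False Negative) (Type II error)
predicted negative and the prediction is false

FN is a type II error.
Type II error: the computed result does not fall in the rejection region although the null hypothesis is actually false, so the null hypothesis is wrongly accepted.

FP and FN are essentially the Type I error and Type II error of hypothesis testing.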
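The four counts can be tallied directly from paired labels. The following is a minimal sketch, assuming labels are encoded as 1 (positive) and 0 (negative); the function name `confusion_counts` and the sample data are illustrative and not part of the original notes.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels, 1 = positive, 0 = negative."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == 1 and actual == 1:
            tp += 1
        elif predicted == 0 and actual == 0:
            tn += 1
        elif predicted == 1 and actual == 0:
            fp += 1  # Type I error
        else:
            fn += 1  # Type II error
    return tp, tn, fp, fn


# Made-up labels for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
counts = confusion_counts(y_true, y_pred)
print(counts)  # (3, 3, 1, 1)
```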

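An example of FP and FN:
Plot the samples by value: blue is Positive and red is Negative. The figure shows that the samples have an order. Prediction can roughly be viewed as a ranking problem: sort the samples, then choose a threshold that splits them into Positive and Negative.

The blue × in the black box is actually Positive, but its value falls on the Negative side of the threshold, so it is judged Negative. It is an FN, a Type II error.

When the threshold is changed, the × in the red box is still a Type II error, and one red Negative sample now falls in the Positive range. The sample in the blue box is an FP, a Type I error.

There is a funny picture that is helpful for understanding these two kinds of error.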

Accuracy

Accuracy describes how close a measurement is to the true value. In binary classification, accuracy can be expressed with TP, TN, FP, FN:
$Accuracy =\frac{n^{correct}}{n^{total}} = \frac{TP + TN}{TP+TN+FP+FN}$

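Advantage:

Simple and intuitive.

Disadvantages:

When the class proportions are imbalanced, the majority class tends to become the dominant factor in accuracy.
For example, if negative samples make up 99% of the data and everything is predicted negative, accuracy = 99%, but this does not reflect how good the model really is.
Overall accuracy can be high while the accuracy for an individual class is very low.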
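The imbalance problem is easy to reproduce with numbers. This is a small sketch, assuming a made-up dataset of 990 negatives and 10 positives and a model that predicts everything as negative; the counts and names are illustrative only.

```python
def accuracy(tp, tn, fp, fn):
    """Accuracy = (TP + TN) / (TP + TN + FP + FN)."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical "always negative" model on 990 negatives and 10 positives.
tp, tn, fp, fn = 0, 990, 0, 10

acc = accuracy(tp, tn, fp, fn)
positive_class_rate = tp / (tp + fn)  # fraction of positives found

print(acc)                  # 0.99
print(positive_class_rate)  # 0.0
```

The overall accuracy is 99%, yet not a single positive sample is identified.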

Precision

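The number of correctly classified positive samples divided by the number of samples the classifier labels as positive. Precision emphasizes exactness.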
$Precision = \frac{TP}{TP+FP}$

Recall

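The number of correctly classified positive samples divided by the number of samples that are actually positive. Recall emphasizes completeness.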
$Recall = \frac{TP}{TP+FN}$

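Precision and Recall are two metrics that are both in tension and complementary. To raise Precision, the classifier should predict a sample as positive only when it is more confident, but being this conservative usually misses many positive samples it is unsure about, which lowers Recall.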
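The trade-off can be seen by scoring the same samples under a strict and a loose threshold. This is a sketch under stated assumptions: a sample is labeled positive when its score is at least the threshold, precision is taken as 1.0 when nothing is predicted positive (a convention chosen here, not something the notes specify), and the scores are made up.

```python
def precision_recall(y_true, scores, threshold):
    """Label a sample positive when score >= threshold; return (precision, recall)."""
    tp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 1)
    fp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 0)
    fn = sum(1 for t, s in zip(y_true, scores) if s < threshold and t == 1)
    # Assumed convention: precision is 1.0 when no sample is predicted positive.
    precision = tp / (tp + fp) if tp + fp > 0 else 1.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    return precision, recall


# Made-up labels and scores for illustration only.
y_true = [1, 1, 0, 1, 0, 1, 0, 0]
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.10]

strict = precision_recall(y_true, scores, 0.85)  # conservative classifier
loose = precision_recall(y_true, scores, 0.35)   # permissive classifier

print(strict)  # (1.0, 0.5): high precision, half the positives missed
print(loose)   # precision drops to 4/6, recall rises to 1.0
```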

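P-R curve

Horizontal axis: Recall
Vertical axis: Precision
A point on the curve corresponds to one threshold: the model labels results above the threshold as positive and results below it as negative.
The whole P-R curve is produced by moving the threshold from high to low.
The left end of the curve, near recall = 0, gives the precision and recall of the model when the threshold is at its largest.
The Precision and Recall at a single point do not measure model performance comprehensively; only the overall shape of the P-R curve gives a more complete evaluation of the model.
If the P-R curve of one learner is completely enclosed by the curve of another learner, the latter can be asserted to perform better than the former. If the curves cross, it is hard to say in general which is better. To rank them anyway, one can compare the area under the P-R curve, which to some extent represents how often the learner achieves relatively high precision and recall at the same time, but this value is not easy to compute.
The Break-Even Point (BEP) is another measure that considers recall and precision together. It is the value at which recall = precision (the larger the better).

As the figure shows, based on BEP, learner A can be considered better than learner B.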
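The threshold sweep that produces the P-R curve can be sketched in a few lines. This assumes all scores are distinct (ties are not handled), approximates the break-even point by the curve point where precision and recall are closest, and uses made-up data; `pr_curve` is a hypothetical helper name.

```python
def pr_curve(y_true, scores):
    """Return (recall, precision) points, lowering the threshold one sample at a time."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    total_pos = sum(y_true)
    points = []
    tp = fp = 0
    for _, label in ranked:
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((tp / total_pos, tp / (tp + fp)))
    return points


# Made-up labels and scores for illustration only.
y_true = [1, 1, 0, 1, 0, 1, 0, 0]
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.10]

points = pr_curve(y_true, scores)
# Approximate BEP: the point where recall and precision are closest.
bep = min(points, key=lambda p: abs(p[0] - p[1]))

print(points[0])  # (0.25, 1.0): highest threshold, one sample predicted positive
print(bep)        # (0.75, 0.75)
```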

F1 score

Besides the P-R curve, the F1 score can also reflect the performance of a ranking model in a comprehensive way.
The F1 score is based on the harmonic mean.

Harmonic mean

$H = \frac{n}{\frac{1}{x_{1}} + \frac{1}{x_{2}} + ...+ \frac{1}{x_{n}}} = \frac{n}{\sum_{i = 1}^{n} \frac{1}{x_{i}}} = (\frac{\sum_{i = 1}^{n} \frac{1}{x_{i}}}{n})^{-1}$

The result is not sensitive to extremely large values. On the other hand, not all outliers are ignored: extremely low values have a significant influence on the result.
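A quick numeric comparison with the arithmetic mean shows this asymmetry. The sketch below assumes strictly positive inputs; the sample values are arbitrary.

```python
def harmonic_mean(values):
    """n divided by the sum of reciprocals; assumes all values are > 0."""
    return len(values) / sum(1 / v for v in values)


def arithmetic_mean(values):
    return sum(values) / len(values)


with_low = [0.01, 0.99]     # one extremely low value
with_high = [0.5, 1000.0]   # one extremely large value

print(harmonic_mean(with_low), arithmetic_mean(with_low))    # about 0.0198 vs about 0.5
print(harmonic_mean(with_high), arithmetic_mean(with_high))  # about 0.9995 vs 500.25
```

The low value drags the harmonic mean close to itself, while the large value barely moves it.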

F1 score

The F1 score is based on precision and recall:
$F_{1} = (\frac{Recall^{-1} + Precision^{-1}}{2})^{-1} = 2\frac{Recall \times Precision}{Recall + Precision}$

The figure shows that the F1 score is 0 when Precision = 1, Recall = 0 or when Precision = 0, Recall = 1, and that it reaches its maximum when Precision = 1, Recall = 1.

The F1 score emphasizes the lowest value. If one of the two parameters is small, the other one no longer matters.
If the F1 score is high, both the precision and the recall of the classifier indicate good results.
If the F1 score is low, we cannot tell whether the problem is with false positives or false negatives.
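These properties can be checked numerically. A minimal sketch, assuming F1 is defined as 0 when both precision and recall are 0 (a convention added here to avoid division by zero):

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are 0 (assumed convention)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(f1_score(1.0, 0.0))  # 0.0
print(f1_score(0.0, 1.0))  # 0.0
print(f1_score(1.0, 1.0))  # 1.0
print(f1_score(0.9, 0.1))  # about 0.18: the low recall dominates
print(f1_score(0.1, 0.9))  # about 0.18: same score, opposite problem
```

The last two lines illustrate why a low F1 alone does not say whether false positives or false negatives are the issue.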

$F_{\beta}$ Score

The formula of $F_{\beta}$ is:
$F_{\beta} = (1+\beta^{2}) \cdot \frac{precision \cdot recall }{(\beta^{2} \cdot precision) + recall} = (\frac{1}{1+\beta ^ {2}} \cdot (\frac{1}{precision} + \frac{\beta^{2}}{recall}))^{-1}$

Because we multiply only one term of the denominator by $\beta$ squared, we can use $\beta$ to make $F_{\beta}$ more sensitive to low values of either precision or recall.

$\beta > 0$ measures the relative importance of recall with respect to precision; $\beta = 1$ gives the standard F1 score.
$\beta > 1$: recall has more influence. Because of the factor $\beta^{2}$, the term $\frac{\beta^{2}}{recall}$ grows faster as recall gets smaller. Since the result of summing and then taking the reciprocal is mostly determined by the larger term, $\frac{\beta^{2}}{recall}$ tends to dominate the sum.
$0 < \beta < 1$: precision has more influence, for the opposite reason to $\beta > 1$.

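When $\beta = 2$:

In this figure the precision and recall axes, both over [0, 1], are swapped, but it still illustrates the point. The figure shows that when recall < 0.2 precision has almost no effect, and the F2 score is dominated by recall.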
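The effect of $\beta$ can be seen by evaluating the same precision/recall pair with different weights. This is a sketch of the $F_{\beta}$ formula given earlier in this section; the input values are arbitrary, and returning 0.0 for a zero denominator is an assumed convention.

```python
def f_beta(precision, recall, beta):
    """F-beta = (1 + beta^2) * P * R / (beta^2 * P + R)."""
    b2 = beta ** 2
    denom = b2 * precision + recall
    if denom == 0:
        return 0.0  # assumed convention when precision and recall are both 0
    return (1 + b2) * precision * recall / denom


high_p_low_r = (0.9, 0.1)
low_p_high_r = (0.1, 0.9)

print(f_beta(*high_p_low_r, beta=1))    # about 0.18, the ordinary F1
print(f_beta(*high_p_low_r, beta=2))    # about 0.12: low recall is punished harder
print(f_beta(*low_p_high_r, beta=2))    # about 0.35: high recall is rewarded
print(f_beta(*high_p_low_r, beta=0.5))  # about 0.35: beta < 1 favors precision
```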

When should we use the $F_{\beta}$ score instead of the F1 score?

In cases where one of the metrics (precision or recall) is more important from the business perspective. It depends on how we are going to use the classifier and which kind of error is more problematic.

macro-F1 and micro-F1

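Sometimes we may have several confusion matrices:

Training/testing is run several times.
Training/testing is run on several datasets, and we want to estimate the overall performance of the algorithm.
A multi-class task is performed, where every pair of classes corresponds to one confusion matrix.

In short, when we want to consider recall and precision together over n confusion matrices, we need macro-F1 and micro-F1.

macro-F1
Compute the precision and recall on each confusion matrix, denoted $(P_{1}, R_{1}), (P_{2}, R_{2})...(P_{n}, R_{n})$, then average them to obtain macro-P and macro-R, and from these compute macro-F1.
$\text{macro-P} = \frac{1}{n} \sum_{i = 1}^{n} P_{i}$
$\text{macro-R} = \frac{1}{n} \sum_{i = 1}^{n} R_{i}$
$\text{macro-F1} = \frac{2 \times \text{macro-P} \times \text{macro-R}}{\text{macro-P} + \text{macro-R}}$
micro-F1
First take the TP, FP, TN, FN of each confusion matrix and average them element-wise to obtain $\overline{TP}$, $\overline{FP}$, $\overline{TN}$, $\overline{FN}$. From these averages compute micro-P, micro-R and micro-F1.

$\text{micro-P} = \frac{\overline{TP}}{\overline{TP} + \overline{FP}}$
$\text{micro-R} = \frac{\overline{TP}}{\overline{TP} + \overline{FN}}$
$\text{micro-F1} = \frac{2 \times \text{micro-P} \times \text{micro-R}}{\text{micro-P} + \text{micro-R}}$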
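The two averaging schemes can give quite different numbers on the same confusion matrices. The sketch below follows the definitions above, assuming each matrix is given as a `(TP, FP, TN, FN)` tuple with at least one predicted positive and one actual positive; the two matrices are made up.

```python
def f1(p, r):
    return 2 * p * r / (p + r) if p + r > 0 else 0.0


def macro_f1(matrices):
    """Average the per-matrix precision and recall, then combine them."""
    precisions = [tp / (tp + fp) for tp, fp, tn, fn in matrices]
    recalls = [tp / (tp + fn) for tp, fp, tn, fn in matrices]
    macro_p = sum(precisions) / len(matrices)
    macro_r = sum(recalls) / len(matrices)
    return f1(macro_p, macro_r)


def micro_f1(matrices):
    """Average the raw counts first, then compute precision and recall."""
    n = len(matrices)
    mean_tp = sum(m[0] for m in matrices) / n
    mean_fp = sum(m[1] for m in matrices) / n
    mean_fn = sum(m[3] for m in matrices) / n
    micro_p = mean_tp / (mean_tp + mean_fp)
    micro_r = mean_tp / (mean_tp + mean_fn)
    return f1(micro_p, micro_r)


# Hypothetical matrices: one large well-classified case, one small poorly-classified case.
matrices = [(90, 10, 880, 20), (5, 5, 980, 10)]

print(macro_f1(matrices))  # about 0.632
print(micro_f1(matrices))  # about 0.809
```

macro-F1 weights each matrix equally, so the weak small case pulls it down, while micro-F1 is dominated by the matrix with more positive samples.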

ROC curve

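In different tasks we can choose different cut points according to the needs of the task. For example, if precision matters more, we can cut at a position near the top of the ranking; if recall matters more, we can cut further down. The quality of the ranking itself therefore reflects how good the learner's "expected generalization error" is across different tasks, and the ROC curve is a powerful tool for studying the generalization performance of a learner from this angle.

Vertical axis of ROC: TPR (True Positive Rate), also called Recall or Sensitivity
$TPR = \frac{TP}{TP + FN}$
Horizontal axis of ROC: FPR (False Positive Rate)
$FPR = \frac{FP}{TN + FP}$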

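Interpretation of special points

(0, 0)
All points are classified as negative: TP = 0, TPR = 0, FP = 0, FPR = 0.
(1, 1)
All points are classified as positive: FN = 0, TPR = 1, TN = 0, FPR = 1.
(0, 1)
FN = 0, TPR = 1, FP = 0, FPR = 0. This point therefore represents the ideal model, in which all positives are ranked before all negatives.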
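The three special points can be reproduced from hard predictions. A small sketch, assuming labels 1 (positive) and 0 (negative) and a dataset containing both classes; `roc_point` is a hypothetical helper returning `(FPR, TPR)`.

```python
def roc_point(y_true, y_pred):
    """Return the (FPR, TPR) pair for one set of hard predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return fp / (fp + tn), tp / (tp + fn)


y_true = [1, 1, 0, 0, 0]

all_negative = roc_point(y_true, [0, 0, 0, 0, 0])
all_positive = roc_point(y_true, [1, 1, 1, 1, 1])
perfect = roc_point(y_true, y_true)

print(all_negative)  # (0.0, 0.0)
print(all_positive)  # (1.0, 1.0)
print(perfect)       # (0.0, 1.0)
```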

Special positions of the curves

This is the ideal situation. When the two curves do not overlap at all, the model has an ideal measure of separability: it is perfectly able to distinguish between the positive class and the negative class.

When the two distributions overlap, we introduce type I and type II errors. Depending on the threshold, we can minimize or maximize them. When AUC is 0.7, there is a 70% chance that the model will be able to distinguish between the positive class and the negative class.

This is the worst situation. When AUC is approximately 0.5, the model has no capacity to discriminate between the positive class and the negative class.

When AUC is approximately 0, the model is actually inverting the classes: it predicts the negative class as positive and vice versa.

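Drawing the ROC curve

Given $m^{+}$ positive examples and $m^{-}$ negative examples, sort them by the learner's predictions. First set the classification threshold to its maximum, so that every example is predicted negative; TPR and FPR are then both 0, and a mark is placed at (0, 0). Then set the threshold to the predicted value of each example in turn, that is, label the examples positive one at a time. Let the previous mark be at (x, y). If the current example is a true positive, the new mark is at $(x, y+\frac{1}{m^{+}})$; if it is a false positive, the new mark is at $(x+\frac{1}{m^{-}}, y)$. Finally, connect the marks with line segments.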
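The drawing procedure translates almost line by line into code. The sketch below assumes distinct scores (ties are not handled) and tracks running counts instead of repeatedly adding $\frac{1}{m^{+}}$ or $\frac{1}{m^{-}}$, which avoids floating-point drift; the data is made up.

```python
def roc_points(y_true, scores):
    """Return the (FPR, TPR) marks obtained by lowering the threshold one sample at a time."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    m_pos = sum(y_true)
    m_neg = len(y_true) - m_pos
    tp = fp = 0
    points = [(0.0, 0.0)]  # threshold at its maximum: everything predicted negative
    for _, label in ranked:
        if label == 1:
            tp += 1  # true positive: the mark moves up by 1 / m_pos
        else:
            fp += 1  # false positive: the mark moves right by 1 / m_neg
        points.append((fp / m_neg, tp / m_pos))
    return points


# Made-up labels and scores for illustration only.
y_true = [1, 1, 0, 1, 0, 1, 0, 0]
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.10]

points = roc_points(y_true, scores)
print(points[:4])  # [(0.0, 0.0), (0.0, 0.25), (0.0, 0.5), (0.25, 0.5)]
print(points[-1])  # (1.0, 1.0)
```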

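Computing AUC

As with the P-R plot, if the ROC curve of one learner is completely enclosed by that of another, the latter can be asserted to perform better. If the two curves cross, it is hard to say in general which is better; if a comparison is required, a reasonable criterion is the area under the ROC curve, AUC (Area Under ROC Curve).

Suppose the ROC curve consists of the points $\{(x_{1}, y_{1}), (x_{2}, y_{2}), ..., (x_{n}, y_{n})\}$ connected in order, with $x_{1} = 0$ and $x_{n} = 1$. Then AUC can be estimated as
$AUC = \frac{1}{2}\sum_{i = 1}^{n - 1}(x_{i + 1} - x_{i})(y_{i} + y_{i+1})$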
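This estimate is the trapezoidal rule applied to consecutive ROC points. A minimal sketch, assuming the points are already sorted from (0, 0) to (1, 1); the example points are made up (they correspond to a ranking of four positives and four negatives).

```python
def auc_trapezoid(points):
    """Sum 0.5 * (x_{i+1} - x_i) * (y_i + y_{i+1}) over consecutive (x, y) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += 0.5 * (x1 - x0) * (y0 + y1)
    return area


# Hypothetical ROC marks, ordered from (0, 0) to (1, 1).
points = [(0.0, 0.0), (0.0, 0.25), (0.0, 0.5), (0.25, 0.5), (0.25, 0.75),
          (0.5, 0.75), (0.5, 1.0), (0.75, 1.0), (1.0, 1.0)]

auc = auc_trapezoid(points)
print(auc)  # 0.8125
```

Vertical segments contribute nothing because $x_{i+1} - x_{i} = 0$, so only the horizontal steps add area.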

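Statistical interpretation of AUC

TPR and FPR can each be viewed through a probability distribution. Let the random variable X be the ranking score that the learner computes for a sample x, and let T be a given threshold, so that a sample is classified as positive when X > T:

TPR
If the true class of the sample is positive, X follows the probability density function $f_{1}(x)$. The true positive rate can therefore be defined as
$TPR(T) = P(X > T \mid x\ is\ positive) = \int_{T}^{+\infty} f_{1}(x)\,dx$
FPR
If the true class of the sample is negative, X follows the probability density function $f_{0}(x)$. The false positive rate can therefore be defined as
$FPR(T) = P(X > T \mid x\ is\ negative) = \int_{T}^{+\infty} f_{0}(x)\,dx$

Putting this together with the earlier material: for each model, TPR and FPR are themselves probabilities determined by these score distributions. By sampling repeatedly and applying statistics we can infer the distributions behind TPR and FPR, and then evaluate the model with the resulting TPR and FPR.

Therefore, besides being the area under the ROC curve, AUC also has a probabilistic interpretation (the derivation is hard and is skipped here):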

The AUC is the probability the model will score a randomly chosen positive class higher than a randomly 
chosen negative class.

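In other words, if one positive sample and one negative sample are chosen at random, the probability that the positive sample scores higher than the negative sample is the AUC.

If the positive and the negative sample have the same score, the order is decided at random: the positive is ranked above the negative with probability 1/2 and below it with probability 1/2.

Because of this probabilistic interpretation, AUC can be computed without drawing the ROC curve, directly from the sample points. In a dataset with M positive samples and N negative samples there are $M\times N$ pairs (a pair is one positive and one negative). Count the pairs in which the predicted probability of the positive sample is greater than that of the negative sample:
$AUC = \frac{ \sum_{x^{+} \in D^{+}} \sum_{x^{-}\in D^{-}} I(x^{+}, x^{-})}{M \times N}$
$I(x^{+}, x^{-}) = \left\{\begin{matrix} 1, & f(x^{+}) > f(x^{-})\\ 0.5, & f(x^{+}) = f(x^{-})\\ 0, & f(x^{+}) < f(x^{-}) \end{matrix}\right.$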
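The pairwise formula can be implemented directly with two nested loops. This is a straightforward $O(M \times N)$ sketch meant for illustration rather than efficiency; the labels and scores are made up.

```python
def auc_pairwise(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    total = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                total += 1.0
            elif sp == sn:
                total += 0.5
    return total / (len(pos) * len(neg))


# Made-up labels and scores for illustration only.
y_true = [1, 1, 0, 1, 0, 1, 0, 0]
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.10]

auc = auc_pairwise(y_true, scores)
print(auc)  # 0.8125, i.e. 13 of the 16 pairs are ordered correctly
```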

loss function

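Formally, AUC considers the quality of the ranking of the predictions, so it is closely tied to ranking error. Given $m_{+}$ positive examples and $m_{-}$ negative examples, let $D^{+}$ and $D^{-}$ denote the sets of positive and negative examples. The ranking loss can then be defined as:
$l_{rank} = \frac{1}{m_{+}m_{-}} \sum_{x^{+} \in D^{+}} \sum_{x^{-} \in D^{-}} \left( I(f(x^{+}) < f(x^{-})) + \frac{1}{2}I(f(x^{+}) = f(x^{-})) \right)$
Here $I(\cdot)$ is the indicator function. That is, consider every positive-negative pair: if the predicted value of the positive example is smaller than that of the negative example, one penalty point is recorded; if they are equal, 0.5 penalty points are recorded. $l_{rank}$ corresponds to the area above the ROC curve: if a positive example corresponds to the mark (x, y) on the ROC curve, then x is exactly the proportion of negative examples ranked before it. Therefore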
$AUC = 1 - l_{rank}$
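The relation can be checked on a small example that includes a tie. The sketch below implements $l_{rank}$ as defined above and derives AUC from it; the data is made up, and the expected values in the comments were worked out by hand from the nine positive-negative pairs.

```python
def rank_loss(y_true, scores):
    """One penalty per mis-ordered (positive, negative) pair, half a penalty per tie."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    penalty = 0.0
    for sp in pos:
        for sn in neg:
            if sp < sn:
                penalty += 1.0
            elif sp == sn:
                penalty += 0.5
    return penalty / (len(pos) * len(neg))


# Made-up data: the positive and the negative with score 0.7 are tied.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.7, 0.7, 0.5, 0.6, 0.2]

loss = rank_loss(y_true, scores)  # (2 mis-ordered + 0.5 tie) / 9
auc_from_loss = 1 - loss          # 6.5 / 9

print(loss, auc_from_loss)
```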
