On loss functions and cost functions

In the theory of point estimation, a loss function quantifies the loss associated with the error committed while estimating a parameter. Often the expected value of the loss, called the statistical risk, is used to compare two or more estimators: in such comparisons, the estimator with the smallest expected loss is usually deemed preferable.
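In symbols (a standard formulation, written out here for concreteness rather than taken from the original text): for an estimator \hat{\theta} of a parameter \theta and a loss function L, the risk is the expected loss,

R(\theta, \hat{\theta}) = \mathbb{E}\big[L(\theta, \hat{\theta})\big]

and the estimator with the smaller risk is the one preferred.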


Ordinary least squares (OLS) is a way of fitting a linear regression model; it turns the fitting problem into a convex optimization problem. Linear regression assumes that the samples and the noise follow a Gaussian distribution (why Gaussian? A small hidden point here is the central limit theorem; see [central limit theorem]), and the least-squares objective can then be derived via maximum likelihood estimation (MLE) — a sketch of that derivation follows the list below. The basic principle of least squares is that the best-fitting line is the one that minimizes the sum of squared distances from the data points to the regression line. In other words, OLS is distance-based, and the distance it uses is the familiar Euclidean distance. Why choose the Euclidean distance as the error measure (i.e., the mean squared error, MSE)? Mainly for the following reasons:

It is simple and cheap to compute;
Euclidean distance is a good measure of similarity;
Its properties are preserved under transformations between different representation domains.
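The MLE step mentioned above can be sketched as follows (a standard argument, written out here for completeness rather than taken from the original post). Assume y_i = f(x_i) + \epsilon_i with i.i.d. Gaussian noise \epsilon_i \sim N(0, \sigma^2). The log-likelihood of the data is

\log \mathcal{L} = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - f(x_i))^2

Since the first term and \sigma^2 do not depend on f, maximizing the likelihood is equivalent to minimizing \sum_{i=1}^{n}(y_i - f(x_i))^2, which is exactly the least-squares objective.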
The standard form of the square loss is:
L(Y, f(X)) = (Y - f(X))^2

When there are n samples, the loss function becomes:
L(Y, f(X)) = \sum_{i=1}^{n}(Y_i - f(X_i))^2
Y_i - f(X_i) is the residual, so the whole expression is the sum of squared residuals, and our goal is to minimize this objective value (note: no regularization term is included here), i.e., to minimize the residual sum of squares (RSS).

In practice, the mean squared error (MSE) is usually used as the evaluation metric; its formula is:
MSE = \frac{1}{n}\sum_{i=1}^{n}(\tilde{Y}_i - Y_i)^2
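As a concrete illustration (a minimal sketch added here, not from the original post; the data and variable names are made up), the following fits a line by ordinary least squares with NumPy and then computes RSS and MSE:

import numpy as np

# Toy data: y is roughly 2x + 1 plus Gaussian noise (illustrative values only)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# Design matrix [x, 1] so that y ≈ X @ [slope, intercept]
X = np.column_stack([x, np.ones_like(x)])

# np.linalg.lstsq minimizes the residual sum of squares ||y - X @ theta||^2
theta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)

y_hat = X @ theta
rss = np.sum((y - y_hat) ** 2)   # residual sum of squares
mse = rss / len(y)               # mean squared error

print("theta =", theta)
print("RSS = {:.4f}, MSE = {:.4f}".format(rss, mse))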

Having mentioned linear regression, one extra note: "linear" usually refers to one of two things. Either the dependent variable y is a linear function of the independent variable x, or y is a linear function of the parameters. In machine learning, it is usually the latter that is meant.
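For example (an illustration added here, not in the original text), a quadratic model y = θ0 + θ1 x + θ2 x² is nonlinear in x but still linear in the parameters θ, which is why it can be fit with the same least-squares machinery:

import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 2.2, 5.1, 10.3, 16.8])   # roughly quadratic in x (made-up values)

# np.polyfit solves a least-squares problem; the model is linear in its coefficients
coeffs = np.polyfit(x, y, deg=2)   # [theta2, theta1, theta0], highest degree first
print(coeffs)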
Notice how, in the loss function we defined, it doesn't matter whether our predictions were too high or too low; all that matters is the magnitude of the error, regardless of its direction. This is not a feature of all loss functions: in fact, your loss function will vary significantly with the domain and the unique context of the problem you're applying machine learning to. In your project it may be much worse to guess too high than too low, and the loss function you select must reflect that.
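As one concrete example of such an asymmetric choice (an illustration added here, not part of the original post), the pinball (quantile) loss penalizes over- and under-prediction differently:

import numpy as np

def pinball_loss(y_true, y_pred, tau=0.9):
    # With tau = 0.9, under-prediction (y_true > y_pred) costs nine times more
    # than over-prediction, pushing the model toward the 90th percentile.
    diff = np.asarray(y_true) - np.asarray(y_pred)
    return np.mean(np.maximum(tau * diff, (tau - 1) * diff))

print(pinball_loss([10.0], [8.0]))   # under-prediction: 0.9 * 2 = 1.8
print(pinball_loss([10.0], [12.0]))  # over-prediction:  0.1 * 2 = 0.2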


Implementation

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import numpy as np


def logloss(y_true, y_pred, eps=1e-15):
    # y_true: list, the true labels (0 or 1) of the input instances
    # y_pred: list, the predicted probability that each instance's label is 1
    y_true = np.array(y_true)
    y_pred = np.array(y_pred)
    assert len(y_true) and len(y_true) == len(y_pred)

    # Clip the predictions to [eps, 1 - eps] to avoid log(0)
    p = np.clip(y_pred, eps, 1 - eps)
    loss = np.sum(-y_true * np.log(p) - (1 - y_true) * np.log(1 - p))

    # Return the average log loss over all instances
    return loss / len(y_true)


def unitest():
    y_true = [0, 0, 1, 1]
    y_pred = [0.1, 0.2, 0.7, 0.99]

    print("Use self-defined logloss() in binary classification, the result is {}".format(logloss(y_true, y_pred)))

    # Compare against scikit-learn's implementation
    from sklearn.metrics import log_loss
    print("Use log_loss() in scikit-learn, the result is {}".format(log_loss(y_true, y_pred)))


if __name__ == '__main__':
    unitest()