Logistic regression, despite its name, is a linear model for classification rather than regression. It is also known in the literature as logit regression, maximum-entropy classification (MaxEnt), or the log-linear classifier. In this model, the probabilities describing the possible outcomes of a single trial are modeled using a logistic function.
The implementation of logistic regression in scikit-learn is the LogisticRegression
class. It can fit multiclass logistic regression with an optional L2 or L1 regularization term.
As an optimization problem, binary-class L2-penalized logistic regression minimizes the following cost function:
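Filling in the objective referred to above (this is the standard form, with labels $y_i \in \{-1, 1\}$ and regularization strength controlled by the inverse parameter $C$):

$$\min_{w, c} \; \frac{1}{2} w^T w + C \sum_{i=1}^{n} \log\left(\exp\left(-y_i (X_i^T w + c)\right) + 1\right)$$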
Similarly, L1-regularized logistic regression solves the following optimization problem:
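Again filling in the standard form of the objective (same notation as above, with the L2 term replaced by the L1 norm of the weights):

$$\min_{w, c} \; \|w\|_1 + C \sum_{i=1}^{n} \log\left(\exp\left(-y_i (X_i^T w + c)\right) + 1\right)$$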
The solvers implemented in the LogisticRegression
class are
"liblinear" (a wrapper around the C++ library LIBLINEAR), "newton-cg", "lbfgs" and "sag".
The "lbfgs" and "newton-cg" solvers only support the L2 penalty, and converge faster for some high-dimensional data. L1 penalization yields sparse predicting weights.
The "liblinear" solver uses a coordinate descent (CD) algorithm based on Liblinear. For the L1 penalty, sklearn.svm.l1_min_c
allows computing the lower bound for C in order to get a non-"null"
model, i.e., one in which not all feature weights are zero. This relies on the excellent LIBLINEAR library, which is shipped with scikit-learn. However, the CD algorithm implemented in liblinear cannot learn a true multinomial (multiclass) model; instead, the optimization problem is decomposed in a
"one-vs-rest" fashion, so separate binary classifiers are trained for all classes. This happens under the hood, so a LogisticRegression
instance using this solver still behaves as a multiclass classifier; a short sketch of l1_min_c follows below.
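As a quick, hedged illustration (the dataset and the factor of 10 are arbitrary choices, not from the original text), l1_min_c can be used like this:

```python
# Minimal sketch: find the smallest C below which an L1-penalized logistic
# regression has all-zero coefficients, then fit with a slightly larger C.
from sklearn import datasets
from sklearn.svm import l1_min_c
from sklearn.linear_model import LogisticRegression

iris = datasets.load_iris()
X, y = iris.data, iris.target

c_min = l1_min_c(X, y, loss='log')      # below this C the model is "null"
clf = LogisticRegression(penalty='l1', solver='liblinear', C=10 * c_min)
clf.fit(X, y)
print(c_min, (clf.coef_ != 0).sum())    # number of non-zero weights
```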
Setting multi_class to "multinomial" and using the "lbfgs" or "newton-cg" solver in LogisticRegression
learns a true multinomial logistic regression model, which means that its probability estimates should be better calibrated than the default "one-vs-rest" setting. The "lbfgs", "newton-cg" and "sag" solvers cannot optimize L1-penalized models, however, so the "multinomial" setting cannot learn sparse models. A minimal comparison of the two settings is sketched below.
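A minimal sketch of the two settings (the iris data and max_iter value are illustrative choices, not part of the original text; note that the multi_class parameter is deprecated in recent scikit-learn releases):

```python
# Compare one-vs-rest with a true multinomial fit on the same data.
from sklearn import datasets
from sklearn.linear_model import LogisticRegression

iris = datasets.load_iris()
X, y = iris.data, iris.target

ovr = LogisticRegression(multi_class='ovr', solver='liblinear').fit(X, y)
mnl = LogisticRegression(multi_class='multinomial',
                         solver='lbfgs', max_iter=200).fit(X, y)

print(ovr.predict_proba(X[:1]))  # probabilities from three independent binary fits
print(mnl.predict_proba(X[:1]))  # probabilities from a single multinomial (softmax) fit
```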
The "sag" solver uses Stochastic Average Gradient descent [3]. It does not handle the multinomial case and is limited to L2-penalized models, yet it is often faster than the other solvers on very large datasets, when both the number of samples and the number of features are large.
In a nutshell, the solver can be chosen according to the following rules:
| Case | Solver |
|---|---|
| Small dataset or L1 penalty | "liblinear" |
| Multinomial loss | "lbfgs" or "newton-cg" |
| Large dataset | "sag" |
For very large datasets you may also consider using SGDClassifier with the log loss, as in the sketch below.
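A rough sketch of that alternative (the synthetic data and hyperparameters are placeholder assumptions; older scikit-learn versions spell the loss as loss='log' instead of loss='log_loss'):

```python
# SGD-based logistic regression: useful when the data is too large for the
# batch solvers, and usable out-of-core via partial_fit.
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

X, y = make_classification(n_samples=100_000, n_features=50, random_state=0)

clf = SGDClassifier(loss='log_loss', penalty='l2', alpha=1e-4)
clf.fit(X, y)
print(clf.predict_proba(X[:2]))
```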
Differences from liblinear:
There might be a difference in the scores obtained between LogisticRegression with solver=liblinear or LinearSVC and the external liblinear library directly, when fit_intercept=False and the fit coef_ (or) the data to be predicted are zeroes. This is because for the sample(s) with decision_function zero, LogisticRegression and LinearSVC predict the negative class, while liblinear predicts the positive class. Note that a model with fit_intercept=False and having many samples with decision_function zero is likely to be an underfit, bad model, and you are advised to set fit_intercept=True and increase the intercept_scaling. The behaviour on a zero decision_function is illustrated below.
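A hedged illustration of that edge case (the toy data is made up for demonstration):

```python
# With fit_intercept=False, an all-zero sample has decision_function exactly 0,
# and scikit-learn then predicts the negative class.
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[1., 0.], [0., 1.], [-1., 0.], [0., -1.]])
y = np.array([1, 1, 0, 0])
clf = LogisticRegression(solver='liblinear', fit_intercept=False).fit(X, y)

zero_sample = np.zeros((1, 2))
print(clf.decision_function(zero_sample))  # [0.]
print(clf.predict(zero_sample))            # [0] -> the negative class
```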
Note
Feature selection with sparse logistic regression
A logistic regression with L1 penalty yields sparse models, and can thus be used to perform feature selection, as detailed in L1-based feature selection. One possible approach is sketched right after this note.
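One hedged way to do this (SelectFromModel and the chosen C are assumptions for illustration, not prescribed by the original text):

```python
# Keep only the features to which the L1-penalized model assigned non-zero weights.
from sklearn import datasets
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression

digits = datasets.load_digits()
X, y = digits.data, digits.target

l1_lr = LogisticRegression(penalty='l1', solver='liblinear', C=0.1)
selector = SelectFromModel(l1_lr).fit(X, y)
X_reduced = selector.transform(X)
print(X.shape, '->', X_reduced.shape)  # many all-zero-weight features are dropped
```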
LogisticRegressionCV
implements logistic regression with built-in cross-validation to find the optimal regularization parameter C. The "newton-cg", "sag"
and "lbfgs" solvers are found to be faster for high-dimensional dense data, due to warm-starting. For the multiclass case, if the multi_class option is set to "ovr", an optimal C is obtained for each class; if the multi_class option is set to "multinomial", an optimal C is obtained by minimizing the cross-entropy loss. A brief sketch of its use follows.
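A brief sketch (the grid of C values, the fold count and the dataset are illustrative assumptions):

```python
# Cross-validated search over C, keeping the best value found per class.
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegressionCV

iris = datasets.load_iris()
X, y = iris.data, iris.target

clf = LogisticRegressionCV(Cs=np.logspace(-4, 4, 20), cv=5,
                           solver='lbfgs', max_iter=500).fit(X, y)
print(clf.C_)  # best C per class ("ovr") or a shared value ("multinomial")
```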
Examples:
We classify 8x8 images of digits into two classes: 0-4 against 5-9. The visualization shows the coefficients of the models for varying C.
Script output:
```
C=100.00
Sparsity with L1 penalty: 6.25%
score with L1 penalty: 0.9104
Sparsity with L2 penalty: 4.69%
score with L2 penalty: 0.9098
C=1.00
Sparsity with L1 penalty: 10.94%
score with L1 penalty: 0.9098
Sparsity with L2 penalty: 4.69%
score with L2 penalty: 0.9093
C=0.01
Sparsity with L1 penalty: 85.94%
score with L1 penalty: 0.8614
Sparsity with L2 penalty: 4.69%
score with L2 penalty: 0.8915
```
Python source code:
plot_logistic_l1_l2_sparsity.py
```python
print(__doc__)

# Authors: Alexandre Gramfort <[email protected]>
#          Mathieu Blondel <[email protected]>
#          Andreas Mueller <[email protected]>
# License: BSD 3 clause

import numpy as np
import matplotlib.pyplot as plt

from sklearn.linear_model import LogisticRegression
from sklearn import datasets
from sklearn.preprocessing import StandardScaler

digits = datasets.load_digits()

X, y = digits.data, digits.target
X = StandardScaler().fit_transform(X)

# classify small against large digits
y = (y > 4).astype(np.int)

# Set regularization parameter
for i, C in enumerate((100, 1, 0.01)):
    # turn down tolerance for short training time
    clf_l1_LR = LogisticRegression(C=C, penalty='l1', tol=0.01)
    clf_l2_LR = LogisticRegression(C=C, penalty='l2', tol=0.01)
    clf_l1_LR.fit(X, y)
    clf_l2_LR.fit(X, y)

    coef_l1_LR = clf_l1_LR.coef_.ravel()
    coef_l2_LR = clf_l2_LR.coef_.ravel()

    # coef_l1_LR contains zeros due to the
    # L1 sparsity inducing norm
    sparsity_l1_LR = np.mean(coef_l1_LR == 0) * 100
    sparsity_l2_LR = np.mean(coef_l2_LR == 0) * 100

    print("C=%.2f" % C)
    print("Sparsity with L1 penalty: %.2f%%" % sparsity_l1_LR)
    print("score with L1 penalty: %.4f" % clf_l1_LR.score(X, y))
    print("Sparsity with L2 penalty: %.2f%%" % sparsity_l2_LR)
    print("score with L2 penalty: %.4f" % clf_l2_LR.score(X, y))

    l1_plot = plt.subplot(3, 2, 2 * i + 1)
    l2_plot = plt.subplot(3, 2, 2 * (i + 1))
    if i == 0:
        l1_plot.set_title("L1 penalty")
        l2_plot.set_title("L2 penalty")

    l1_plot.imshow(np.abs(coef_l1_LR.reshape(8, 8)), interpolation='nearest',
                   cmap='binary', vmax=1, vmin=0)
    l2_plot.imshow(np.abs(coef_l2_LR.reshape(8, 8)), interpolation='nearest',
                   cmap='binary', vmax=1, vmin=0)
    plt.text(-8, 3, "C = %.2f" % C)

    l1_plot.set_xticks(())
    l1_plot.set_yticks(())
    l2_plot.set_xticks(())
    l2_plot.set_yticks(())

plt.show()
```
L1 logistic regression path
Computes the regularization path on the IRIS dataset.
Script output:
```
Computing regularization path ...
This took 0:00:00.147946
```
Python source code:
plot_logistic_path.py
```python
print(__doc__)

# Author: Alexandre Gramfort <[email protected]>
# License: BSD 3 clause

from datetime import datetime

import numpy as np
import matplotlib.pyplot as plt

from sklearn import linear_model
from sklearn import datasets
from sklearn.svm import l1_min_c

iris = datasets.load_iris()
X = iris.data
y = iris.target

X = X[y != 2]
y = y[y != 2]

X -= np.mean(X, 0)

###############################################################################
# Demo path functions

cs = l1_min_c(X, y, loss='log') * np.logspace(0, 3)

print("Computing regularization path ...")
start = datetime.now()
clf = linear_model.LogisticRegression(C=1.0, penalty='l1', tol=1e-6)
coefs_ = []
for c in cs:
    clf.set_params(C=c)
    clf.fit(X, y)
    coefs_.append(clf.coef_.ravel().copy())
print("This took ", datetime.now() - start)

coefs_ = np.array(coefs_)
plt.plot(np.log10(cs), coefs_)
ymin, ymax = plt.ylim()
plt.xlabel('log(C)')
plt.ylabel('Coefficients')
plt.title('Logistic Regression Path')
plt.axis('tight')
plt.show()
```