Logistic Regression: Code Walkthrough and Demo

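Today Daguan (大管) walks through how logistic regression is used in sklearn, with a detailed breakdown of its parameters, attributes, and methods. At the end of the article, we use the example from the official site to classify the iris dataset with logistic regression.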

Logistic Regression

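Logistic regression, despite its name, is a linear model for classification rather than regression. In the literature it is also known as logit regression, maximum-entropy classification (MaxEnt), or the log-linear classifier. In this model, the probabilities describing the possible outcomes of a single trial are modeled using a logistic function.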
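
To make the role of the logistic function concrete, here is a minimal NumPy sketch of how a linear score is turned into a probability in the binary case. The weights `w`, bias `b`, and sample `x` are made-up values for illustration only; they do not come from any fitted model.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real-valued score into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical weights, bias, and sample for a two-feature binary problem.
w = np.array([1.5, -0.5])
b = 0.25
x = np.array([2.0, 1.0])

score = w @ x + b             # linear part of the model
p_positive = sigmoid(score)   # modeled probability of the positive class
print(score, p_positive)
```

A score of 0 maps to a probability of exactly 0.5, which is why the sign of the score decides the predicted class.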

#Call signature

class sklearn.linear_model.LogisticRegression(penalty='l2', dual=False, tol=0.0001, C=1.0, fit_intercept=True, intercept_scaling=1, class_weight=None, random_state=None, solver='warn', max_iter=100, multi_class='warn', verbose=0, warm_start=False, n_jobs=None)

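In the multiclass case, the training algorithm uses the one-vs-rest (OvR) scheme if the multi_class option is set to 'ovr', and uses the cross-entropy loss if the multi_class option is set to 'multinomial'.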

#Parameters

##penalty: str, 'l1' or 'l2', default: 'l2' Specifies whether the regularization term uses the L1 or the L2 norm. L2 is the default.

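##dual: bool, default: False Dual or primal formulation. The dual formulation is only implemented for the L2 penalty with the liblinear solver. Prefer dual=False when the number of samples is greater than the number of features.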

##tol: float, default: 1e-4 Tolerance for the stopping criterion. Defaults to 1e-4.

##C: float, default: 1.0 Inverse of the regularization strength; must be a positive float. Smaller values mean stronger regularization.

##fit_intercept: bool, default: True Specifies whether an intercept (bias) term should be added to the decision function.

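##intercept_scaling: float, default: 1 Useful only when the 'liblinear' solver is used and fit_intercept is set to True. In that case, x becomes [x, self.intercept_scaling], i.e. a "synthetic" feature with a constant value equal to intercept_scaling is appended to the instance vector. The intercept becomes intercept_scaling * synthetic_feature_weight.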

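##class_weight: dict or 'balanced', default: None Weights associated with the classes, given as a dictionary of the form {class_label: weight}. If not given, all classes are assumed to have weight 1. The 'balanced' mode adjusts the weights to be inversely proportional to the class frequencies in the input data.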

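##random_state: int, RandomState instance or None, optional, default: None The seed of the pseudo-random number generator used when shuffling the data. If int, random_state is the seed used by the random number generator; if a RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random. Used when solver == 'sag' or 'liblinear'.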

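##solver: str, {'newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga'}, default: 'liblinear' The algorithm used for the optimization problem. The 'warn' value shown in the signature is a placeholder: it falls back to this default and emits a FutureWarning. For small datasets, 'liblinear' is a good choice, whereas 'sag' and 'saga' are faster for large ones. For multiclass problems, only 'newton-cg', 'sag', 'saga', and 'lbfgs' handle the multinomial loss; 'liblinear' is limited to one-versus-rest schemes. 'newton-cg', 'lbfgs', and 'sag' only handle the L2 penalty, whereas 'liblinear' and 'saga' also handle the L1 penalty.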

##max_iter: int, default: 100 Only used by the newton-cg, sag, and lbfgs solvers. Maximum number of iterations allowed for the solver to converge.

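##multi_class: str, {'ovr', 'multinomial', 'auto'}, default: 'ovr' If the option chosen is 'ovr', a binary problem is fit for each label. For 'multinomial', the loss minimized is the multinomial loss fit across the entire probability distribution, even when the data is binary. 'multinomial' is unavailable when solver='liblinear'. 'auto' selects 'ovr' if the data is binary or if solver='liblinear', and otherwise selects 'multinomial'. As with solver, the 'warn' value shown in the signature is a placeholder that falls back to this default.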

##verbose: int, default: 0 For the liblinear and lbfgs solvers, set verbose to any positive number to enable verbose output.

##warm_start: bool, default: False When set to True, the solution of the previous call to fit is reused as the initialization; otherwise, the previous solution is erased.

##n_jobs: int or None, optional (default=None) Number of CPU cores used when parallelizing over classes if multi_class='ovr'. This parameter is ignored when the solver is set to 'liblinear', regardless of whether multi_class is specified.
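
Because C is the inverse of the regularization strength, its effect is easy to misread. The sketch below fits the same model with a small and a large C and compares the size of the learned coefficients. The values 0.01 and 100 and the raised max_iter are arbitrary choices for illustration; multi_class is left at its default so the snippet does not depend on a particular scikit-learn release.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Same model, strong (small C) versus weak (large C) regularization.
strong = LogisticRegression(C=0.01, solver='lbfgs', max_iter=1000).fit(X, y)
weak = LogisticRegression(C=100.0, solver='lbfgs', max_iter=1000).fit(X, y)

# Stronger regularization shrinks the coefficients towards zero.
norm_strong = np.linalg.norm(strong.coef_)
norm_weak = np.linalg.norm(weak.coef_)
print(norm_strong, norm_weak)
```

The coefficient norm under C=0.01 should come out smaller than under C=100, which matches the description of C above.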

#Attributes

##classes_: array, shape (n_classes,) The list of class labels known to the classifier.

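##coef_: array, shape (1, n_features) or (n_classes, n_features) Coefficients of the features in the decision function. The shape is (1, n_features) when the problem is binary.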

##intercept_: array, shape (1,) or (n_classes,) Intercept (bias) term added to the decision function.

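##n_iter_: array, shape (n_classes,) or (1,) Actual number of iterations for all classes. If the problem is binary or multinomial, only one element is returned. For the liblinear solver, only the maximum number of iterations across all classes is given.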
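
The attributes listed above can be inspected directly after fitting. This is a small sketch on the iris data (150 samples, 4 features, 3 classes); the solver and max_iter settings are assumptions made only so the fit converges quietly.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(solver='lbfgs', max_iter=1000).fit(X, y)

print(clf.classes_)          # class labels seen during fit
print(clf.coef_.shape)       # one row of coefficients per class
print(clf.intercept_.shape)  # one intercept per class
print(clf.n_iter_)           # iterations actually run by the solver
```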

#Code example

>>> from sklearn.datasets import load_iris
>>> from sklearn.linear_model import LogisticRegression
>>> X, y = load_iris(return_X_y=True)
>>> clf = LogisticRegression(random_state=0, solver='lbfgs', multi_class='multinomial').fit(X, y)
>>> clf.predict(X[:2, :])
array([0, 0])
>>> clf.predict_proba(X[:2, :]) 
array([[9.8...e-01, 1.8...e-02, 1.4...e-08],
       [9.7...e-01, 2.8...e-02, ...e-08]])
>>> clf.score(X, y)
0.97...

#Methods

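##decision_function(X) Predict confidence scores for samples.

Parameters

X: array_like or sparse matrix, shape (n_samples, n_features) The input samples.

Returns

Confidence scores for each (sample, class) combination. In the binary case, this is the confidence score for self.classes_[1], where a value greater than 0 means this class would be predicted.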
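
The binary-case rule can be checked with a short sketch. Here the iris data is reduced to its first two classes purely for illustration; the variable names `X_bin` and `y_bin` are hypothetical.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
mask = y < 2                       # keep only classes 0 and 1 (binary problem)
X_bin, y_bin = X[mask], y[mask]

clf = LogisticRegression(solver='lbfgs', max_iter=1000).fit(X_bin, y_bin)

scores = clf.decision_function(X_bin)  # shape (n_samples,) in the binary case
pred = clf.predict(X_bin)

# A positive score corresponds to predicting clf.classes_[1].
agrees = np.array_equal(scores > 0, pred == clf.classes_[1])
print(scores.shape, agrees)
```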

##densify() Convert the coefficient matrix to dense array format.

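##fit(X, y, sample_weight=None) Fit the model according to the given training data.

Parameters

X: {array-like, sparse matrix}, shape (n_samples, n_features) Training data, where n_samples is the number of samples and n_features is the number of features.

y: array-like, shape (n_samples,) Labels corresponding to the training data.

sample_weight: array-like, shape (n_samples,), optional Array of weights assigned to individual samples. If not provided, every sample is given a weight of 1.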

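##get_params(deep=True) Get the parameters of the model.

Parameters

deep: boolean, optional If True, returns the parameters of this estimator and of any contained sub-objects that are estimators.

Returns

params: mapping of string to any Parameter names mapped to their values.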

##predict(X) Predict class labels for the samples in X.

Parameters

X: array_like or sparse matrix, shape (n_samples, n_features) The samples to predict.

Returns

C: array, shape (n_samples,) The predicted class label for each sample.

##score(X, y, sample_weight=None) Return the mean accuracy on the given test data and labels.

Parameters

X: array_like or sparse matrix, shape (n_samples, n_features) The test samples.

y: array-like, shape = (n_samples) or (n_samples, n_outputs) True labels for X.

sample_weight: array-like, shape = [n_samples], optional Sample weights. Not set by default.

Returns

The mean accuracy of self.predict(X) with respect to y.
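
As a quick check of what score returns, the sketch below compares it against the accuracy computed by hand from predict. Scoring on the training data is done here only to keep the example short; it is not a recommendation for model evaluation.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(solver='lbfgs', max_iter=1000).fit(X, y)

accuracy = clf.score(X, y)             # mean accuracy reported by the estimator
manual = np.mean(clf.predict(X) == y)  # fraction of correct predictions
print(accuracy, manual)
```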

##set_params(**params) Set the parameters of the model.
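
get_params and set_params are typically used together, for example when tuning hyperparameters. A minimal sketch, with arbitrary parameter values:

```python
from sklearn.linear_model import LogisticRegression

clf = LogisticRegression(C=1.0)

params = clf.get_params()   # dict mapping parameter names to current values
print(params['C'])

# set_params updates the estimator in place and returns the estimator itself.
clf.set_params(C=10.0, max_iter=200)
print(clf.get_params()['C'], clf.get_params()['max_iter'])
```

Note that the dictionary returned earlier by get_params is a snapshot; it is not updated by the later set_params call.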

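##predict_log_proba(X) Logarithm of the probability estimates.

Parameters

X: array-like, shape = [n_samples, n_features] The samples to predict.

Returns

T: array-like, shape = [n_samples, n_classes] The log-probability of each sample for each class in the model, with classes ordered as in self.classes_.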
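
The relationship between the log-probabilities and the probabilities returned by predict_proba can be verified directly. A small sketch on the first five iris samples:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(solver='lbfgs', max_iter=1000).fit(X, y)

proba = clf.predict_proba(X[:5])          # columns follow clf.classes_
log_proba = clf.predict_log_proba(X[:5])  # element-wise log of the above

print(proba.shape)
print(np.allclose(log_proba, np.log(proba)))
print(proba.sum(axis=1))                  # each row sums to 1
```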

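##predict_proba(X) Probability estimates. For a multiclass problem, if multi_class is set to 'multinomial', the softmax function is used to find the predicted probability of each class. Otherwise a one-vs-rest approach is used.

In the one-vs-rest case, the probability of each class is computed with the logistic function, assuming that class to be the positive one, and these values are then normalized across all classes.

Parameters

X: array-like, shape = [n_samples, n_features] The samples to predict.

Returns

T: array-like, shape = [n_samples, n_classes] The probability of each sample for each class in the model, with classes ordered as in self.classes_.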
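
The two ways of turning per-class scores into probabilities can be sketched in plain NumPy. This mirrors the description above rather than scikit-learn's internal code; the function names `softmax_proba` and `ovr_proba` and the score values are hypothetical.

```python
import numpy as np

def softmax_proba(scores):
    """Multinomial case: softmax over the per-class scores of each sample."""
    shifted = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def ovr_proba(scores):
    """One-vs-rest case: logistic function per class, then normalize each row."""
    p = 1.0 / (1.0 + np.exp(-scores))
    return p / p.sum(axis=1, keepdims=True)

# Hypothetical decision scores for 2 samples and 3 classes.
scores = np.array([[2.0, 0.5, -1.0],
                   [-0.5, 0.0, 1.5]])

p_soft = softmax_proba(scores)
p_ovr = ovr_proba(scores)
print(p_soft)
print(p_ovr)
```

Both variants produce rows that sum to 1 and rank the classes in the same order for a given set of scores, but the probability values themselves generally differ.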

#Example

Use logistic regression to perform three-class classification on the iris data.


import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn import datasets
### import some data to play with
iris = datasets.load_iris()
X = iris.data[:, :2]  # we only take the first two features.
Y = iris.target
### Create an instance of Logistic Regression Classifier and fit the data.
logreg = LogisticRegression(C=1e5, solver='lbfgs', multi_class='multinomial')
logreg.fit(X, Y)
### Plot the decision boundary. For that, we will assign a color to each
### point in the mesh [x_min, x_max]x[y_min, y_max].
x_min, x_max = X[:, 0].min() - .5, X[:, 0].max() + .5
y_min, y_max = X[:, 1].min() - .5, X[:, 1].max() + .5
h = .02  # step size in the mesh
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
Z = logreg.predict(np.c_[xx.ravel(), yy.ravel()])
### Put the result into a color plot
Z = Z.reshape(xx.shape)
plt.figure(1, figsize=(4, 3))
plt.pcolormesh(xx, yy, Z, cmap=plt.cm.Paired)
### Plot also the training points
plt.scatter(X[:, 0], X[:, 1], c=Y, edgecolors='k', cmap=plt.cm.Paired)
plt.xlabel('Sepal length')
plt.ylabel('Sepal width')
plt.xlim(xx.min(), xx.max())
plt.ylim(yy.min(), yy.max())
plt.xticks(())
plt.yticks(())
plt.show()

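The plot produced by this code shows the decision boundaries of the logistic regression classifier on the first two dimensions (sepal length and sepal width) of the iris dataset. The data points are colored according to their labels.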

