Feature Selection - SelectFromModel (selecting features by importance weights)


1. Class signature

class sklearn.feature_selection.SelectFromModel(estimator, 
		*, threshold=None, prefit=False, norm_order=1, 
		max_features=None)

Meta-transformer for selecting features based on importance weights.

2. Parameters

Parameters
----------
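estimator: object
	       The base estimator from which the transformer is built.
	       It can be either fitted (if prefit is set to True) or unfitted.
	       After fitting, the estimator must have a feature_importances_ or coef_ attribute.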

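threshold: str or float, optional, default None
		   The threshold value to use for feature selection.
		   Features whose importance is greater than or equal to it are kept, while the others are discarded.
		   If "median" (resp. "mean"),
		   	 the threshold value is the median (resp. the mean) of the feature importances.
		  	 A scaling factor (e.g., "1.25*mean") may also be used.
		   If None and the estimator has a penalty parameter set to l1, either explicitly or implicitly (e.g., Lasso),
		   	 the threshold used is 1e-5.
		   Otherwise, "mean" is used by default.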

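prefit: bool, default False
		Whether a prefit model is expected to be passed directly to the constructor.
		If True, transform must be called directly, and
		SelectFromModel cannot be used with cross_val_score, GridSearchCV, or similar utilities that clone the estimator.
		Otherwise, train the model with fit and then use transform to do the feature selection.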

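norm_order: non-zero int, inf, -inf, default 1
			Order of the norm used to filter the vectors of coefficients below threshold
			in the case where the coef_ attribute of the estimator is of dimension 2.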

max_features: int or None, optional
			  The maximum number of features to select.
			  To select based only on max_features, set threshold=-np.inf.
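
The threshold and max_features options described above can be compared side by side. The following is a minimal sketch, not taken from the original article: the synthetic data, the coefficient vector true_coef, and the helper selected() are assumptions made purely for illustration, and LinearRegression is used only because it recovers the known coefficients exactly.

```python
import numpy as np
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LinearRegression

# Synthetic, noise-free data (an assumption for illustration):
# the fitted |coef_| will be approximately [5.0, 0.1, 1.5, 0.0].
rng = np.random.RandomState(0)
X_demo = rng.normal(size=(50, 4))
true_coef = np.array([5.0, 0.1, 1.5, 0.0])
y_demo = X_demo @ true_coef


def selected(threshold, max_features=None):
    """Hypothetical helper: return the indices kept for a given setting."""
    sfm = SelectFromModel(
        LinearRegression(), threshold=threshold, max_features=max_features
    )
    sfm.fit(X_demo, y_demo)
    return sfm.get_support(indices=True).tolist()


by_mean = selected("mean")            # mean of importances is 1.65
by_median = selected("median")        # median of importances is 0.8
by_scaled = selected("1.25*mean")     # scaled mean is 2.0625
top_one = selected(-np.inf, max_features=1)  # rely on max_features only

print(by_mean, by_median, by_scaled, top_one)
```

With these coefficients the median is lower than the mean, so "median" keeps one more feature than "mean". Setting threshold=-np.inf disables the threshold so that max_features alone decides how many features survive.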

Attributes
----------
estimator_: an estimator
		    The base estimator from which the transformer is built.
			It is stored only when an unfitted estimator is passed to SelectFromModel,
			i.e. when prefit is False.

threshold_: float
			The threshold value used for feature selection.
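
To make the difference between the two prefit modes concrete, here is a small sketch under stated assumptions: the data and the coefficients [4.0, 0.0, 1.0] are synthetic, and the variable names are illustrative only. It inspects estimator_ and threshold_ in the default mode, then calls transform directly on a selector built from an already fitted model.

```python
import numpy as np
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LinearRegression

# Synthetic, noise-free data (assumed for illustration only).
rng = np.random.RandomState(0)
X_demo = rng.normal(size=(40, 3))
y_demo = X_demo @ np.array([4.0, 0.0, 1.0])

# prefit=False (the default): SelectFromModel fits a clone of the estimator.
base = LinearRegression()
sfm = SelectFromModel(base, threshold="mean").fit(X_demo, y_demo)
is_clone = sfm.estimator_ is not base      # the fitted copy lives in estimator_
threshold_value = float(sfm.threshold_)    # mean of |coef_|, about 5 / 3

# prefit=True: the estimator is already fitted, so transform is called directly.
fitted = LinearRegression().fit(X_demo, y_demo)
sfm_prefit = SelectFromModel(fitted, threshold="mean", prefit=True)
X_reduced = sfm_prefit.transform(X_demo)

print(is_clone, round(threshold_value, 4), X_reduced.shape)
```

Only the first feature has an importance above the mean here, so the prefit selector returns a single column.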

Notes

Allows NaN/Inf in the input if the underlying estimator does as well.
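
The norm_order parameter listed under Parameters only matters when coef_ is two-dimensional, for example with a multi-output regressor. The sketch below is an illustration under assumptions (synthetic data and a hand-picked coefficient matrix W, neither from the original article): each feature's importance becomes the norm of its column of coefficients, so the L1 and L2 norms can rank features differently.

```python
import numpy as np
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LinearRegression

# Two targets, three features. Rows are targets, columns are features.
# Column norms: feature 0 -> L1 = 4.0, L2 ~ 2.83; feature 1 -> L1 = 3.0, L2 = 3.0.
W = np.array([[2.0, 3.0, 0.1],
              [2.0, 0.0, 0.1]])

rng = np.random.RandomState(0)
X_demo = rng.normal(size=(60, 3))
Y_demo = X_demo @ W.T  # noise-free, so coef_ is recovered as W


def top_feature(norm_order):
    """Hypothetical helper: index of the single best feature for a given norm."""
    sfm = SelectFromModel(
        LinearRegression(),
        threshold=-np.inf,   # disable the threshold, rely on max_features
        max_features=1,
        norm_order=norm_order,
    )
    sfm.fit(X_demo, Y_demo)
    return sfm.get_support(indices=True).tolist()


l1_choice = top_feature(1)
l2_choice = top_feature(2)
print(l1_choice, l2_choice)
```

Feature 0 spreads its weight over both targets and wins under the L1 norm, while feature 1 concentrates its weight on one target and wins under the L2 norm.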

3. Methods

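'fit(self, X[, y])'
	Fit the SelectFromModel meta-transformer.

'fit_transform(self, X[, y])'
	Fit the meta-transformer, then transform X.

'get_params(self[, deep])'
	Get the parameters of this estimator.
	
'get_support(self[, indices])'
	Get a boolean mask, or the integer indices, of the selected features.

'inverse_transform(self, X)'
	Reverse the transformation operation.
	
'partial_fit(self, X[, y])'
	Fit the SelectFromModel meta-transformer only once.

'set_params(self, **params)'
	Set the parameters of this estimator.

'transform(self, X)'
	Reduce X to the selected features.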
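
A short sketch of how get_support, transform, and inverse_transform relate to one another. The data and coefficients are synthetic assumptions chosen so that exactly one feature is kept; they are not part of the original article.

```python
import numpy as np
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LinearRegression

# Synthetic, noise-free data (assumed): only feature 0 exceeds the mean importance.
rng = np.random.RandomState(0)
X_demo = rng.normal(size=(40, 3))
y_demo = X_demo @ np.array([4.0, 0.0, 1.0])

sfm = SelectFromModel(LinearRegression(), threshold="mean").fit(X_demo, y_demo)

mask = sfm.get_support()                 # boolean mask over the input columns
indices = sfm.get_support(indices=True)  # the same selection as integer indices

X_small = sfm.transform(X_demo)          # keep only the selected columns
X_back = sfm.inverse_transform(X_small)  # original width, dropped columns zero-filled

print(mask, indices, X_small.shape, X_back.shape)
```

Note that inverse_transform does not recover the discarded values: it restores the original shape and fills the removed columns with zeros.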

4. Example

>>> X = [[ 0.87, -1.34,  0.31 ],
...      [-2.79, -0.02, -0.85 ],
...      [-1.34, -0.48, -2.55 ],
...      [ 1.92,  1.48,  0.65 ]]
>>> y = [0, 1, 0, 1]

>>> from sklearn.feature_selection import SelectFromModel
>>> from sklearn.linear_model import LogisticRegression

>>> selector = SelectFromModel(estimator=LogisticRegression()).fit(X, y)

>>> selector.estimator_.coef_
array([[-0.3252302 ,  0.83462377,  0.49750423]])

>>> selector.threshold_
0.55245...

>>> selector.get_support()
array([False,  True, False])

The second feature of X is selected.

>>> selector.transform(X)
array([[-1.34],
       [-0.02],
       [-0.48],
       [ 1.48]])
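
The threshold of roughly 0.55245 can be checked by hand. LogisticRegression uses an l2 penalty by default, so the threshold falls back to "mean", that is, the mean of the absolute coefficients. The sketch below simply reuses the coefficient values printed above (copied as literals, which is an assumption that your own run produces the same numbers) and repeats the arithmetic with NumPy.

```python
import numpy as np

# Coefficients copied from the selector.estimator_.coef_ output shown above.
coef = np.array([[-0.3252302, 0.83462377, 0.49750423]])

importance = np.abs(coef).ravel()   # importance of each feature
threshold = importance.mean()       # default "mean" threshold
mask = importance >= threshold      # features at or above the threshold are kept

print(threshold, mask)
```

Only the second coefficient (about 0.83) is at or above the mean, which matches the get_support() result.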

5. Selecting features with SelectFromModel and LassoCV

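The SelectFromModel meta-transformer can be combined with Lasso to select the best few features from the diabetes dataset.

Because the L1 norm promotes sparsity of the features, we may be interested in selecting only a subset of the most interesting features from the dataset. This example shows how to select the two most interesting features from the diabetes dataset.

The diabetes dataset consists of 10 variables (features) collected from 442 diabetes patients. This example shows how to use SelectFromModel and LassoCV to find the best two features for predicting disease progression one year after baseline.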

import matplotlib.pyplot as plt
import numpy as np

from sklearn.datasets import load_diabetes
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LassoCV

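Load the data
First, let's load the diabetes dataset that ships with sklearn. Then we will look at which features were collected for the diabetes patients: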

diabetes = load_diabetes()

X = diabetes.data
y = diabetes.target

feature_names = diabetes.feature_names
print(feature_names)
>>>['age', 'sex', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']

As shown, there are 10 features.

X.shape
>>>(442, 10)

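Find the importance of the features
To decide on the importance of the features, we use the LassoCV estimator. The features with the highest absolute coef_ values are considered the most important.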

clf = LassoCV().fit(X, y)
importance = np.abs(clf.coef_)
print(importance)
>>>[  6.49684455 235.99640534 521.73854261 321.06689245 569.4426838
	302.45627915   0.         143.6995665  669.92633112  66.83430445]
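
The raw array is easier to read when each value is paired with its feature name. This sketch reuses the importance values printed above as literals (an assumption: a different scikit-learn version may produce different numbers) and sorts them from most to least important.

```python
import numpy as np

feature_names = ['age', 'sex', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']

# Values copied from the print(importance) output shown above.
importance = np.array([
    6.49684455, 235.99640534, 521.73854261, 321.06689245, 569.4426838,
    302.45627915, 0.0, 143.6995665, 669.92633112, 66.83430445,
])

# Indices sorted by decreasing importance.
order = np.argsort(importance)[::-1]
ranking = [(feature_names[i], float(importance[i])) for i in order]

for name, value in ranking:
    print("{:>4}: {:8.2f}".format(name, value))
```

The coefficient of s3 is exactly zero, which illustrates the sparsity that the L1 penalty encourages.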

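Select from the model features with the highest scores
Now we want to select the two most important features.
SelectFromModel() accepts a threshold: only the features whose coef_ importance is greater than or equal to the threshold are kept. Here, we set the threshold slightly above the third-highest coef_ value that LassoCV() computed from our data.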

idx_third = importance.argsort()[-3]
threshold = importance[idx_third] + 0.01

idx_features = (-importance).argsort()[:2]
name_features = np.array(feature_names)[idx_features]
print('Selected features: {}'.format(name_features))
>>>Selected features: ['s5' 's1']
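
The small offset of 0.01 matters because features whose importance equals the threshold are kept. The sketch below, which again copies the printed importance values as literals (an assumption about your own run), counts how many features survive with and without the offset.

```python
import numpy as np

# Values copied from the print(importance) output shown earlier.
importance = np.array([
    6.49684455, 235.99640534, 521.73854261, 321.06689245, 569.4426838,
    302.45627915, 0.0, 143.6995665, 669.92633112, 66.83430445,
])

third_highest = np.sort(importance)[-3]

# A feature is kept when its importance is >= the threshold.
kept_at_third = int(np.sum(importance >= third_highest))
kept_just_above = int(np.sum(importance >= third_highest + 0.01))

print(kept_at_third, kept_just_above)
```

Using the third-highest value itself as the threshold would keep three features, while nudging it upward leaves exactly two.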

Perform the feature selection
The two features 's5' and 's1' are extracted from the data.

sfm = SelectFromModel(clf, threshold=threshold)
sfm.fit(X, y)
X_transform = sfm.transform(X)

X_transform.shape
>>>(442, 2)
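
As an alternative to computing a threshold by hand, the max_features parameter can cap the number of selected features directly. This is a sketch of that variant rather than the approach used in this article; the variable names sfm_top2, X_top2, and top2_names are illustrative, and which two features are chosen may depend on the scikit-learn version.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LassoCV

diabetes = load_diabetes()
X, y = diabetes.data, diabetes.target

# threshold=-np.inf disables the threshold, so max_features alone decides.
sfm_top2 = SelectFromModel(LassoCV(), threshold=-np.inf, max_features=2)
X_top2 = sfm_top2.fit_transform(X, y)

# Map the boolean support mask back to the feature names.
top2_names = np.array(diabetes.feature_names)[sfm_top2.get_support()]

print(X_top2.shape, top2_names)
```

This avoids the argsort and offset arithmetic, at the cost of not exposing a meaningful threshold value.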

Plot the two most important features
Finally, we plot the two features selected from the data.

plt.title(
    "Features from diabetes using SelectFromModel with "
    "threshold %0.3f." % sfm.threshold)
feature1 = X_transform[:, 0]
feature2 = X_transform[:, 1]
plt.plot(feature1, feature2, 'r.')
plt.xlabel("First feature: {}".format(name_features[0]))
plt.ylabel("Second feature: {}".format(name_features[1]))
plt.ylim([np.min(feature2), np.max(feature2)])
plt.show()
