Feature Selection: SelectKBest

Original

2019-07-30 19:06

I came across this method while reading a paper, so I decided to look into it.

from sklearn.feature_selection import SelectKBest

http://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.SelectKBest.html#sklearn.feature_selection.SelectKBest.set_params

class SelectKBest(_BaseFilter):
    """Select features according to the k highest scores.

    Read more in the :ref:`User Guide <univariate_feature_selection>`.

    Parameters
    ----------
    score_func : callable
        Function taking two arrays X and y, and returning a pair of arrays
        (scores, pvalues) or a single array with scores.
        Default is f_classif (see below "See also"). The default function only
        works with classification tasks.

    k : int or "all", optional, default=10
        Number of top features to select.
        The "all" option bypasses selection, for use in a parameter search.

    Attributes
    ----------
    scores_ : array-like, shape=(n_features,)
        Scores of features.

    pvalues_ : array-like, shape=(n_features,)
        p-values of feature scores, None if `score_func` returned only scores.

    Notes
    -----
    Ties between features with equal scores will be broken in an unspecified
    way.

    See also
    --------
    f_classif: ANOVA F-value between label/feature for classification tasks.
    mutual_info_classif: Mutual information for a discrete target.
    chi2: Chi-squared stats of non-negative features for classification tasks.
    f_regression: F-value between label/feature for regression tasks.
    mutual_info_regression: Mutual information for a continuous target.
    SelectPercentile: Select features based on percentile of the highest scores.
    SelectFpr: Select features based on a false positive rate test.
    SelectFdr: Select features based on an estimated false discovery rate.
    SelectFwe: Select features based on family-wise error rate.
    GenericUnivariateSelect: Univariate feature selector with configurable mode.
    """

An example from the official site (you have to supply the scoring function and the value of k yourself)
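The example in the scikit-learn documentation follows the pattern below: load the digits data, pick chi2 as the scoring function, and keep the 20 best features. The variable names are my own. Treat this as a minimal sketch rather than a verbatim copy of the official page.

```python
from sklearn.datasets import load_digits
from sklearn.feature_selection import SelectKBest, chi2

# Load the handwritten digits data: 1797 samples, 64 pixel features.
X, y = load_digits(return_X_y=True)
print(X.shape)  # (1797, 64)

# Keep the 20 features with the highest chi-squared scores.
selector = SelectKBest(score_func=chi2, k=20)
X_new = selector.fit_transform(X, y)
print(X_new.shape)  # (1797, 20)
```

Both arguments are explicit here: score_func decides how each feature is scored, and k decides how many features survive.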

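Parameters

1. score_func : callable. A function that takes two arrays X and y and returns either a pair of arrays (scores, pvalues) or a single array of scores. The default is f_classif, which only works for classification tasks.
2. k : int or "all", optional, default=10. The number of top features to select. The "all" option bypasses selection and is meant for use in a parameter search.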
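To make the two parameters concrete, here is a small sketch using a hand-written scoring function. The function variance_score is hypothetical (it is not part of scikit-learn) and the iris dataset is my own choice. The point is only that any callable returning one score per feature is accepted, and that k="all" keeps every column.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest


def variance_score(X, y):
    """Hypothetical score function: per-feature variance (y is ignored)."""
    return np.var(X, axis=0)


X, y = load_iris(return_X_y=True)

# A callable that returns a single array of scores is accepted.
selector = SelectKBest(score_func=variance_score, k=2).fit(X, y)
print(selector.scores_)   # one score per feature
print(selector.pvalues_)  # None, because only scores were returned

# k="all" bypasses selection and keeps every feature.
X_all = SelectKBest(score_func=variance_score, k="all").fit_transform(X, y)
print(X_all.shape)        # (150, 4)
```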

Attributes

1. scores_ : array-like, shape=(n_features,). The scores of the features.
2. pvalues_ : array-like, shape=(n_features,). The p-values of the feature scores; None if score_func returned only scores.
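Both attributes only exist after fitting. A short sketch of reading them, assuming the iris dataset and the default f_classif scoring function (both are my choices for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

iris = load_iris()
selector = SelectKBest(score_func=f_classif, k=2).fit(iris.data, iris.target)

# scores_ and pvalues_ each hold one entry per input feature.
for name, score, pvalue in zip(iris.feature_names, selector.scores_, selector.pvalues_):
    print(f"{name}: score={score:.2f}, p-value={pvalue:.3g}")
```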

Scoring functions available for score_func
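The "See also" part of the docstring lists the built-in choices: f_classif, mutual_info_classif and chi2 for classification, and f_regression and mutual_info_regression for regression. As a rough sketch, the three classification scorers can be swapped in like this. The iris dataset and the use of functools.partial to fix random_state are my own assumptions for illustration.

```python
from functools import partial

from sklearn.datasets import load_iris
from sklearn.feature_selection import (
    SelectKBest,
    chi2,
    f_classif,
    mutual_info_classif,
)

X, y = load_iris(return_X_y=True)

# Classification scorers; mutual information is randomized, so fix its seed.
score_funcs = {
    "f_classif": f_classif,
    "chi2": chi2,  # requires non-negative features
    "mutual_info_classif": partial(mutual_info_classif, random_state=0),
}

selected = {}
for name, func in score_funcs.items():
    selector = SelectKBest(score_func=func, k=2).fit(X, y)
    selected[name] = selector.get_support(indices=True)
    print(name, selected[name])
```

For a regression target, f_regression or mutual_info_regression would take the place of these functions.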

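Methods

1. fit(X, y): run the score function on (X, y) and get the appropriate features.
2. fit_transform(X[, y]): fit to the data, then transform it.
3. get_params([deep]): get the parameters of this estimator.
4. get_support([indices]): get a mask, or the integer indices, of the selected features.
5. inverse_transform(X): reverse the transformation operation.
6. set_params(**params): set the parameters of this estimator.
7. transform(X): reduce X to the selected features.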
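The methods listed above can be walked through in one short session. This is a sketch on the iris dataset (my choice), not an excerpt from the documentation.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)

selector = SelectKBest(score_func=f_classif, k=2)
print(selector.get_params())        # the estimator's parameters: k and score_func

selector.set_params(k=3)            # change k before fitting
selector.fit(X, y)                  # compute the scores on (X, y)

X_reduced = selector.transform(X)   # keep only the 3 selected columns
print(X_reduced.shape)              # (150, 3)

mask = selector.get_support()       # boolean mask over the 4 original features
print(mask)

# inverse_transform restores the original width; dropped columns become zeros.
X_restored = selector.inverse_transform(X_reduced)
print(X_restored.shape)             # (150, 4)
```

Note that inverse_transform cannot recover the discarded values. It only puts the kept columns back in their original positions.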

How do you get back the names or indices of the selected features? The list of methods above already hinted at it: use get_support().

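The digits data used earlier has no feature names, so I switched to the Boston housing data, which does. Because it is a regression dataset, the scoring function changes accordingly, to f_regression. You have to call fit first before get_support() can be used. If its indices parameter is set to True, the return value is the indices of the selected features. You cannot get the feature names back directly from this call, but mapping the indices to the names stored in the dataset is easy. Here I set k to 5 to select the five highest-scoring features, which are the features at indices 2, 5, 9, 10, and 12.
If the parameter is set to False, the return value is a Boolean for each feature indicating whether it was selected.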
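A sketch of that workflow follows. One caveat: load_boston was removed in scikit-learn 1.2, so this sketch substitutes load_diabetes, another regression dataset that ships with feature names. That substitution is my assumption, and the selected indices will therefore differ from the Boston result quoted above.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.feature_selection import SelectKBest, f_regression

# Stand-in regression dataset (assumption: used instead of Boston housing).
data = load_diabetes()
X, y = data.data, data.target
feature_names = np.array(data.feature_names)

# fit must be called before get_support() is available.
selector = SelectKBest(score_func=f_regression, k=5).fit(X, y)

indices = selector.get_support(indices=True)   # integer positions of kept features
mask = selector.get_support(indices=False)     # one Boolean per original feature

print(indices)
print(mask)
print(feature_names[indices])                  # map positions back to names
```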

Link: https://www.jianshu.com/p/586ba8c96a3d
