cross_val_score: cross-validation and its use for parameter selection, model selection, and feature selection

Original

starky0729

2018-11-26 06:38

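K-fold cross-validation: sklearn.model_selection.KFold(n_splits=3, shuffle=False, random_state=None). This is the signature at the time of writing; since scikit-learn 0.22 the default is n_splits=5.

Idea: split the dataset into n_splits mutually exclusive subsets (folds). In each round, one fold serves as the validation set and the remaining n_splits-1 folds form the training set. Training and testing are repeated n_splits times, giving n_splits results.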
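The loop described above can be sketched as follows. The iris dataset and KNeighborsClassifier are assumptions chosen only for illustration; any estimator with fit and score would do. The sketch also passes the same KFold object to cross_val_score, which should produce the same n_splits scores as the manual loop.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Illustrative data and model (assumptions, not from the original post).
X, y = load_iris(return_X_y=True)
kf = KFold(n_splits=5, shuffle=True, random_state=0)

# Manual loop: one fit and one score per fold.
manual_scores = []
for train_index, test_index in kf.split(X):
    model = KNeighborsClassifier(n_neighbors=5)
    model.fit(X[train_index], y[train_index])
    manual_scores.append(model.score(X[test_index], y[test_index]))
manual_scores = np.array(manual_scores)

# cross_val_score runs the same fit/score loop when given the same splitter.
cv_scores = cross_val_score(KNeighborsClassifier(n_neighbors=5), X, y, cv=kf)

print(manual_scores)
print(cv_scores)
```

The mean of these scores is the usual single-number summary when comparing parameters or models.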

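Note: when the dataset cannot be divided evenly, the first n_samples % n_splits folds contain n_samples // n_splits + 1 samples each, and the remaining folds contain n_samples // n_splits samples each.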
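The fold-size rule is plain integer arithmetic. A minimal sketch follows; fold_sizes is a hypothetical helper written for this illustration, not a scikit-learn function.

```python
import numpy as np

def fold_sizes(n_samples, n_splits):
    """Hypothetical helper: size of each fold under the rule stated above."""
    sizes = np.full(n_splits, n_samples // n_splits, dtype=int)
    # The first (n_samples % n_splits) folds each get one extra sample.
    sizes[: n_samples % n_splits] += 1
    return sizes.tolist()

sizes = fold_sizes(12, 5)
print(sizes)  # [3, 3, 2, 2, 2]
```

With 12 samples and 5 folds, 12 // 5 = 2 and 12 % 5 = 2, so the first two folds hold 3 samples and the other three hold 2.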

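Parameters:

n_splits: the number of folds to split the data into

shuffle: whether to shuffle the samples before splitting them into folds

① If False, the samples are not shuffled and the folds are consecutive blocks in the original order, so every run produces the same split. The result is as reproducible as shuffling with an integer random_state, although random_state itself is not used in this case

② If True, the samples are shuffled before splitting, so the split differs from run to run unless random_state is fixed

random_state: the random seed; it only takes effect when shuffle=True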

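Methods:

① get_n_splits(X=None, y=None, groups=None): returns the value of the n_splits parameter, i.e. the number of splitting iterations

② split(X, y=None, groups=None): splits the dataset into training and test sets and returns a generator that yields (train indices, test indices) pairs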
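Because split() yields index arrays rather than data, it is easy to check that the folds really are mutually exclusive. The sketch below uses a small 12-sample array chosen for illustration and verifies that each sample index appears in exactly one test fold and never in both the train and test part of the same fold.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(24).reshape(12, 2)
kf = KFold(n_splits=5, shuffle=True, random_state=0)

# split() is lazy; list() materializes all (train_index, test_index) pairs.
splits = list(kf.split(X))

# Collect the test indices of every fold and sort them.
all_test = np.sort(np.concatenate([test for _, test in splits]))

# Count indices shared by the train and test part of each fold.
overlaps = [np.intersect1d(train, test).size for train, test in splits]

print(len(splits) == kf.get_n_splits())
print(all_test)
print(overlaps)
```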

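The example below uses a dataset that cannot be split evenly (12 samples, 5 folds) and varies the parameters to observe the results.

① With shuffle=False, running the code twice gives identical results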


In [1]: from sklearn.model_selection import KFold
   ...: import numpy as np
   ...: X = np.arange(24).reshape(12,2)
   ...: y = np.random.choice([1,2],12,p=[0.4,0.6])
   ...: kf = KFold(n_splits=5,shuffle=False)
   ...: for train_index , test_index in kf.split(X):
   ...:     print('train_index:%s , test_index: %s ' %(train_index,test_index))
   ...:
   ...:
train_index:[ 3  4  5  6  7  8  9 10 11] , test_index: [0 1 2]
train_index:[ 0  1  2  6  7  8  9 10 11] , test_index: [3 4 5]
train_index:[ 0  1  2  3  4  5  8  9 10 11] , test_index: [6 7]
train_index:[ 0  1  2  3  4  5  6  7 10 11] , test_index: [8 9]
train_index:[0 1 2 3 4 5 6 7 8 9] , test_index: [10 11]
 
In [2]: from sklearn.model_selection import KFold
   ...: import numpy as np
   ...: X = np.arange(24).reshape(12,2)
   ...: y = np.random.choice([1,2],12,p=[0.4,0.6])
   ...: kf = KFold(n_splits=5,shuffle=False)
   ...: for train_index , test_index in kf.split(X):
   ...:     print('train_index:%s , test_index: %s ' %(train_index,test_index))
   ...:
   ...:
train_index:[ 3  4  5  6  7  8  9 10 11] , test_index: [0 1 2]
train_index:[ 0  1  2  6  7  8  9 10 11] , test_index: [3 4 5]
train_index:[ 0  1  2  3  4  5  8  9 10 11] , test_index: [6 7]
train_index:[ 0  1  2  3  4  5  6  7 10 11] , test_index: [8 9]
train_index:[0 1 2 3 4 5 6 7 8 9] , test_index: [10 11]

② With shuffle=True, running the code twice gives different results


In [3]: from sklearn.model_selection import KFold
   ...: import numpy as np
   ...: X = np.arange(24).reshape(12,2)
   ...: y = np.random.choice([1,2],12,p=[0.4,0.6])
   ...: kf = KFold(n_splits=5,shuffle=True)
   ...: for train_index , test_index in kf.split(X):
   ...:     print('train_index:%s , test_index: %s ' %(train_index,test_index))
   ...:
   ...:
train_index:[ 0  1  2  4  5  6  7  8 10] , test_index: [ 3  9 11]
train_index:[ 0  1  2  3  4  5  9 10 11] , test_index: [6 7 8]
train_index:[ 2  3  4  5  6  7  8  9 10 11] , test_index: [0 1]
train_index:[ 0  1  3  4  5  6  7  8  9 11] , test_index: [ 2 10]
train_index:[ 0  1  2  3  6  7  8  9 10 11] , test_index: [4 5]
 
In [4]: from sklearn.model_selection import KFold
   ...: import numpy as np
   ...: X = np.arange(24).reshape(12,2)
   ...: y = np.random.choice([1,2],12,p=[0.4,0.6])
   ...: kf = KFold(n_splits=5,shuffle=True)
   ...: for train_index , test_index in kf.split(X):
   ...:     print('train_index:%s , test_index: %s ' %(train_index,test_index))
   ...:
   ...:
train_index:[ 0  1  2  3  4  5  7  8 11] , test_index: [ 6  9 10]
train_index:[ 2  3  4  5  6  8  9 10 11] , test_index: [0 1 7]
train_index:[ 0  1  3  5  6  7  8  9 10 11] , test_index: [2 4]
train_index:[ 0  1  2  3  4  6  7  9 10 11] , test_index: [5 8]
train_index:[ 0  1  2  4  5  6  7  8  9 10] , test_index: [ 3 11]

③ With shuffle=True and random_state set to an integer, every run gives the same result


In [5]: from sklearn.model_selection import KFold
   ...: import numpy as np
   ...: X = np.arange(24).reshape(12,2)
   ...: y = np.random.choice([1,2],12,p=[0.4,0.6])
   ...: kf = KFold(n_splits=5,shuffle=True,random_state=0)
   ...: for train_index , test_index in kf.split(X):
   ...:     print('train_index:%s , test_index: %s ' %(train_index,test_index))
   ...:
   ...:
train_index:[ 0  1  2  3  5  7  8  9 10] , test_index: [ 4  6 11]
train_index:[ 0  1  3  4  5  6  7  9 11] , test_index: [ 2  8 10]
train_index:[ 0  2  3  4  5  6  8  9 10 11] , test_index: [1 7]
train_index:[ 0  1  2  4  5  6  7  8 10 11] , test_index: [3 9]
train_index:[ 1  2  3  4  6  7  8  9 10 11] , test_index: [0 5]
 
In [6]: from sklearn.model_selection import KFold
   ...: import numpy as np
   ...: X = np.arange(24).reshape(12,2)
   ...: y = np.random.choice([1,2],12,p=[0.4,0.6])
   ...: kf = KFold(n_splits=5,shuffle=True,random_state=0)
   ...: for train_index , test_index in kf.split(X):
   ...:     print('train_index:%s , test_index: %s ' %(train_index,test_index))
   ...:
   ...:
train_index:[ 0  1  2  3  5  7  8  9 10] , test_index: [ 4  6 11]
train_index:[ 0  1  3  4  5  6  7  9 11] , test_index: [ 2  8 10]
train_index:[ 0  2  3  4  5  6  8  9 10 11] , test_index: [1 7]
train_index:[ 0  1  2  4  5  6  7  8 10 11] , test_index: [3 9]
train_index:[ 1  2  3  4  6  7  8  9 10 11] , test_index: [0 5]

④ Ways to get the value of n_splits


In [8]: kf.split(X)
Out[8]: <generator object _BaseKFold.split at 0x00000000047FF990>
 
In [9]: kf.get_n_splits()
Out[9]: 5
 
In [10]: kf.n_splits
Out[10]: 5
