A first look at sklearn.model_selection.train_test_split

Original

2019-09-17 06:10

sklearn.model_selection.train_test_split

Splits arrays or matrices into random train and test subsets.

Parameters:

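1 *arrays : sequence of indexables with same length / shape[0]

The inputs to split. They must all have the same number of rows (samples), i.e. the same shape[0].
2 test_size : float, int or None, optional (default=None)

A float is the fraction of the samples placed in the test subset; an int is the absolute number of test samples. If None, it is set to the complement of train_size; if train_size is also None, it is set to 0.25.
3 train_size : float, int, or None, (default=None)

A float is the fraction of the samples placed in the training subset; an int is the absolute number of training samples. If None, it is set to the complement of test_size (1 - test_size for a float).
4 random_state : int, RandomState instance or None, optional (default=None)

If an int, it is used as the seed of the random number generator. If a RandomState instance, that instance is the random number generator. If None, the RandomState instance used by np.random is used.
5 shuffle : boolean, optional (default=True)

Whether to shuffle the data before splitting. Shuffling is on by default. If shuffle is set to False, stratify must be None.
6 stratify : array-like or None (default=None)

If not None, the data is split in a stratified fashion, using this array as the class labels.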
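The parameter descriptions above can be tried out with a small sketch. The toy data (20 samples with a 15 / 5 label split) and all variable names are made up for illustration and do not come from the original post.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (made up for illustration): 20 samples, labels split 15 / 5.
X = np.arange(40).reshape(20, 2)
y = np.array([0] * 15 + [1] * 5)

# Neither size given: test_size falls back to 0.25, so 5 of 20 rows are held out.
X_tr_default, X_te_default, _, _ = train_test_split(X, y, random_state=0)

# Float test_size is a fraction of the samples; train_size becomes the complement.
X_tr_frac, X_te_frac, _, _ = train_test_split(X, y, test_size=0.2, random_state=0)

# Int test_size is an absolute number of test samples.
X_tr_abs, X_te_abs, _, _ = train_test_split(X, y, test_size=7, random_state=0)

# The same int random_state reproduces the same split.
_, X_te_again, _, _ = train_test_split(X, y, test_size=0.2, random_state=0)
same_split = np.array_equal(X_te_frac, X_te_again)

# stratify=y keeps the 3:1 class ratio in both subsets.
_, _, y_tr_strat, y_te_strat = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)
train_counts = np.bincount(y_tr_strat)  # samples per class in the training subset
test_counts = np.bincount(y_te_strat)   # samples per class in the test subset

print(len(X_te_default), len(X_te_frac), len(X_te_abs))
print(same_split, train_counts, test_counts)
```

Stratifying is mostly useful when the classes are imbalanced, since a purely random split of a small dataset can leave the minority class under-represented in the test subset.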

Returns:
splitting : list, length=2 * len(arrays)

List containing train-test split of inputs
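To see what "length=2 * len(arrays)" means in practice, here is a minimal sketch that passes three arrays and gets six back. The per-sample weights array w is a hypothetical third input added only for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(12).reshape(6, 2)   # row i is [2*i, 2*i + 1]
y = np.arange(6)                  # label i belongs to row i
w = np.linspace(0.0, 1.0, 6)      # hypothetical per-sample weights, w[i] = i / 5

# Three inputs give six outputs, ordered as (train, test) pairs per input.
parts = train_test_split(X, y, w, test_size=2, random_state=0)
X_train, X_test, y_train, y_test, w_train, w_test = parts

# Every input is indexed with the same shuffled indices, so rows stay aligned.
rows_aligned = np.array_equal(X_train[:, 0], 2 * y_train)
weights_aligned = np.allclose(w_train, y_train / 5)

print(len(parts), rows_aligned, weights_aligned)
```

Because all inputs share one set of indices, features, labels and any extra per-sample arrays can be split in a single call instead of one call per array.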

import numpy as np 
from sklearn.model_selection import train_test_split
X, y = np.arange(10).reshape(5, 2), range(5)

X

array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])

list(y)

[0, 1, 2, 3, 4]

X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.33,random_state=42)

X_train

array([[4, 5],
       [0, 1],
       [6, 7]])

y_train

[2, 0, 3]

X_test

array([[2, 3],
       [8, 9]])

y_test

[1, 4]
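For comparison with the shuffled split above, the following sketch uses the same toy data with shuffle=False. It assumes an integer test_size of 2, which is not in the original example. It also illustrates the rule that stratify must be None when shuffling is turned off.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X, y = np.arange(10).reshape(5, 2), list(range(5))

# Without shuffling, the split is a plain cut: first rows train, last rows test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=2, shuffle=False
)
print(y_train)  # [0, 1, 2]
print(y_test)   # [3, 4]

# Combining shuffle=False with stratify is rejected with a ValueError.
labels = [0, 0, 1, 1, 1]
try:
    train_test_split(X, labels, test_size=2, shuffle=False, stratify=labels)
    stratify_rejected = False
except ValueError:
    stratify_rejected = True
print(stratify_rejected)  # True
```

An unshuffled split like this is a reasonable choice for ordered data such as time series, where the test subset should come after the training subset.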
