Sampling with sklearn

1 train_test_split from the cross-validation package (sklearn.model_selection)

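Set the test-set proportion with test_size, and the function draws a random sample from the original data. By default the split is purely random, so the class proportions in each subset only approximate those of the original data; they are preserved exactly only when the stratify argument is passed. An example: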

import numpy as np
from sklearn.model_selection import train_test_split

# 100 samples of class 1 and 50 samples of class 2
data_x = [['this is class 1 ']] * 100 + [['this is class 2']] * 50
data_y = [[1]] * 100 + [[2]] * 50

# Hold out 30% of the data as the test set
X_train, X_test, y_train, y_test = train_test_split(data_x, data_y, test_size=0.3)

# Count the samples of each class in the two subsets
y_train, y_test = np.array(y_train), np.array(y_test)
print('number of positive-class samples in training data:', (y_train == 1).sum(), '/', len(y_train))
print('number of negative-class samples in training data:', (y_train == 2).sum(), '/', len(y_train))
print('number of positive-class samples in test data:', (y_test == 1).sum(), '/', len(y_test))

Output of one run (the split is random, so the counts vary between runs):

number of positive-class samples in training data: 74 / 105
number of negative-class samples in training data: 31 / 105
number of positive-class samples in test data: 26 / 45
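
When the class ratio has to be preserved exactly, a stratified split is one option. The following is a minimal sketch that passes the labels to the stratify parameter of train_test_split. The flat labels list, the Counter-based tally, and random_state=0 are choices made here for illustration and are not part of the original example.

```python
from collections import Counter

from sklearn.model_selection import train_test_split

# Same toy data as above: 100 samples of class 1, 50 samples of class 2.
# Labels are kept as a flat list (an assumption of this sketch).
data_x = [['this is class 1']] * 100 + [['this is class 2']] * 50
labels = [1] * 100 + [2] * 50

# stratify=labels asks for the 2:1 class ratio to be kept in both subsets;
# random_state only makes the shuffle reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    data_x, labels, test_size=0.3, stratify=labels, random_state=0
)

train_counts = Counter(y_train)
test_counts = Counter(y_test)

print('class 1 in training data:', train_counts[1], '/', len(y_train))  # 70 / 105
print('class 2 in training data:', train_counts[2], '/', len(y_train))  # 35 / 105
print('class 1 in test data:', test_counts[1], '/', len(y_test))        # 30 / 45
print('class 2 in test data:', test_counts[2], '/', len(y_test))        # 15 / 45
```

Because 105 and 45 both divide evenly in a 2:1 ratio, the stratified counts here are exact; with sizes that do not divide evenly, the ratio is matched as closely as rounding allows.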

To be updated
