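Project Introduction
In this project we predict customer churn for a bank. Let's look at the data first. It is stored in two files: 'Churn-Modelling.csv' holds the training data and 'Churn-Modelling-Test-Data.csv' holds the test data. The fields are described below.
The data are real records from an overseas bank that have been anonymized.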
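RowNumber: row number
CustomerID: customer ID
Surname: customer's surname
CreditScore: credit score
Geography: customer's country/region
Gender: customer's gender
Age: age
Tenure: number of years the customer has been with the bank
Balance: deposit and loan balance
NumOfProducts: number of bank products the customer uses
HasCrCard: whether the customer holds the bank's credit card
IsActiveMember: whether the customer is an active member
EstimatedSalary: estimated salary
Exited: whether the customer has churned; this is the label we will predict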
First, import the modules we will use.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn import neighbors
from sklearn.metrics import classification_report
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import LabelEncoder
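Next, read the data with numpy. Because some columns contain strings, set dtype to str when reading. (Older code used np.str, but that alias was deprecated in NumPy 1.20 and removed in 1.24; the built-in str behaves the same way.)
train_data = np.genfromtxt('Churn-Modelling.csv', delimiter=',', dtype=str)
test_data = np.genfromtxt('Churn-Modelling-Test-Data.csv', delimiter=',', dtype=str)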
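To see what this read step returns, here is a minimal sketch that parses a tiny in-memory CSV instead of the real file. The rows and the shortened column list are made up for illustration; only the genfromtxt arguments mirror the code above.

```python
import io
import numpy as np

# Hypothetical miniature CSV standing in for the real file (values are made up).
csv_text = (
    "RowNumber,CustomerId,Surname,CreditScore,Geography,Exited\n"
    "1,1001,Smith,619,France,1\n"
    "2,1002,Jones,608,Spain,0\n"
)

# dtype=str keeps every cell as text, and the header row is read as ordinary data.
sample = np.genfromtxt(io.StringIO(csv_text), delimiter=",", dtype=str)

print(sample.shape)   # header row plus two data rows, six columns
print(sample[0])      # the header names
print(sample[1, 3])   # the credit score, still a string at this point
```

Because the header comes back as row 0 and every number is still text, the later steps have to skip the first row and convert types explicitly.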
Split the data. The header row is not needed; every column except the last is feature data, and the last column is the label.
x_train = train_data[1:,:-1]
y_train = train_data[1:,-1]
x_test = test_data[1:,:-1]
y_test = test_data[1:,-1]
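Columns 0, 1, and 2 hold the row number, customer ID, and surname. These three fields should have little influence on the outcome, so we delete them.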
x_train = np.delete(x_train,[0,1,2],axis=1)
x_test = np.delete(x_test,[0,1,2],axis=1)
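The slicing and column deletion can be checked on a small array. This is only a sketch on a made-up table with fewer columns than the real data; the indexing expressions are the same ones used above.

```python
import numpy as np

# Made-up miniature table: a header row followed by two records.
table = np.array([
    ["RowNumber", "CustomerId", "Surname", "CreditScore", "Age", "Exited"],
    ["1", "1001", "Smith", "619", "42", "1"],
    ["2", "1002", "Jones", "608", "41", "0"],
])

features = table[1:, :-1]   # skip the header, keep all columns but the last
labels = table[1:, -1]      # skip the header, keep only the last column

# Drop the three identifier columns; np.delete returns a new array
# and leaves its input unchanged.
features = np.delete(features, [0, 1, 2], axis=1)

print(features)  # only the CreditScore and Age columns remain
print(labels)    # the Exited values
```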
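After those columns are deleted, columns 1 and 2 of the remaining data are the country/region and the gender. Both are strings and must be converted to numbers before a model can be built.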
labelencoder1 = LabelEncoder()
x_train[:,1] = labelencoder1.fit_transform(x_train[:,1])
x_test[:,1] = labelencoder1.transform(x_test[:,1])
labelencoder2 = LabelEncoder()
x_train[:,2] = labelencoder2.fit_transform(x_train[:,2])
x_test[:,2] = labelencoder2.transform(x_test[:,2])
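The sketch below shows what LabelEncoder does to a column of country names, using made-up values. It also shows one possible alternative, OneHotEncoder, which the code above does not use. Integer codes imply an ordering between countries that may affect distance-based models such as KNN, while one-hot columns avoid that.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Made-up country values for illustration.
countries = np.array(["France", "Spain", "Germany", "France"])

# LabelEncoder sorts the distinct values and numbers them from 0.
encoder = LabelEncoder()
codes = encoder.fit_transform(countries)
print(encoder.classes_)  # distinct values in sorted order
print(codes)             # one integer code per input value

# Possible alternative (an assumption, not part of the original code):
# one 0/1 column per country, so no ordering is implied.
onehot = OneHotEncoder().fit_transform(countries.reshape(-1, 1)).toarray()
print(onehot)
```

Note that transform raises an error for a value the encoder did not see during fit, which is why the encoders above are fitted on the training data and only applied to the test data.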
Because the data were read as strings, convert them to float before training the models.
x_train = x_train.astype(np.float32)
x_test = x_test.astype(np.float32)
y_train = y_train.astype(np.float32)
y_test = y_test.astype(np.float32)
Then standardize the features.
sc = StandardScaler()
x_train = sc.fit_transform(x_train)
x_test = sc.transform(x_test)
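As a rough illustration of why the scaler is fitted on the training set only, the following sketch standardizes a made-up two-column array and then applies the same learned mean and standard deviation to a test row.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up values: column 0 resembles a credit score, column 1 an age.
train = np.array([[600.0, 30.0], [700.0, 40.0], [800.0, 50.0]], dtype=np.float32)
test = np.array([[700.0, 60.0]], dtype=np.float32)

scaler = StandardScaler()
train_scaled = scaler.fit_transform(train)  # learns mean and std from train only
test_scaled = scaler.transform(test)        # reuses those statistics unchanged

print(scaler.mean_)               # per-column means of the training data
print(train_scaled.mean(axis=0))  # close to 0 for every column
print(test_scaled)                # test row expressed in training-set units
```

After scaling, both columns have comparable ranges, so a distance-based model such as KNN is not dominated by whichever feature happens to have the largest raw values.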
Build a KNN model and check its results on the test set.
knn = neighbors.KNeighborsClassifier(n_neighbors=5)
knn.fit(x_train, y_train)
predictions = knn.predict(x_test)
print(classification_report(y_test, predictions))
              precision    recall  f1-score   support
         0.0       0.80      0.95      0.87       740
         1.0       0.69      0.33      0.45       260
   micro avg       0.79      0.79      0.79      1000
   macro avg       0.75      0.64      0.66      1000
weighted avg       0.77      0.79      0.76      1000
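The precision and recall columns in a report like this come from the confusion matrix. As a minimal sketch on made-up labels (not the project's predictions), the per-class numbers for the churn class (label 1) can be recomputed by hand:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Made-up ground truth and predictions: 6 retained customers, 4 churned.
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
y_pred = np.array([0, 0, 0, 0, 0, 1, 1, 0, 0, 0])

# For binary labels, ravel() yields the counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

precision = tp / (tp + fp)  # of the customers flagged as churned, how many did churn
recall = tp / (tp + fn)     # of the customers who churned, how many were flagged

print(tn, fp, fn, tp)
print(precision, precision_score(y_true, y_pred))
print(recall, recall_score(y_true, y_pred))
```

A low recall for class 1 means most customers who actually churned were predicted as staying, which is usually the more costly mistake in a churn setting.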
Build an MLP model and check its results on the test set.
mlp = MLPClassifier(hidden_layer_sizes=(20, 10), max_iter=500)
mlp.fit(x_train,y_train)
predictions = mlp.predict(x_test)
print(classification_report(y_test, predictions))
              precision    recall  f1-score   support
         0.0       0.82      0.96      0.88       740
         1.0       0.77      0.38      0.51       260
   micro avg       0.81      0.81      0.81      1000
   macro avg       0.79      0.67      0.70      1000
weighted avg       0.80      0.81      0.79      1000
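MLPClassifier initializes its weights randomly, so the numbers in a report like this can shift slightly from run to run. One way to make a run repeatable is to pass random_state, which the code above does not set. The sketch below uses synthetic data from make_classification and an arbitrary seed of 42, both of which are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in data; the real project uses the churn CSV files.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

def fit_and_predict(seed):
    """Train an MLP with a fixed seed and return its predictions on X."""
    model = MLPClassifier(hidden_layer_sizes=(20, 10), max_iter=500,
                          random_state=seed)
    model.fit(X, y)
    return model.predict(X)

first = fit_and_predict(42)
second = fit_and_predict(42)

# With the same seed and the same data, both runs give identical predictions.
print((first == second).all())
```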
Project Files
Baidu Netdisk
Password: 4t6k