Deep Learning Notes ----- TensorFlow 2.2.0 Code Practice (Lesson 2)

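Before we begin:
This post picks up directly where the previous one left off:
Deep Learning Notes ----- TensorFlow 2.2.0 Code Practice (Lesson 1)
It mainly covers TensorFlow 2.0 coding exercises. It follows KGP Talkie's [TensorFlow 2.0] hands-on advanced tutorial and corrects some of the tutorial's code that no longer works.
This post follows Lesson 2 of the popular YouTube series [TensorFlow 2.0] Hands-on Advanced Tutorial (Chinese/English subtitles + code walkthrough).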

Data for this lesson: https://pan.baidu.com/s/1Lpo3l3UaPANOGE_HGJf2TQ
Extraction code: dqo4
Note: put the data file in the Jupyter working directory.

How to build your first ANN

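1 Process the data
2 Build the input layer
3 Randomly initialize the input weights W
4 Build the hidden layer(s)
5 Choose the optimizer, loss, and accuracy metric
6 Compile the model
7 Train the model with model.fit
8 Evaluate the model
9 Tune the model if needed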
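Step 3 happens automatically in Keras. A Dense layer draws its weight matrix W at random when it is built. By default it uses the Glorot (Xavier) uniform initializer and sets the biases to zero. The NumPy sketch below mimics that default for illustration. The helper name `glorot_uniform` and the fixed seed are assumptions of this sketch, not Keras code. The value 11 is the number of feature columns this post ends up with after preprocessing.

```python
import numpy as np

def glorot_uniform(fan_in, fan_out, rng):
    """Hypothetical helper: sample W uniformly from [-limit, limit] with
    limit = sqrt(6 / (fan_in + fan_out)), like Keras' default kernel initializer."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

rng = np.random.default_rng(0)  # assumed seed, only for reproducibility
n_features = 11                 # feature columns after preprocessing in this post
W = glorot_uniform(n_features, 128, rng)  # weights feeding a 128-unit hidden layer
b = np.zeros(128)                          # Keras initializes biases to zero by default

limit = np.sqrt(6.0 / (n_features + 128))
print(W.shape, b.shape)          # (11, 128) (128,)
print(np.abs(W).max() <= limit)  # True
```

Because the draw is random, two runs of the same Keras model can end with slightly different accuracies unless a seed is fixed.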

# Import libraries (use the public tf.keras API instead of the private tensorflow.python.keras module)
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Flatten, Dense
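# Import packages
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split  # used to split the data into training and test sets
dataset = pd.read_csv('customer_Churn_Modelling.csv')  # read the data; the CSV must be in the same directory as this notebook
dataset.head()  # preview the first rows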
RowNumber CustomerId Surname CreditScore Geography Gender Age Tenure Balance NumOfProducts HasCrCard IsActiveMember EstimatedSalary Exited
0 1 15634602 Hargrave 619 France Female 42 2 0.00 1 1 1 101348.88 1
1 2 15647311 Hill 608 Spain Female 41 1 83807.86 1 0 1 112542.58 0
2 3 15619304 Onio 502 France Female 42 8 159660.80 3 1 0 113931.57 1
3 4 15701354 Boni 699 France Female 39 1 0.00 2 0 0 93826.63 0
4 5 15737888 Mitchell 850 Spain Female 43 2 125510.82 1 1 1 79084.10 0
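X = dataset.drop(labels=['CustomerId', 'Surname', 'RowNumber', 'Exited'], axis=1)  # drop identifier columns and the target; the rest are the features X
y = dataset['Exited']  # target y: 1 if the customer left (churned), 0 otherwise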
X.head()
CreditScore Geography Gender Age Tenure Balance NumOfProducts HasCrCard IsActiveMember EstimatedSalary
0 619 France Female 42 2 0.00 1 1 1 101348.88
1 608 Spain Female 41 1 83807.86 1 0 1 112542.58
2 502 France Female 42 8 159660.80 3 1 0 113931.57
3 699 France Female 39 1 0.00 2 0 0 93826.63
4 850 Spain Female 43 2 125510.82 1 1 1 79084.10
y.head()
0    1
1    0
2    1
3    0
4    0
Name: Exited, dtype: int64
# Encode categorical labels
# convert the string values in Geography and Gender to numbers
from sklearn.preprocessing import LabelEncoder
label1 = LabelEncoder()
X['Geography'] = label1.fit_transform(X['Geography'])  # encode country names as integers with LabelEncoder
X.head()
CreditScore Geography Gender Age Tenure Balance NumOfProducts HasCrCard IsActiveMember EstimatedSalary
0 619 0 Female 42 2 0.00 1 1 1 101348.88
1 608 2 Female 41 1 83807.86 1 0 1 112542.58
2 502 0 Female 42 8 159660.80 3 1 0 113931.57
3 699 0 Female 39 1 0.00 2 0 0 93826.63
4 850 2 Female 43 2 125510.82 1 1 1 79084.10
label2 = LabelEncoder()
X['Gender'] = label2.fit_transform(X['Gender'])  # encode gender as integers; using label2 keeps label1's Geography mapping intact
X.head()
CreditScore Geography Gender Age Tenure Balance NumOfProducts HasCrCard IsActiveMember EstimatedSalary
0 619 0 0 42 2 0.00 1 1 1 101348.88
1 608 2 0 41 1 83807.86 1 0 1 112542.58
2 502 0 0 42 8 159660.80 3 1 0 113931.57
3 699 0 0 39 1 0.00 2 0 0 93826.63
4 850 2 0 43 2 125510.82 1 1 1 79084.10
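LabelEncoder sorts the distinct values and replaces each value with its position in that sorted list. That is why France becomes 0 and Spain becomes 2 above, and Germany, which sorts between them, becomes 1. Below is a minimal NumPy equivalent. It uses a small hand-made sample of country names instead of the real CSV.

```python
import numpy as np

# Hypothetical sample of the Geography column (not read from the CSV)
geography = np.array(['France', 'Spain', 'France', 'Germany', 'Spain'])

# np.unique returns the sorted distinct values and, with return_inverse=True,
# the index of each original value in that sorted array: the same mapping
# LabelEncoder.fit_transform produces
classes, codes = np.unique(geography, return_inverse=True)

print(classes)  # ['France' 'Germany' 'Spain']
print(codes)    # [0 2 0 1 2]
```

The integer codes carry a fake ordering (Spain > Germany > France), which is why Geography is one-hot encoded in the next step. Gender has only two values, so a single 0/1 column is enough.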

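# one-hot encode Geography: each country code becomes its own 0/1 column (1 if the customer is from that country, 0 otherwise);
# drop_first=True drops the column for code 0, which is implied when the other two columns are both 0
X = pd.get_dummies(X, drop_first=True, columns=['Geography'], dtype=int)  # dtype=int keeps 0/1 values; newer pandas returns True/False by default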
X.head(30)
CreditScore Gender Age Tenure Balance NumOfProducts HasCrCard IsActiveMember EstimatedSalary Geography_1 Geography_2
0 619 0 42 2 0.00 1 1 1 101348.88 0 0
1 608 0 41 1 83807.86 1 0 1 112542.58 0 1
2 502 0 42 8 159660.80 3 1 0 113931.57 0 0
3 699 0 39 1 0.00 2 0 0 93826.63 0 0
4 850 0 43 2 125510.82 1 1 1 79084.10 0 1
5 645 1 44 8 113755.78 2 1 0 149756.71 0 1
6 822 1 50 7 0.00 2 1 1 10062.80 0 0
7 376 0 29 4 115046.74 4 1 0 119346.88 1 0
8 501 1 44 4 142051.07 2 0 1 74940.50 0 0
9 684 1 27 2 134603.88 1 1 1 71725.73 0 0
10 528 1 31 6 102016.72 2 0 0 80181.12 0 0
11 497 1 24 3 0.00 2 1 0 76390.01 0 1
12 476 0 34 10 0.00 2 1 0 26260.98 0 0
13 549 0 25 5 0.00 2 0 0 190857.79 0 0
14 635 0 35 7 0.00 2 1 1 65951.65 0 1
15 616 1 45 3 143129.41 2 0 1 64327.26 1 0
16 653 1 58 1 132602.88 1 1 0 5097.67 1 0
17 549 0 24 9 0.00 2 1 1 14406.41 0 1
18 587 1 45 6 0.00 1 0 0 158684.81 0 1
19 726 0 24 6 0.00 2 1 1 54724.03 0 0
20 732 1 41 8 0.00 2 1 1 170886.17 0 0
21 636 0 32 8 0.00 2 1 0 138555.46 0 1
22 510 0 38 4 0.00 1 1 0 118913.53 0 1
23 669 1 46 3 0.00 2 0 1 8487.75 0 0
24 846 0 38 5 0.00 1 1 1 187616.16 0 0
25 577 1 25 3 0.00 2 0 1 124508.29 0 0
26 756 1 36 2 136815.64 1 1 1 170041.95 1 0
27 571 1 44 9 0.00 2 0 0 38433.35 0 0
28 574 0 43 3 141349.43 1 1 1 100187.43 1 0
29 411 1 29 0 59697.17 2 1 1 53483.21 0 0
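With drop_first=True, get_dummies keeps one 0/1 column per country code except the first. That leaves Geography_1 (Germany) and Geography_2 (Spain). France (code 0) is the case where both columns are 0. The NumPy sketch below reproduces that encoding for the Geography codes of rows 0 to 7 as read off the table above (0, 2, 0, 0, 2, 2, 0, 1). It is an illustration of the encoding, not how pandas implements it.

```python
import numpy as np

# Geography codes for rows 0-7, read off the table above
codes = np.array([0, 2, 0, 0, 2, 2, 0, 1])
n_classes = 3  # France=0, Germany=1, Spain=2

# Full one-hot matrix: row i has a single 1 in column codes[i]
one_hot = np.eye(n_classes, dtype=int)[codes]

# drop_first=True: drop the column for code 0, so France becomes all zeros
dummies = one_hot[:, 1:]

print(dummies[:, 0])  # -> [0 0 0 0 0 0 0 1]  (Geography_1)
print(dummies[:, 1])  # -> [0 1 0 0 1 1 0 0]  (Geography_2)
```

Dropping one column loses no information, because the three full one-hot columns always sum to 1.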

Feature standardization

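# use scikit-learn's built-in preprocessing
from sklearn.preprocessing import StandardScaler
# hold out 20% as the test set; random_state=0 makes the shuffle reproducible;
# stratify=y keeps the class ratio of y the same in both sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0, stratify=y)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)  # learn each feature's mean and std from the training set only
X_test = scaler.transform(X_test)  # reuse the training statistics on the test set (avoids data leakage)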
y_test
1344    1
8167    0
4747    0
5004    1
3124    1
       ..
9107    0
8249    0
8337    0
6279    1
412     0
Name: Exited, Length: 2000, dtype: int64
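StandardScaler rescales each feature to z = (x - μ) / σ, where μ is the column mean and σ the population standard deviation. The key point is that μ and σ should be learned from the training set only and then reused on the test set. The NumPy sketch below shows a stratified 80/20 split followed by that scaling. The helper `stratified_split` and the synthetic data are assumptions for illustration, not the scikit-learn implementation.

```python
import numpy as np

def stratified_split(y, test_size, rng):
    """Hypothetical helper: return train/test indices so that each class
    sends the same fraction of its samples to the test set."""
    test_idx = []
    for cls in np.unique(y):
        idx = np.flatnonzero(y == cls)       # positions of this class
        rng.shuffle(idx)
        n_test = int(round(test_size * len(idx)))
        test_idx.extend(idx[:n_test])
    test_idx = np.sort(np.array(test_idx))
    train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
    return train_idx, test_idx

rng = np.random.default_rng(0)
# Synthetic stand-in data: 1000 samples, 3 features, 20% positive labels
X = rng.normal(loc=[600.0, 40.0, 50000.0], scale=[100.0, 10.0, 20000.0], size=(1000, 3))
y = (np.arange(1000) < 200).astype(int)

train_idx, test_idx = stratified_split(y, test_size=0.2, rng=rng)
X_train, X_test = X[train_idx], X[test_idx]
y_train, y_test = y[train_idx], y[test_idx]

# Fit: mean and population std (ddof=0, as StandardScaler uses) from training data only
mu = X_train.mean(axis=0)
sigma = X_train.std(axis=0)

# Transform: apply the same training statistics to both sets
X_train_scaled = (X_train - mu) / sigma
X_test_scaled = (X_test - mu) / sigma

print(y_train.mean(), y_test.mean())  # 0.2 0.2 (same class ratio in both sets)
print(len(train_idx), len(test_idx))  # 800 200
```

Without scaling, a feature such as Balance (tens of thousands) would dominate Age (tens) in the first layer's weighted sums.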

Build the ANN

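model = Sequential()  # sequential model: layers are stacked in order
model.add(Dense(X.shape[1], activation='relu', input_dim=X.shape[1]))  # input layer; X.shape[1] is the number of features
model.add(Dense(128, activation='relu'))  # hidden layer
model.add(Dense(1, activation='sigmoid'))  # output layer: one sigmoid unit gives the churn probability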
WARNING:tensorflow:From F:\Anaconda3\lib\site-packages\tensorflow\python\ops\resource_variable_ops.py:435: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
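Each Dense layer computes activation(x·W + b). For the model above, with 11 input features, the parameter counts are 11×11+11 = 132, 11×128+128 = 1536 and 128×1+1 = 129, for 1797 trainable parameters in total. model.summary() should report the same figure. The NumPy sketch below runs one forward pass through layers of the same shapes. The random weights, the synthetic batch and the `dense` helper are assumptions for illustration.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dense(x, W, b, activation):
    """Hypothetical helper: what a Keras Dense layer computes, activation(x @ W + b)."""
    return activation(x @ W + b)

rng = np.random.default_rng(0)
n_features = 11  # X.shape[1] after preprocessing
# Layer shapes mirror the model: 11 -> 11 -> 128 -> 1 (random weights, zero biases)
shapes = [(n_features, n_features), (n_features, 128), (128, 1)]
params = [(rng.normal(scale=0.1, size=s), np.zeros(s[1])) for s in shapes]

x = rng.normal(size=(4, n_features))  # a synthetic batch of 4 standardized samples
h = dense(x, *params[0], relu)
h = dense(h, *params[1], relu)
p = dense(h, *params[2], sigmoid)     # churn probabilities, shape (4, 1)

n_params = sum(W.size + b.size for W, b in params)
print(p.shape)   # (4, 1)
print(n_params)  # 1797
```

The sigmoid keeps every output strictly between 0 and 1, so it can be read as a probability of churn.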
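model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])  # Adam optimizer (an adaptive variant of stochastic gradient descent), binary cross-entropy loss for the 0/1 target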
model.fit(X_train,y_train.to_numpy(),batch_size=10,epochs=10,verbose=1)
WARNING:tensorflow:From F:\Anaconda3\lib\site-packages\tensorflow\python\ops\math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
Epoch 1/10
8000/8000 [==============================] - 1s 94us/sample - loss: 0.4515 - acc: 0.8049
Epoch 2/10
8000/8000 [==============================] - 1s 80us/sample - loss: 0.4185 - acc: 0.8202
Epoch 3/10
8000/8000 [==============================] - 1s 80us/sample - loss: 0.4057 - acc: 0.8324
Epoch 4/10
8000/8000 [==============================] - 1s 77us/sample - loss: 0.3752 - acc: 0.8431
Epoch 5/10
8000/8000 [==============================] - 1s 79us/sample - loss: 0.3507 - acc: 0.8571
Epoch 6/10
8000/8000 [==============================] - 1s 78us/sample - loss: 0.3415 - acc: 0.8591
Epoch 7/10
8000/8000 [==============================] - 1s 79us/sample - loss: 0.3363 - acc: 0.8620
Epoch 8/10
8000/8000 [==============================] - 1s 84us/sample - loss: 0.3345 - acc: 0.8619
Epoch 9/10
8000/8000 [==============================] - 1s 74us/sample - loss: 0.3328 - acc: 0.8602
Epoch 10/10
8000/8000 [==============================] - 1s 74us/sample - loss: 0.3302 - acc: 0.8626
<tensorflow.python.keras.callbacks.History at 0x1d77c75d248>
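y_pred = (model.predict(X_test) > 0.5).astype('int32')  # threshold the predicted probabilities at 0.5; model.predict_classes was removed in newer TensorFlow versions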
y_pred
array([[0],
       [0],
       [0],
       ...,
       [0],
       [1],
       [0]])
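The loss and the class predictions are both computed from the sigmoid output p. binary_crossentropy is the mean of -[y·log(p) + (1 - y)·log(1 - p)] over the samples. A prediction counts as class 1 when p > 0.5. The sketch below uses made-up labels and probabilities. Like Keras, it clips p slightly away from 0 and 1 before taking logs, and the clip value 1e-7 is an assumption of this sketch.

```python
import numpy as np

# Made-up labels and sigmoid outputs, for illustration only
y_true = np.array([1, 0, 0, 1, 0])
p = np.array([0.9, 0.2, 0.6, 0.4, 0.1])

eps = 1e-7  # clip so log() never sees exactly 0 or 1
p_clipped = np.clip(p, eps, 1 - eps)
bce = -np.mean(y_true * np.log(p_clipped) + (1 - y_true) * np.log(1 - p_clipped))

y_pred = (p > 0.5).astype(int)  # the same 0.5 threshold used for the predicted classes
accuracy = np.mean(y_pred == y_true)

print(y_pred)    # [1 0 1 0 0]
print(accuracy)  # 0.6
```

The loss rewards confident correct probabilities, while accuracy only checks which side of 0.5 each probability falls on. That is why the loss keeps decreasing during training even when accuracy barely moves.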
y_test
1344    1
8167    0
4747    0
5004    1
3124    1
       ..
9107    0
8249    0
8337    0
6279    1
412     0
Name: Exited, Length: 2000, dtype: int64
model.evaluate(X_test, y_test.to_numpy())  # measure the trained model's loss and accuracy on the test set
2000/2000 [==============================] - 0s 34us/sample - loss: 0.3583 - acc: 0.8535
[0.3583366745710373, 0.8535]
# another way to compute accuracy
from sklearn.metrics import confusion_matrix, accuracy_score
confusion_matrix(y_test,y_pred)
array([[1525,   68],
       [ 225,  182]], dtype=int64)
accuracy_score(y_test,y_pred)
0.8535
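In scikit-learn's confusion_matrix, rows are the true labels and columns are the predicted labels, so the diagonal holds the correct predictions. Accuracy is the diagonal sum divided by the total: (1525 + 182) / 2000 = 0.8535, matching both model.evaluate and accuracy_score above. The sketch below redoes that arithmetic from the reported matrix. It also computes the share of actual churners (row 1) that the model flagged.

```python
import numpy as np

# Confusion matrix reported above: rows = true class (0 stayed, 1 churned),
# columns = predicted class
cm = np.array([[1525, 68],
               [225, 182]])

accuracy = np.trace(cm) / cm.sum()     # correct predictions / all predictions
churn_recall = cm[1, 1] / cm[1].sum()  # churners the model actually flagged

print(accuracy)                # 0.8535
print(round(churn_recall, 3))  # 0.447
```

Only 182 of the 407 churners in the test set are caught, which the 85% overall accuracy hides. This is the kind of result step 9 (tune the model) is meant to address.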
