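Errors in "Python Data Analysis and Mining in Practice", Summarized and Analyzed: Automatic Identification of Electricity Theft and Leakage Users

The book covers the algorithm principles and background knowledge in detail, so without further ado, here is the code.

Listing 6-1: Lagrange interpolation code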

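#-*- coding: utf-8 -*-
# Lagrange interpolation code
import pandas as pd # import the Pandas data analysis library
from scipy.interpolate import lagrange # import the Lagrange interpolation function

inputfile = '../data/missing_data.xls' # input data path; must be in Excel format
outputfile = '../tmp/missing_data_processed.xls' # output data path; must be in Excel format

data = pd.read_excel(inputfile, header=None) # read the data

# Custom interpolation function for a column vector
# s is the column vector, n is the position to interpolate, k is the number of points taken before and after (default 5)
def ployinterp_column(s, n, k=5):
  y = s[list(range(n-k, n)) + list(range(n+1, n+1+k))] # select the neighbouring values
  y = y[y.notnull()] # drop null values
  return lagrange(y.index, list(y))(n) # interpolate and return the result

# Check element by element whether interpolation is needed
for i in data.columns:
  for j in range(len(data)):
    if (data[i].isnull())[j]: # interpolate if the value is null
      data[i][j] = ployinterp_column(data[i], j)

data.to_excel(outputfile, header=None, index=False) # write the result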

Load the required file as the code expects. Running the code produces no error, but it does raise one harmless warning:

FutureWarning: Passing list-likes to .loc or [] with any missing label will raise KeyError in the future, you can use .reindex() as an alternative.

Keep this warning in mind for later code. If it shows up, the change it suggests is a good starting point for the fix.
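One way to follow the warning's suggestion, sketched under assumptions: replace the list-based `[]` lookup inside `ployinterp_column` with `.reindex()`, which yields NaN for labels outside the index instead of complaining about them. The short Series below is made up for illustration and is not the book's data.

```python
import pandas as pd
from scipy.interpolate import lagrange

def ployinterp_column(s, n, k=5):
    """Interpolate position n of Series s from up to k neighbours on each side."""
    wanted = list(range(n - k, n)) + list(range(n + 1, n + 1 + k))
    y = s.reindex(wanted)   # labels missing from the index become NaN, no warning
    y = y[y.notnull()]      # drop the NaN entries before fitting
    return lagrange(y.index, list(y))(n)

# Made-up column: the known values lie on (x + 1)**2, position 2 is missing.
s = pd.Series([1.0, 4.0, None, 16.0, 25.0])
value = ployinterp_column(s, 2, k=2)
```

Because the four known points lie on a quadratic, the interpolated value at position 2 should come out very close to 9.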

Listings 6-2, 6-3, 6-4, and 6-5 combined

#-*- coding: utf-8 -*-

import pandas as pd
from random import shuffle

datafile = '../data/model.xls'
data = pd.read_excel(datafile)
data = data.as_matrix()
shuffle(data)

p = 0.8 # proportion of the data used for training
train = data[:int(len(data)*p),:]
test = data[int(len(data)*p):,:]

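from keras.models import Sequential # import the neural network initialization function
from keras.layers.core import Dense, Activation # import the layer and activation functions

netfile = '../tmp/net.model' # path where the built network model is stored

net = Sequential() # build the neural network
net.add(Dense(3, 10)) # add the connection from the input layer (3 nodes) to the hidden layer (10 nodes)
net.add(Activation('relu')) # the hidden layer uses the relu activation function
net.add(Dense(10, 1)) # add the connection from the hidden layer (10 nodes) to the output layer (1 node)
net.add(Activation('sigmoid')) # the output layer uses the sigmoid activation function
net.compile(loss = 'binary_crossentropy', optimizer = 'adam', class_mode = "binary") # compile the model, solving with the adam method

net.fit(train[:,:3], train[:,3], nb_epoch=1000, batch_size=1) # train the model for 1000 epochs
net.save_weights(netfile) # save the model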

from sklearn.metrics import confusion_matrix # import the confusion matrix function

predict_result = net.predict_classes(train[:,:3]).reshape(len(train)) # reshape the predictions
'''Note that in keras, predict returns predicted probabilities while predict_classes returns predicted classes, and both return an n x 1 array rather than the usual 1 x n'''

cm = confusion_matrix(train[:,3], predict_result) # confusion matrix

import matplotlib.pyplot as plt # import the plotting library
plt.matshow(cm, cmap=plt.cm.Greens) # plot the confusion matrix with the cm.Greens colour map; see the official site for more styles
plt.colorbar() # colour bar

for x in range(len(cm)): # data labels
  for y in range(len(cm)):
    plt.annotate(cm[x,y], xy=(x, y), horizontalalignment='center', verticalalignment='center')

plt.ylabel('True label') # axis label
plt.xlabel('Predicted label') # axis label
plt.show() # show the figure

from sklearn.metrics import roc_curve # import the ROC curve function

predict_result = net.predict(test[:,:3]).reshape(len(test))
fpr, tpr, thresholds = roc_curve(test[:,3], predict_result, pos_label=1)
plt.plot(fpr, tpr, linewidth=2, label = 'ROC of LM') # plot the ROC curve
plt.xlabel('False Positive Rate') # axis label
plt.ylabel('True Positive Rate') # axis label
plt.ylim(0,1.05) # axis range
plt.xlim(0,1.05) # axis range
plt.legend(loc=4) # legend
plt.show() # show the figure
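The note in the listing about `predict` returning an n x 1 array of probabilities can be illustrated without Keras. The sketch below uses made-up probabilities and an assumed 0.5 threshold to flatten the array and turn it into class labels. Thresholding like this is also a common substitute where `predict_classes` is unavailable, since later Keras releases dropped that method.

```python
import numpy as np

# Made-up probabilities shaped like the output of predict(): n rows, 1 column.
probabilities = np.array([[0.91], [0.12], [0.50], [0.67]])

flat = probabilities.reshape(len(probabilities))  # shape (4,) instead of (4, 1)
classes = (flat > 0.5).astype(int)                # 0.5 threshold is an assumption
```

The flattened array is what `confusion_matrix` and `roc_curve` expect as their second argument.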

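First: data = data.as_matrix(). This is the same error as in earlier posts; change it to data = data.values.

Second: net.add(Dense(3, 10)). This has also come up before; change it to Dense(10, input_dim=3). The later layer follows the same idea, so Dense(10, 1) becomes Dense(1).

Third: in the net.compile call, class_mode="binary" is not supported, which has also come up before; change it to metrics=['accuracy'].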
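The first fix can be tried in isolation on a small stand-in for model.xls. This is a sketch under assumptions: the DataFrame below is invented, and the shuffle uses NumPy's `permutation` rather than the listing's `random.shuffle`, because `random.shuffle` swaps rows of a 2-D NumPy array through views and can end up duplicating rows.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for model.xls: three feature columns plus a 0/1 label.
frame = pd.DataFrame({
    "f1": list(range(10)),
    "f2": list(range(10, 20)),
    "f3": list(range(20, 30)),
    "label": [0, 1] * 5,
})

rng = np.random.default_rng(0)
# .values replaces the removed as_matrix(); permutation returns a row-shuffled copy.
data = rng.permutation(frame.values)

p = 0.8                                  # proportion of the data used for training
split = int(len(data) * p)
train, test = data[:split, :], data[split:, :]
```

Each row stays intact after the shuffle, so `train[:, :3]` and `train[:, 3]` still line up as features and label.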

The warnings are as follows:

1. plt.tight_layout doesn't always work, but plt.savefig('fig.png', bbox_inches='tight') does.

2. UserWarning: This figure includes Axes that are not compatible with tight_layout, so results might be incorrect.

3. sklearn.externals.joblib will be removed in 0.23; import this functionality directly from joblib.
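The first note can be tried with a small sketch that saves a confusion-matrix figure using `bbox_inches='tight'` instead of relying on `tight_layout`. The matrix values, the output file name, and the use of the non-interactive Agg backend are all assumptions made for illustration.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

cm = np.array([[50, 3], [4, 20]])  # made-up confusion matrix, for illustration only

fig, ax = plt.subplots()
image = ax.matshow(cm, cmap=plt.cm.Greens)
fig.colorbar(image)
for row in range(cm.shape[0]):
    for col in range(cm.shape[1]):
        # matshow puts columns on the x axis and rows on the y axis
        ax.annotate(str(cm[row, col]), xy=(col, row), ha="center", va="center")
ax.set_ylabel("True label")
ax.set_xlabel("Predicted label")

out_path = os.path.join(tempfile.gettempdir(), "confusion_matrix.png")
fig.savefig(out_path, bbox_inches="tight")  # trims the margins without tight_layout
plt.close(fig)
```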

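Hahaha, that is the end of this post. Don't ask me why the changes have to be made this way; honestly, I don't know either. The next post will be up in a few days. Mwah!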
