Machine Learning Algorithm Notes 1 (Python)

1. Machine Learning Algorithms

1.1 K-NN

(1) K-NN (k-nearest neighbors): K-NN is an instance-based learning method. Its classification does not depend on an internal model; instead it refers directly to the labeled training data. k-NN simply memorizes all training samples and compares each new sample against them, so it is a non-generalizing method.

  • KNeighborsClassifier: the user specifies k, the number of nearest neighbors. When the data are noisy, a larger k helps, at the cost of a less distinct classification boundary.
  • RadiusNeighborsClassifier: a fixed radius is specified around each training data point; it works better when the data are not uniformly sampled (see the sketch below).
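
As a point of comparison, here is a minimal sketch of RadiusNeighborsClassifier on the iris data used in the demo below; the radius of 1.0 and the query point are illustrative choices, not tuned settings:

from sklearn.datasets import load_iris
from sklearn.neighbors import RadiusNeighborsClassifier

iris = load_iris()
X, y = iris.data[:, :2], iris.target  # first two features, as in the k-NN demo below

# every training point within radius 1.0 of a query point gets a vote;
# a query point with no neighbor inside the radius raises an error
# unless outlier_label is set
rnc = RadiusNeighborsClassifier(radius=1.0)
rnc.fit(X, y)
print(rnc.predict([[5.0, 3.5]]))  # predict the class of one new sample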

(2) Iris classification example code

# -*- coding: utf-8 -*-
"""
Created on Sun Mar 10 21:24:33 2019

@author: Larry
"""
# k-NN classification demo on the iris dataset
from sklearn.neighbors import KNeighborsClassifier as knn
from sklearn import datasets
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap

def knnDemo(X,y,n):
    #creates the classifier and fits it to the data
    res = 0.05  # step size of the grid used to draw the decision surface
    k1 = knn(n_neighbors = n, p = 2, metric = 'minkowski')
    k1.fit(X, y)  # fit the classifier; p=2 with the Minkowski metric is Euclidean distance
    
    #sets up the grid
    x1_min, x1_max = X[:,0].min() - 1,X[:,0].max() + 1
    x2_min, x2_max = X[:,1].min() - 1,X[:,1].max() + 1
    xx1, xx2 = np.meshgrid(np.arange(x1_min, x1_max, res), np.arange(x2_min, x2_max, res))  # coordinate matrices covering the whole feature range
    
    #makes the prediction
    Z = k1.predict(np.array([xx1.ravel(), xx2.ravel()]).T)  # flatten the grid into (x, y) pairs and predict each point
    Z = Z.reshape(xx1.shape)  # reshape the predictions back to the grid shape for plotting
    
    #creates the color map
    cmap_light = ListedColormap(['#FFAAAA','#AAFFAA','#AAAAFF'])
    cmap_bold = ListedColormap(['#FF0000','#00FF00','#0000FF'])
    
    #plots the decision surface
    plt.contourf(xx1,xx2,Z,alpha = 0.4,cmap = cmap_light)
    plt.xlim(xx1.min(),xx1.max())
    plt.ylim(xx2.min(),xx2.max())

    #plots the samples, colored by class label
    plt.scatter(X[:, 0], X[:, 1], c = y, cmap = cmap_bold, edgecolor = 'k')
    
    plt.show()


iris = datasets.load_iris()
X1 = iris.data[:, 0:3:2]  # alternative feature pair: sepal length and petal length
X2 = iris.data[:, 0:2]    # sepal length and sepal width
X3 = iris.data[:, 1:3]    # alternative feature pair: sepal width and petal length
y = iris.target
knnDemo(X2, y, 15)

(3) Notes on some of the functions used
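
A small sketch of what np.meshgrid, ravel and reshape do in the demo above (the axis ranges here are illustrative, much smaller than in the demo):

import numpy as np

# meshgrid turns two 1-D axes into 2-D coordinate matrices
xx1, xx2 = np.meshgrid(np.arange(0, 3), np.arange(0, 2))
print(xx1.shape)        # (2, 3): one row per y value, one column per x value

# ravel flattens each matrix; stacking and transposing gives (x, y) pairs
pairs = np.array([xx1.ravel(), xx2.ravel()]).T
print(pairs.shape)      # (6, 2): every grid point becomes a sample row

# predictions made on the flattened pairs are reshaped back to the grid
Z = np.zeros(len(pairs)).reshape(xx1.shape)
print(Z.shape)          # (2, 3): same shape as xx1, ready for contourf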

1.2 Solving Regression Problems with Scikit-learn

(1) The LinearRegression() object

# linear regression example
from sklearn import linear_model
clf = linear_model.LinearRegression()
clf.fit([[0, 0], [1, 1], [2, 2]], [0, 1, 2])
clf.coef_     # array of estimated coefficients for the linear model
array([0.5, 0.5])

(2)linear_model.Ridge()

  • Ridge regression can handle multicollinearity, and it is also useful when the number of input variables greatly exceeds the number of samples.
  • The linear_model.Ridge() object applies L2 regularization: it penalizes the weight vector, which keeps the average weights smaller, reduces sensitivity to extreme values, and makes the model more stable.
  • linear_model.Ridge() adds a regularization parameter alpha; a small positive value improves the stability of the model. It can be a float or an array (one value per target variable).
# useful when the features are correlated
from sklearn.linear_model import Ridge
import numpy as np

def ridgeReg(alpha):
    n_samples, n_features = 10, 5
    y = np.random.randn(n_samples)
    X = np.random.randn(n_samples, n_features)
    clf = Ridge(alpha=alpha)  # use the alpha passed in instead of a hard-coded value
    res = clf.fit(X, y)
    return res

res = ridgeReg(0.001)
print(res.coef_)
print(res.intercept_)   # intercept (independent term) of the linear model

(3) Dimensionality reduction algorithms in scikit-learn

  • Dimensionality reduction cuts down the number of input or feature variables, which also reduces overfitting and improves the model's ability to generalize.
  • The main task is to identify redundant or irrelevant features. There are two approaches: feature selection and feature extraction. Selection finds a subset of the existing features; extraction combines correlated variables to create new feature variables.
  • The most commonly used feature extraction algorithm is PCA.
  • PCA applies an orthogonal transformation to convert a set of correlated variables into a set of uncorrelated variables.
  • PCA requires the features to be scaled and mean-normalized, i.e. each feature should have zero mean and a comparable value range (a plain-PCA sketch with scaling follows, then a KernelPCA example).
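A minimal sketch of plain PCA with the scaling step mentioned above, on the iris data; the dataset and n_components=2 are illustrative choices:

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X = iris.data

# zero-mean, unit-variance scaling before PCA, as noted above
X_scaled = StandardScaler().fit_transform(X)

pca = PCA(n_components=2)             # keep the two leading principal components
X_pca = pca.fit_transform(X_scaled)
print(X_pca.shape)                    # the 4 original features reduced to 2 components
print(pca.explained_variance_ratio_)  # fraction of variance each component explains

The example that follows uses KernelPCA instead, which can separate data that plain PCA cannot (here, two concentric circles).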
# -*- coding: utf-8 -*-
"""
Created on Tue Mar 12 10:20:22 2019

@author: Larry
"""

import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import KernelPCA
from sklearn.datasets import make_circles
np.random.seed(0)
X,y = make_circles(n_samples = 400,factor = 0.3,noise = 0.05)
kpca = KernelPCA(kernel = 'rbf',gamma=10)
X_kpca = kpca.fit_transform(X)
plt.figure()
plt.subplot(2,2,1,aspect = 'equal')
plt.title("Original space")
reds = y == 0
blues = y == 1
plt.plot(X[reds, 0], X[reds, 1], "ro")
plt.plot(X[blues, 0], X[blues, 1], "bo")
plt.xlabel("$x_1$")
plt.ylabel("$x_2$")
plt.subplot(2, 2, 3, aspect = 'equal')
plt.plot(X_kpca[reds, 0], X_kpca[reds, 1], "ro")
plt.plot(X_kpca[blues, 0], X_kpca[blues, 1], "bo")
plt.title("Projection by KPCA")
plt.xlabel(r"1st principal component in space induced by $\phi$")
plt.ylabel("2nd component")
plt.subplots_adjust(0.02,0.01,0.98,0.94,0.04,0.35)
plt.show()

(4) Cross-validation

# -*- coding: utf-8 -*-
"""
Created on Tue Mar 12 11:03:31 2019

@author: Larry
"""

from sklearn.model_selection import train_test_split
from sklearn import datasets
from sklearn import svm
from sklearn import model_selection
iris = datasets.load_iris()
# hold out 40% of the data as a test set
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.4, random_state=0)
clf = svm.SVC(kernel='linear', C=1).fit(X_train, y_train)
# 5-fold cross-validation on the training set
scores = model_selection.cross_val_score(clf, X_train, y_train, cv=5)
print("Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))

(5) Decision trees (DT)

# -*- coding: utf-8 -*-
"""
Created on Wed Mar 13 14:36:44 2019

@author: Larry
"""

from sklearn import tree
names = ['size', 'scale', 'fruit', 'butt']   # feature names used for the graphviz export
labels = [1, 1, 1, 1, 1, 0, 0, 0]            # class label for each training sample
p1 = [2, 1, 0, 1]
p2 = [1, 1, 0, 1]
p3 = [1, 1, 0, 0]
p4 = [1, 1, 0, 0]
n1 = [0, 0, 0, 0]
n2 = [1, 0, 0, 0]
n3 = [0, 0, 1, 0]
n4 = [1, 1, 0, 0]
data = [p1, p2, p3, p4, n1, n2, n3, n4]

def pred(test, data=data):
    dtre = tree.DecisionTreeClassifier()
    dtre = dtre.fit(data, labels)
    print(dtre.predict([test]))
    # export the fitted tree to a graphviz .dot file
    with open('data/treeDemo.dot', 'w') as f:
        f = tree.export_graphviz(dtre, out_file=f, feature_names=names)

pred([1, 1, 0, 1])
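
The exported .dot file can be rendered into an image with Graphviz, e.g. dot -Tpng data/treeDemo.dot -o data/treeDemo.png (assuming Graphviz is installed and the data/ directory exists).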

1.3 Ensemble Learning

1.3.1 Bagging Methods

  • Also called bootstrap aggregating.
  • The most common form of bagging draws samples with replacement.
# -*- coding: utf-8 -*-
"""
Created on Thu Mar 14 16:29:50 2019

@author: Larry
"""

from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn import datasets

# bag 50 decision trees, each trained on 50% of the samples and 50% of the features
bcls = BaggingClassifier(DecisionTreeClassifier(), max_samples=0.5, max_features=0.5, n_estimators=50)
X, y = datasets.make_blobs(n_samples=8000, centers=2, random_state=0, cluster_std=4)
bcls.fit(X, y)
print(bcls.score(X, y))  # accuracy on the training data

(1) The sklearn.ensemble module provides two decision-tree-based algorithms: random forests and extremely randomized trees (extra-trees). The following example compares them (together with logistic regression and naive Bayes) inside a soft-voting ensemble:

# -*- coding: utf-8 -*-
"""
Created on Thu Mar 14 17:03:35 2019

@author: Larry
"""

from sklearn import model_selection
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.ensemble import VotingClassifier
from sklearn import datasets

def vclas(w1,w2,w3,w4):
    X,y=datasets.make_classification(n_features=10,n_informative=4,n_samples=500,n_clusters_per_class=5)
    Xtrain,Xtest,ytrain,ytest=model_selection.train_test_split(X,y,test_size=0.4)
    
    clf1=LogisticRegression(random_state=123)
    clf2=GaussianNB()
    clf3=RandomForestClassifier(n_estimators=10,bootstrap=True,random_state=123)
    clf4=ExtraTreesClassifier(n_estimators=10,bootstrap=True,random_state=123)
    
    
    clfes=[clf1,clf2,clf3,clf4]
    
    
    # soft voting averages the predicted class probabilities, weighted by w1..w4
    eclf=VotingClassifier(estimators=[('lr',clf1),('gnb',clf2),('rf',clf3),('et',clf4)],voting='soft',weights=[w1,w2,w3,w4])

    # fit every base classifier and the ensemble
    [c.fit(Xtrain,ytrain) for c in (clf1,clf2,clf3,clf4,eclf)]
    
    
    
    N=5
    ind = np.arange(N)
    width = 0.3
    fig,ax=plt.subplots()
    
    
    
    plt.rcParams['font.sans-serif'] = ['SimHei']  # use a CJK-capable font (only needed if labels contain Chinese text)
    plt.rcParams['axes.unicode_minus'] = False
    
    for i, clf in enumerate(clfes):
        print(clf, i)
        # black bars: training accuracy; grey bars: test accuracy
        p1 = ax.bar(i, clfes[i].score(Xtrain, ytrain), width=width, color='black')
        p2 = ax.bar(i + width, clfes[i].score(Xtest, ytest), width=width, color='grey')
    
    # bars for the voting ensemble itself
    ax.bar(len(clfes)+width,eclf.score(Xtrain,ytrain),width=width,color='black')
    ax.bar(len(clfes)+width*2,eclf.score(Xtest,ytest),width=width,color='grey')
    plt.axvline(3.8,color='k',linestyle='dashed')  # separates the base classifiers from the ensemble
    ax.set_xticks(ind+width)
    ax.set_xticklabels(['LogisticRegression',
                       'GaussianNB',
                       'RandomForestClassifier',
                       'ExtraTrees',
                       'VotingClassifier'],rotation=40,ha='right')
    #ExtraTrees
    
    plt.title('Train and test score for different classifiers')
    plt.legend([p1[0], p2[0]], ['train', 'test'], loc='lower left')
#    plt.show()
    plt.savefig("data/temp.png", dpi=500, bbox_inches='tight')  # high dpi and tight bbox keep the saved figure sharp and uncropped

vclas(1,3,5,4)

1.3.2 Boosting Methods

(1) AdaBoost (Adaptive Boosting): uses decision tree classifiers as base learners and can build decision boundaries for data that are not linearly separable (a sketch follows after the list below).
(2) Gradient Boosting

  • Handles mixed data types well
  • Strong predictive power
  • Its sequential architecture does not lend itself to parallelization, so it does not scale well to very large data sets.
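
A minimal sketch of both boosting classifiers on synthetic data; the hyperparameters (n_estimators, learning_rate) are illustrative defaults rather than tuned values:

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=10, n_informative=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# AdaBoost: by default each base learner is a shallow decision tree (a stump)
ada = AdaBoostClassifier(n_estimators=100, random_state=0)
ada.fit(X_train, y_train)
print("AdaBoost test accuracy:", ada.score(X_test, y_test))

# Gradient boosting: trees are fitted sequentially to the residual errors
gbc = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, random_state=0)
gbc.fit(X_train, y_train)
print("Gradient boosting test accuracy:", gbc.score(X_test, y_test))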

2. Evaluation Metrics

(1) The ROC curve (receiver operating characteristic) plots the true positive rate against the false positive rate at different thresholds, where TPR = TP / (TP + FN) and FPR = FP / (FP + TN).
(2) In signal detection theory, ROC plots have long been used to describe the trade-off between a classifier's hit rate and its false alarm rate.
(3) For multi-class problems, one ROC curve can be drawn per class by treating that class as positive and all the others as negative (one-vs-rest).

# -*- coding: utf-8 -*-
"""
Created on Thu Mar 14 09:45:27 2019

@author: Larry
"""
import matplotlib.pyplot as plt
from sklearn import svm,datasets
from sklearn.metrics import roc_curve,auc
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize
from sklearn.multiclass import OneVsRestClassifier

X,y=datasets.make_classification(n_samples=100,n_classes=3,n_features=5,n_informative=3,n_redundant=0,random_state=42)
#binarize the output
y=label_binarize(y,classes=[0,1,2])
n_classes=y.shape[1]
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.5)
classifier = OneVsRestClassifier(svm.SVC(kernel='linear', probability=True))
y_score = classifier.fit(X_train, y_train).decision_function(X_test)  # per-class decision scores used to sweep the thresholds
plt.figure()
# one ROC curve per class: that class is treated as positive, all others as negative
for i in range(n_classes):
    fpr, tpr, _ = roc_curve(y_test[:, i], y_score[:, i])
    roc_auc = auc(fpr, tpr)
    plt.plot(fpr, tpr, label='class %d ROC (AUC = %0.2f)' % (i, roc_auc))
plt.plot([0,1],[0,1],'k--')
plt.xlim([0.0,1.0])
plt.ylim([0.0,1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC')
plt.legend(loc="best")
plt.show()
