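Machine Learning Notes (2): The Perceptron

Perceptron

The perceptron is a linear classifier for binary classification and belongs to the family of discriminative models. It was proposed by Rosenblatt in 1957 and is a foundation of both neural networks and support vector machines (SVM). A perceptron corresponds to a single neuron in a neural network and can only perform simple linear classification. Its learning goal is to obtain, from the training data, a hyperplane that linearly separates the classes. To do so, a loss function based on misclassification is introduced and minimized with gradient descent, which yields the perceptron model.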

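1. The Perceptron Model

Definition: let the input space (feature space) be $\mathcal{X} \subseteq \mathbb{R}^n$ and the output space be $\mathcal{Y} = \{+1, -1\}$. An input $x \in \mathcal{X}$ is the feature vector of a sample, and the corresponding output $y \in \mathcal{Y}$ is its class. The mapping from the input space to the output space is:

$$f(x) = \operatorname{sign}(w \cdot x + b)$$

where $x$ is the feature vector of the input sample, and $w$ and $b$ are the model parameters of the perceptron: $w$ is the weight vector and $b$ is the bias, both of which must be learned. $w \cdot x$ is the inner product of $w$ and $x$.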
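As a minimal sketch of the model $f(x) = \operatorname{sign}(w \cdot x + b)$, the function below evaluates it for one sample. The name `predict`, the parameter values, and the convention that a score of exactly 0 maps to +1 are assumptions made for illustration.

```python
import numpy as np

def predict(w, b, x):
    """Return +1 or -1 according to sign(w . x + b).

    Assumption: a score of exactly 0 is mapped to +1.
    """
    score = np.dot(w, x) + b
    return 1 if score >= 0 else -1

# Hypothetical parameters, chosen only for illustration.
w = np.array([1.0, 1.0])
b = -3.0

print(predict(w, b, np.array([3.0, 3.0])))  # score 3 + 3 - 3 = 3   -> 1
print(predict(w, b, np.array([1.0, 1.0])))  # score 1 + 1 - 3 = -1  -> -1
```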

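2. The Perceptron Learning Strategy

Definition of linear separability: given a data set

$$T = \{(x_1, y_1), (x_2, y_2), \dots, (x_N, y_N)\}$$

where $x_i$ is the feature vector of a sample and $y_i$ is its class, if there exists a hyperplane $S: w \cdot x + b = 0$ that places all positive samples and all negative samples correctly on its two sides, then $T$ is called a linearly separable data set.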
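The definition can be checked numerically for one candidate hyperplane: it separates the data exactly when $y_i (w \cdot x_i + b) > 0$ for every sample. The sketch below assumes a helper named `separates` (hypothetical) and uses the three sample points from this article's later example. A `False` result only says that this particular hyperplane fails, not that the data set is inseparable.

```python
import numpy as np

def separates(w, b, X, y):
    """True if every sample lies strictly on the correct side of w . x + b = 0."""
    margins = y * (X @ w + b)
    return bool(np.all(margins > 0))

# Two positive samples and one negative sample.
X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])
y = np.array([1.0, 1.0, -1.0])

print(separates(np.array([1.0, 1.0]), -3.0, X, y))  # margins 3, 4, 1   -> True
print(separates(np.array([1.0, 1.0]), -1.0, X, y))  # margins 5, 6, -1  -> False
```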
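Assuming the data set is linearly separable, the goal of the perceptron is to find a hyperplane that classifies all samples correctly, that is, to learn the parameters $w, b$ of the perceptron model above. To obtain these parameters we need a learning strategy: define a suitable loss function and find the $w, b$ that minimize it. In the perceptron model, the loss is usually defined through the distance from a misclassified point $x_i$ to the hyperplane. First, the distance from a point to the hyperplane is:

$$\frac{1}{\|w\|} |w \cdot x_i + b|$$

For a misclassified sample $(x_i, y_i)$ we have:

$$-y_i (w \cdot x_i + b) > 0$$

that is, the sign of the predicted score $w \cdot x_i + b$ is always opposite to the actual class $y_i$. The distance from a misclassified point $(x_i, y_i)$ to the hyperplane can therefore be written without the absolute value as:

$$-\frac{1}{\|w\|} y_i (w \cdot x_i + b)$$

If $M$ is the set of misclassified points, the total distance from all misclassified points to the hyperplane is:

$$-\frac{1}{\|w\|} \sum_{(x_i, y_i) \in M} y_i (w \cdot x_i + b)$$

Dropping the factor $\frac{1}{\|w\|}$, for any given training data set

$$T = \{(x_1, y_1), (x_2, y_2), \dots, (x_N, y_N)\}$$

the loss function for learning the perceptron model $f(x) = \operatorname{sign}(w \cdot x + b)$ is defined as:

$$L(w, b) = -\sum_{(x_i, y_i) \in M} y_i (w \cdot x_i + b)$$

The loss function is non-negative, and it equals 0 once there are no misclassified points. Therefore, given the training set $T$, the learning goal of the perceptron is to find parameters $w, b$ in the hypothesis space that minimize the loss, which means fewer misclassified samples, or misclassified samples that lie closer to the hyperplane.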
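To make the loss concrete, the sketch below evaluates $L(w, b)$ for two parameter settings on the three sample points used in this article's example. The function name `perceptron_loss` and the parameter values are illustrative assumptions. Points with $y_i (w \cdot x_i + b) \leq 0$ are counted as misclassified.

```python
import numpy as np

def perceptron_loss(w, b, X, y):
    """L(w, b) = -sum of y_i * (w . x_i + b) over the misclassified points."""
    margins = y * (X @ w + b)
    misclassified = margins <= 0
    return float(np.sum(-margins[misclassified]))

X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])
y = np.array([1.0, 1.0, -1.0])

# With w = (3, 3), b = 1 only the negative point (1, 1) is misclassified.
print(perceptron_loss(np.array([3.0, 3.0]), 1.0, X, y))   # 7.0
# With w = (1, 1), b = -3 every point is classified correctly.
print(perceptron_loss(np.array([1.0, 1.0]), -3.0, X, y))  # 0.0
```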

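3. The Perceptron Learning Algorithm

With the learning strategy above, learning a perceptron becomes an optimization problem: for a given data set $T$, find the parameters that minimize the loss function:

$$\min_{w, b} L(w, b) = -\sum_{(x_i, y_i) \in M} y_i (w \cdot x_i + b)$$

The problem is solved with gradient descent. With the misclassified set $M$ held fixed, the gradients are:

$$\nabla_w L(w, b) = -\sum_{(x_i, y_i) \in M} y_i x_i$$

$$\nabla_b L(w, b) = -\sum_{(x_i, y_i) \in M} y_i$$

The gradient is a vector that points in the direction in which a scalar field changes fastest; its magnitude is the maximum rate of change at that point. Rather than summing over all of $M$, stochastic gradient descent picks one misclassified point $(x_i, y_i)$ at random and updates $w, b$ as follows:

$$w \leftarrow w + \eta y_i x_i$$

$$b \leftarrow b + \eta y_i$$

where $\eta$ ($0 < \eta \leq 1$) is the step size, also called the learning rate. Iterating in this way drives the loss function $L(w, b)$ toward 0; for a linearly separable data set it reaches 0 after finitely many updates. This gives the primal form of the perceptron learning algorithm.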
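A single update can be traced by hand. The sketch below assumes zero initial parameters, $\eta = 1$, and a hypothetical misclassified point $x_i = (3, 3)$ with $y_i = +1$; the helper name `sgd_step` is also an assumption. After the update, $y_i (w \cdot x_i + b)$ for that point has grown by $\eta (\|x_i\|^2 + 1)$, which is why repeated updates eventually move the point to the correct side.

```python
import numpy as np

def sgd_step(w, b, x_i, y_i, eta=1.0):
    """Apply w <- w + eta*y_i*x_i and b <- b + eta*y_i if (x_i, y_i) is misclassified."""
    if y_i * (np.dot(w, x_i) + b) <= 0:
        w = w + eta * y_i * x_i
        b = b + eta * y_i
    return w, b

w, b = np.zeros(2), 0.0
x_i, y_i = np.array([3.0, 3.0]), 1.0

before = float(y_i * (np.dot(w, x_i) + b))
w, b = sgd_step(w, b, x_i, y_i)
after = float(y_i * (np.dot(w, x_i) + b))

print(w.tolist(), b)   # [3.0, 3.0] 1.0
print(before, after)   # 0.0 19.0  (increase of 1 * (18 + 1))
```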

Algorithm 2.1
Input: training_data T={(x1,y1),(x2,y2),…,(xN,yN)}, learning_rate η (0<η≤1)
Output: parameters w, b; model f(x)=sign(w⋅x+b)
Initialize: w0, b0
for randomly selected item (xi,yi)∈T, until no item in T is misclassified
if yi(w⋅xi+b)≤0
w←w+ηyixi
b←b+ηyi
end if
end for

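In this example, the positive samples are $x_1 = (3, 3)^T$ and $x_2 = (4, 3)^T$, and the negative sample is $x_3 = (1, 1)^T$. The primal form of the perceptron is implemented in Python as follows:

Code for the primal form of the perceptron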

# Project: Machine Learning-Perceptron
# Author: Lyndon
# Date: 2015/10/15
import copy
from matplotlib import pyplot as plt
from matplotlib import animation

#initialize
training_data=[[(3,3),1],[(4,3),1],[(1,1),-1]]
w=[0,0]     #weight
b=[0]       #bias
step=1      #learning rate
history=[]

# update parameters using stochastic gradient descent 
# parameter item: an item which is classified into wrong class
# return: NULL 
def update(item):
    global w,b,history
    w[0]+=step*item[1]*item[0][0]
    w[1]+=step*item[1]*item[0][1]
    b[0]+=step*item[1]
    print(w, b)
    history.append([copy.copy(w),copy.copy(b)])

# calculate the item classified result. y_i*(w*x_i+b)<=0 wrong
# parameter item: training_data
# return: classified result 
def calculate(item):
    res = 0
    for i in range(len(item[0])):
        res+=w[i]*item[0][i]
    res+=b[0]
    res*=item[1]
    return res

# check if the classifier can correctly classify all data
# parameter item: Null
# return: correct or not
def check():
    flag=False
    for item in training_data:
        if calculate(item)<=0:
            update(item)
            flag=True
    if not flag:
        print("Result: w: " + str(w) + " b: " + str(b))
    return flag

# main function
if __name__=="__main__":
    for i in range(1000):
        if not check():
            break

# set up the figure
    fig = plt.figure()
    ax = plt.axes(xlim=(0,6),ylim=(0,6))
    line, = ax.plot([],[],lw=2)
    label = ax.text(0, 0, '')  # placeholder position; animate() moves it

# initialization function for base frame
    def init():
        line.set_data([],[])
        x1, y1, x2, y2=[], [], [], []  #initialize the training data,(x1,y1) is positive
        for item in training_data:
            if item[1]>0:
                x1.append(item[0][0])
                y1.append(item[0][1])
            else:
                x2.append(item[0][0])
                y2.append(item[0][1])      
        plt.plot(x1,y1,'bo',x2,y2,'rx')
        plt.axis([-6, 6, -6, 6])
        plt.grid(True)
        plt.xlabel('x')
        plt.ylabel('y')
        plt.title('Machine Learning - perceptron')
        return line, label

# animation function 
    def animate(i):
        global ax, line, label
        w = history [i][0]
        b = history [i][1]
        # hyperplane: w[0]*x_i+w[1]*y_i+b=0; y_i=-(w[0]*x_i+b)/w[1]
        if w[1]==0:
            return line, label
        x1 = -6
        y1 = -(w[0]*x1+b[0])/w[1]
        x2 = 6
        y2 = -(w[0]*x2+b[0])/w[1]
        line.set_data([x1,x2],[y1,y2])
        x0=0
        y0 = -(w[0]*x0+b[0])/w[1]
        label.set_text(history[i])
        label.set_position([x0,y0])
        return line,label

# call the animator
    print(history)
    anim=animation.FuncAnimation(fig,animate,init_func=init,frames=len(history),interval=1000)
    plt.show()

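Convergence of the algorithm
Write $\hat{w} = (w^T, b)^T$ and $\hat{x} = (x^T, 1)^T$, so that $\hat{w} \cdot \hat{x} = w \cdot x + b$. For a linearly separable training data set $T$:
(1) There exists a hyperplane $\hat{w}_{opt} \cdot \hat{x} = w_{opt} \cdot x + b_{opt} = 0$ with $\|\hat{w}_{opt}\| = 1$ that separates the training data set completely, and there exists $\gamma > 0$ such that for all samples:

$$y_i (\hat{w}_{opt} \cdot \hat{x}_i) = y_i (w_{opt} \cdot x_i + b_{opt}) \geq \gamma$$

(2) Let $R = \max_{1 \leq i \leq N} \|\hat{x}_i\|$. Then the number of misclassifications $k$ made by the perceptron algorithm above on the training set, which is also the number of update iterations, satisfies:

$$k \leq \left(\frac{R}{\gamma}\right)^2$$

A detailed derivation can be found in Li Hang, "Statistical Learning Methods" (《統計學習方法》), p. 31.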
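The bound can be evaluated on the three sample points of this article's example. The sketch below is an illustration under stated assumptions: it takes the separating hyperplane $w = (1, 1)$, $b = -3$, normalizes it so that $\|\hat{w}\| = 1$, and uses its smallest margin as $\gamma$. Any separating hyperplane gives a valid bound, and one with a larger margin would give a tighter bound. The update count comes from sweeping the samples in order, starting from zero parameters with $\eta = 1$.

```python
import numpy as np

X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])
y = np.array([1.0, 1.0, -1.0])

# Augmented vectors x_hat = (x, 1), so that w_hat . x_hat = w . x + b.
X_hat = np.hstack([X, np.ones((len(X), 1))])

# Assumed separating hyperplane (w, b) = ((1, 1), -3), scaled to unit norm.
w_hat = np.array([1.0, 1.0, -3.0])
w_hat = w_hat / np.linalg.norm(w_hat)

gamma = np.min(y * (X_hat @ w_hat))        # smallest margin: 1 / sqrt(11)
R = np.max(np.linalg.norm(X_hat, axis=1))  # largest ||x_hat||: sqrt(26)
bound = (R / gamma) ** 2                   # 26 * 11 = 286

# Count the updates actually performed by the primal algorithm.
v = np.zeros(3)  # v plays the role of w_hat = (w, b) during training
k = 0
changed = True
while changed:
    changed = False
    for x_i, y_i in zip(X_hat, y):
        if y_i * np.dot(v, x_i) <= 0:
            v = v + y_i * x_i
            k += 1
            changed = True

print(int(round(float(bound))), k)  # 286 7
```

On this data the algorithm needs 7 updates, well below the bound of 286, so the inequality is satisfied but far from tight.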
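Dual form of the perceptron learning algorithm
The primal and dual forms of the perceptron learning algorithm correspond to the primal and dual forms of the support vector machine learning algorithm. The basic idea is to express the parameters $w, b$ as linear combinations of the training samples in $T$. With initial values $w = 0$, $b = 0$, following the iterative updates shows that the final parameters can be written as:

$$w = \sum_{i=1}^{N} \alpha_i y_i x_i$$

$$b = \sum_{i=1}^{N} \alpha_i y_i$$

where $\alpha_i \geq 0$ and $\alpha_i = n_i \eta$, with $n_i$ the number of times the $i$-th sample triggered an update because it was misclassified. The more often a sample is updated, the closer it lies to the separating hyperplane and the more it influences the learned result. The dual form is described by the following algorithm: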

Algorithm 2.2
Input: training_data T={(x1,y1),(x2,y2),…,(xN,yN)}, learning_rate η (0<η≤1)
Output: parameters α, b; model f(x)=sign(∑_{j=1}^{N} αj yj xj⋅x + b)
Initialize: α0, b0
for randomly selected item (xi,yi)∈T, until no item in T is misclassified
if yi(∑_{j=1}^{N} αj yj xj⋅xi + b)≤0
αi←αi+η
b←b+ηyi
end if
end for
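The dual updates can be run on this article's three sample points to see how the coefficients relate to $w$ and $b$. This is a sketch under assumptions: zero initial values, $\eta = 1$, and samples visited in order instead of at random. It rebuilds $w$ from the coefficients (called `alpha` here) on every test, which is wasteful but keeps the link to the primal form visible.

```python
import numpy as np

X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])
y = np.array([1.0, 1.0, -1.0])
eta = 1.0

alpha = np.zeros(len(X))
b = 0.0
changed = True
while changed:
    changed = False
    for i in range(len(X)):
        # w is never stored; it is rebuilt from alpha when needed.
        w = (alpha * y) @ X
        if y[i] * (np.dot(w, X[i]) + b) <= 0:
            alpha[i] += eta
            b += eta * y[i]
            changed = True

w = (alpha * y) @ X
print((alpha / eta).tolist())   # update counts per sample: [2.0, 0.0, 5.0]
print(w.tolist(), float(b))     # [1.0, 1.0] -3.0
print(bool(np.isclose(b, np.sum(alpha * y))))  # True
```

The second sample is never misclassified, so its coefficient stays at 0 and it does not contribute to the final hyperplane.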

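In the dual form, the training samples appear only through inner products. For convenience, the inner products between all pairs of training samples can be computed in advance and stored as a matrix, which saves a great deal of computation during training. This matrix is the Gram matrix:

$$G = [x_i \cdot x_j]_{N \times N}$$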
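For the three sample points of this article's example, the Gram matrix is a single matrix product. The sketch below also shows how a dual score is read from it; the coefficient values in `alpha` and the bias `b` are hypothetical inputs chosen for illustration.

```python
import numpy as np

X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])
y = np.array([1.0, 1.0, -1.0])

# Gram matrix G[i, j] = x_i . x_j, computed once before training.
G = X @ X.T
print(G.tolist())
# [[18.0, 21.0, 6.0], [21.0, 25.0, 7.0], [6.0, 7.0, 2.0]]

# Hypothetical dual parameters, used only to illustrate the lookup.
alpha = np.array([2.0, 0.0, 5.0])
b = -3.0

# Dual score of sample i: sum_j alpha_j * y_j * G[j, i] + b.
dual_scores = (alpha * y) @ G + b
# The same scores through the primal weights w = sum_j alpha_j * y_j * x_j.
primal_scores = X @ ((alpha * y) @ X) + b

print(dual_scores.tolist())                         # [3.0, 4.0, -1.0]
print(bool(np.allclose(dual_scores, primal_scores)))  # True
```

Multiplying each score by its label gives margins 3, 4, and 1, so all three points are classified correctly with these parameters.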
Code for the dual form of the perceptron

# Project: Machine Learning-Perceptron_dual
# Author: Lyndon
# Date: 2015/10/15
import numpy as np
from matplotlib import pyplot as plt
from matplotlib import animation

#initialize model f(x)=sign(sum(a_j*y_j*x_j*x_i+b))
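# keep the samples as a plain list: mixing tuples and scalars in np.array() fails on recent NumPy
training_data=[[(3,3),1],[(4,3),1],[(1,1),-1]]
a = np.zeros(len(training_data),float)   # np.float was removed from NumPy; use builtin float
b = 0.0
Gram = None
x = np.empty((len(training_data),2), float)
y = np.array([item[1] for item in training_data],float)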
for i in range(len(training_data)):
    x[i]=training_data[i][0]
history=[]

# calculate the Gram matrix for dual form
# parameter item: Null
# return: Gram matrix
def cal_Gram():
    g = np.empty((len(training_data),len(training_data)),float)
    for i in range(len(training_data)):
        for j in range(len(training_data)):
            g[i][j]=np.dot(training_data[i][0],training_data[j][0])
    return g 

# update parameters using stochastic gradient descent 
# parameter item: an item which is classified into wrong class
# return: NULL 
def update(i):
    global a,b,history
    a[i]+=1
    b+=y[i]
    print(a, b)
    history.append([np.dot(a*y,x),b])

# calculate the item classified result. y_i*(w*x_i+b)<=0 wrong
# parameter item: training_data
# return: classified result 
def calculate(i):
    global a,b
    res = np.dot(a*y,Gram[i])
    res = (res+b)*y[i]
    return res

# check if the classifier can correctly classify all data
# parameter item: Null
# return: correct or not
def check():
    flag=False
    for i in range(len(training_data)):
        if calculate(i)<=0:
            update(i)
            flag=True
    if not flag:
        w=np.dot(a*y,x)
        print("Result: w: " + str(w) + " b: " + str(b))
    return flag

# main function
if __name__=="__main__":
    Gram = cal_Gram()
    for i in range(1000):
        if not check():
            break

# set up the figure
    fig = plt.figure()
    ax = plt.axes(xlim=(0,6),ylim=(0,6))
    line, = ax.plot([],[],lw=2)
    label = ax.text(0, 0, '')  # placeholder position; animate() moves it

# initialization function for base frame
    def init():
        line.set_data([],[])
        x1, y1, x2, y2=[], [], [], []  #initialize the training data,(x1,y1) is positive
        for item in training_data:
            if item[1]>0:
                x1.append(item[0][0])
                y1.append(item[0][1])
            else:
                x2.append(item[0][0])
                y2.append(item[0][1])      
        plt.plot(x1,y1,'bo',x2,y2,'rx')
        plt.axis([-6, 6, -6, 6])
        plt.grid(True)
        plt.xlabel('x')
        plt.ylabel('y')
        plt.title('Machine Learning - perceptron_dual')
        return line, label

# animation function 
    def animate(i):
        global ax, line, label
        w = history [i][0]
        b = history [i][1]
        # hyperplane: w[0]*x_i+w[1]*y_i+b=0; y_i=-(w[0]*x_i+b)/w[1]
        if w[1]==0:
            return line,label
        x1 = -6
        y1 = -(w[0]*x1+b)/w[1]
        x2 = 6
        y2 = -(w[0]*x2+b)/w[1]
        line.set_data([x1,x2],[y1,y2])
        x0=0
        y0 = -(w[0]*x0+b)/w[1]
        label.set_text(history[i])
        label.set_position([x0,y0])
        return line,label

# call the animator
    print(history)
    anim=animation.FuncAnimation(fig,animate,init_func=init,frames=len(history),interval=1000)
    plt.show()

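The figure below visualizes the training process of the dual form of the perceptron; it shows directly how the program trains on the given sample set.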

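PS:
This article is the summary note for Machine Learning (2). It works mainly through Python implementations in order to understand the concepts and principles more deeply. The theory mainly follows Li Hang's "Statistical Learning Methods" (《統計學習方法》), and the programs are based on a blog post from 碼農網. Link to the Python implementation of this article.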
