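Machine Learning Notes (2): Perceptron

Perceptron

The perceptron is a linear classifier for binary classification and belongs to the family of discriminative models. Proposed by Rosenblatt in 1957, it is the foundation of both neural networks and support vector machines (SVMs). A perceptron is equivalent to a single neuron in a neural network and can only perform simple linear classification. Its learning goal is to find, from the training data, a hyperplane that linearly separates the classes. To this end, a loss function based on misclassification is introduced and minimized with gradient descent, which yields the perceptron model.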

1. The perceptron model

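Definition: Suppose the input space (feature space) is $\mathcal{X} \subseteq \mathbb{R}^n$ and the output space is $\mathcal{Y} = \{+1, -1\}$. An input $x \in \mathcal{X}$ is the feature vector of a sample, and the corresponding output $y \in \mathcal{Y}$ is the class of that sample. The model mapping the input space to the output space is:

$f(x) = \mathrm{sign}(w \cdot x + b)$

where $x$ is the feature vector of the input sample, and $w, b$ are the parameters of the perceptron model: $w$ is the weight vector and $b$ is the bias, both of which have to be learned. $w \cdot x$ is the inner product of $w$ and $x$.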
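As a minimal sketch of the model above, the decision function fits in a few lines of Python. The function name `predict` and the parameter values are illustrative assumptions, not part of the original post, and `sign` is assumed to return +1 for a non-negative argument, the convention used in Li Hang's book.

```python
# Sketch of f(x) = sign(w . x + b); names and values are illustrative assumptions.

def predict(w, b, x):
    """Return +1 or -1 for feature vector x under parameters (w, b)."""
    activation = sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
    # Convention assumed here: sign(0) = +1.
    return 1 if activation >= 0 else -1


if __name__ == "__main__":
    w, b = [1.0, 1.0], -3.0          # hypothetical parameters
    print(predict(w, b, [3, 3]))     # 1*3 + 1*3 - 3 = 3  -> prints 1
    print(predict(w, b, [1, 1]))     # 1*1 + 1*1 - 3 = -1 -> prints -1
```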

2. Learning strategy of the perceptron

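Definition of linear separability: Given a dataset

$T = \{(x_1, y_1), (x_2, y_2), \dots, (x_N, y_N)\}$

where $x_i$ is the feature vector of a sample and $y_i$ is its class, if there exists a hyperplane $S: w \cdot x + b = 0$ that places all positive samples and all negative samples correctly on opposite sides of the hyperplane, then $T$ is called a linearly separable dataset.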
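Assume the dataset is linearly separable. The goal of the perceptron is to find a hyperplane that classifies every sample correctly, that is, to learn the parameters $w, b$ of the perceptron model above. To determine these parameters we need a learning strategy: define a suitable loss function and find the $w, b$ that minimize it. For the perceptron, the loss is usually built from the distance between a misclassified point $x_i$ and the hyperplane. First, the distance from a point $x_i$ to the hyperplane is:

$\frac{1}{\|w\|} |w \cdot x_i + b|$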

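For a misclassified sample $(x_i, y_i)$ we have:

$-y_i (w \cdot x_i + b) > 0$

that is, the sign of the prediction $(w \cdot x_i + b)$ is always opposite to the true class $y_i$. Since $|y_i| = 1$, the distance from a misclassified point $(x_i, y_i)$ to the hyperplane can be written as:

$-\frac{1}{\|w\|} y_i (w \cdot x_i + b)$

If $M$ denotes the set of misclassified points, the total distance from all misclassified points to the hyperplane is:

$-\frac{1}{\|w\|} \sum_{(x_i, y_i) \in M} y_i (w \cdot x_i + b)$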

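Dropping the leading factor $\frac{1}{\|w\|}$, for any given training dataset

$T = \{(x_1, y_1), (x_2, y_2), \dots, (x_N, y_N)\}$

the loss function for learning the perceptron model $f(x) = \mathrm{sign}(w \cdot x + b)$ is defined as:

$L(w, b) = -\sum_{(x_i, y_i) \in M} y_i (w \cdot x_i + b)$

where $M$ is the set of misclassified points. The loss is clearly non-negative, and it equals 0 once no point is misclassified. Given a training set $T$, learning the perceptron therefore means finding the parameters $w, b$ in the hypothesis space that minimize the loss: the fewer the misclassified samples and the closer they lie to the hyperplane, the smaller the loss.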
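To make the definition concrete, the following sketch evaluates $L(w, b)$ for a given parameter pair by summing $-y_i (w \cdot x_i + b)$ over the misclassified points. The helper name `perceptron_loss` and the parameter values are assumptions for illustration, and a point with $y_i (w \cdot x_i + b) \leq 0$ is treated as misclassified, the same test that Algorithm 2.1 uses.

```python
# Sketch: L(w, b) = -sum over misclassified points of y_i * (w . x_i + b).
# The function name and the parameter values are illustrative assumptions.

def perceptron_loss(w, b, data):
    """data is a list of (x, y) pairs with y in {+1, -1}."""
    loss = 0.0
    for x, y in data:
        margin = y * (sum(w_j * x_j for w_j, x_j in zip(w, x)) + b)
        if margin <= 0:          # misclassified (or exactly on the hyperplane)
            loss += -margin
    return loss


if __name__ == "__main__":
    data = [((3, 3), 1), ((4, 3), 1), ((1, 1), -1)]
    print(perceptron_loss([3, 3], 1, data))   # only (1, 1) is misclassified: 7.0
    print(perceptron_loss([1, 1], -3, data))  # no misclassified points: 0.0
```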

3. The perceptron learning algorithm

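With the learning strategy above, learning the perceptron becomes an optimization problem: for a given dataset $T$, find the parameters that minimize the following loss function:

$\min_{w, b} L(w, b) = -\sum_{(x_i, y_i) \in M} y_i (w \cdot x_i + b)$

The problem is solved with gradient descent. With the set of misclassified points $M$ held fixed, the gradients are:

$\nabla_w L(w, b) = -\sum_{(x_i, y_i) \in M} y_i x_i$

$\nabla_b L(w, b) = -\sum_{(x_i, y_i) \in M} y_i$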

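The gradient is a vector: its direction is the direction in which the scalar field changes fastest, and its magnitude is the maximum rate of change at that point. Picking one misclassified point $(x_i, y_i)$ at random, $w, b$ are updated as follows:

$w \leftarrow w + \eta y_i x_i$

$b \leftarrow b + \eta y_i$

where $\eta$ ($0 < \eta \leq 1$) is the step size, also called the learning rate. Iterating in this way is meant to reduce the loss $L(w, b)$ step by step, and for a linearly separable dataset the loss eventually reaches 0. This gives the primal form of the perceptron learning algorithm.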
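A short worked step shows why this update moves the hyperplane toward the selected misclassified point. Writing $w'$, $b'$ for the updated parameters (notation introduced here only for illustration), the quantity $y_i (w \cdot x_i + b)$, which is non-positive for a misclassified point, strictly increases:

```latex
\begin{aligned}
y_i (w' \cdot x_i + b')
  &= y_i \big( (w + \eta y_i x_i) \cdot x_i + b + \eta y_i \big) \\
  &= y_i (w \cdot x_i + b) + \eta y_i^2 \left( \|x_i\|^2 + 1 \right) \\
  &= y_i (w \cdot x_i + b) + \eta \left( \|x_i\|^2 + 1 \right)
\end{aligned}
```

The last line uses $y_i^2 = 1$. Each update therefore raises this value by $\eta (\|x_i\|^2 + 1) > 0$ for the selected point, although several updates may be needed before the point is classified correctly, and other points can be affected in the meantime.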

Algorithm 2.1
Input: training data T = {(x_1, y_1), (x_2, y_2), …, (x_N, y_N)}, learning rate η (0 < η ≤ 1)
Output: parameters w, b; model f(x) = sign(w⋅x + b)
Initialize: w ← w_0, b ← b_0
for randomly selected item (x_i, y_i) ∈ T, until no item is misclassified
    if y_i (w⋅x_i + b) ≤ 0
        w ← w + η y_i x_i
        b ← b + η y_i
    end if
end for

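In the example, we take $x_1 = (3, 3)^T$ and $x_2 = (4, 3)^T$ as positive samples and $x_3 = (1, 1)^T$ as the negative sample, and implement the primal form of the perceptron in Python:

Code for the primal form of the perceptron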

# Project: Machine Learning-Perceptron
# Author: Lyndon
# Date: 2015/10/15
import copy
from matplotlib import pyplot as plt
from matplotlib import animation

#initialize
training_data=[[(3,3),1],[(4,3),1],[(1,1),-1]]
w=[0,0]     #weight
b=[0]       #bias
step=1      #learning rate
history=[]

# update parameters using stochastic gradient descent 
# parameter item: an item which is classified into wrong class
# return: NULL 
def update(item):
    global w,b,history
    w[0]+=step*item[1]*item[0][0]
    w[1]+=step*item[1]*item[0][1]
    b[0]+=step*item[1]
    print(w, b)
    history.append([copy.copy(w),copy.copy(b)])

# calculate the item classified result. y_i*(w*x_i+b)<=0 wrong
# parameter item: training_data
# return: classified result 
def calculate(item):
    res=0
    for i in range(len(item[0])):
        res+=w[i]*item[0][i]
    res+=b[0]
    res*=item[1]
    return res

# check if the classifier can correctly classify all data
# parameter item: Null
# return: correct or not
def check():
    flag=False
    for item in training_data:
        if calculate(item)<=0:
            update(item)
            flag=True
    if not flag:
        print("Result: w:" + str(w) + " b:" + str(b))
    return flag

# main function
if __name__=="__main__":
    for i in range(1000):
        if not check():
            break

# set up the figure
    fig = plt.figure()
    ax = plt.axes(xlim=(0,6),ylim=(0,6))
    line, = ax.plot([],[],lw=2)
    label = ax.text(0, 0, '')

# initialization function for base frame
    def init():
        line.set_data([],[])
        x1, y1, x2, y2=[], [], [], []  #initialize the training data,(x1,y1) is positive
        for item in training_data:
            if item[1]>0:
                x1.append(item[0][0])
                y1.append(item[0][1])
            else:
                x2.append(item[0][0])
                y2.append(item[0][1])      
        plt.plot(x1,y1,'bo',x2,y2,'rx')
        plt.axis([-6, 6, -6, 6])
        plt.grid(True)
        plt.xlabel('x')
        plt.ylabel('y')
        plt.title('Machine Learning - perceptron')
        return line, label

# animation function 
    def animate(i):
        global ax, line, label
        w = history [i][0]
        b = history [i][1]
        # hyperplane: w[0]*x_i+w[1]*y_i+b=0; y_i=-(w[0]*x_i+b)/w[1]
        if w[1]==0:
            return line, label
        x1 = -6
        y1 = -(w[0]*x1+b[0])/w[1]
        x2 = 6
        y2 = -(w[0]*x2+b[0])/w[1]
        line.set_data([x1,x2],[y1,y2])
        x0=0
        y0 = -(w[0]*x0+b[0])/w[1]
        label.set_text(str(history[i]))
        label.set_position([x0,y0])
        return line,label

# call the animator
    print(history)
    anim=animation.FuncAnimation(fig,animate,init_func=init,frames=len(history),interval=1000)
    plt.show()

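Convergence of the algorithm
For a linearly separable training dataset $T$:
(1) There exists a hyperplane $\hat{w}_{opt} \cdot \hat{x} = w_{opt} \cdot x + b_{opt} = 0$ with $\|\hat{w}_{opt}\| = 1$ that separates the training dataset completely, and there exists $\gamma > 0$ such that for all samples:

$y_i (\hat{w}_{opt} \cdot \hat{x}_i) = y_i (w_{opt} \cdot x_i + b_{opt}) \geq \gamma$

Here $\hat{w} = (w^T, b)^T$ and $\hat{x} = (x^T, 1)^T$ are the augmented weight and input vectors.

(2) Let $R = \max_{1 \leq i \leq N} \|\hat{x}_i\|$. Then the number of misclassifications $k$ made by the perceptron algorithm above on the training set, that is, the number of update iterations, satisfies:

$k \leq \left( \frac{R}{\gamma} \right)^2$

For the detailed derivation, see Li Hang, Statistical Learning Methods (《统计学习方法》), p. 31.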
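As a rough numerical check of this bound, the sketch below runs the primal algorithm with $\eta = 1$ and zero initial parameters on the three sample points from the example above, counts the updates $k$, and compares the count with $(R/\gamma)^2$. Here $\gamma$ is computed from the learned hyperplane after normalizing it to unit length, which is one admissible choice of $\hat{w}_{opt}$ rather than the best possible one. Samples are visited in a fixed cyclic order, and the function names are assumptions for illustration.

```python
import math

# Sketch: compare the number of perceptron updates k with the bound (R / gamma)^2.
# Data are the three sample points from the example; helper names are assumptions.
DATA = [((3, 3), 1), ((4, 3), 1), ((1, 1), -1)]


def train(data, eta=1.0, max_epochs=1000):
    """Primal perceptron with zero initialization; returns (w, b, number_of_updates)."""
    w, b, k = [0.0] * len(data[0][0]), 0.0, 0
    for _ in range(max_epochs):
        changed = False
        for x, y in data:
            if y * (sum(wj * xj for wj, xj in zip(w, x)) + b) <= 0:
                w = [wj + eta * y * xj for wj, xj in zip(w, x)]
                b += eta * y
                k += 1
                changed = True
        if not changed:
            break
    return w, b, k


def bound(data, w, b):
    """(R / gamma)^2 using the unit-normalized augmented vector (w, b) as w_opt."""
    norm = math.sqrt(sum(wj * wj for wj in w) + b * b)
    gamma = min(y * (sum(wj * xj for wj, xj in zip(w, x)) + b) for x, y in data) / norm
    big_r = max(math.sqrt(sum(xj * xj for xj in x) + 1.0) for x, _ in data)
    return (big_r / gamma) ** 2


if __name__ == "__main__":
    w, b, k = train(DATA)
    print(w, b, k)            # learned parameters and update count
    print(bound(DATA, w, b))  # upper bound on k from the convergence theorem
```

On this data the run makes 7 updates, well below the bound of about 286, so the bound holds but is far from tight for this choice of $\gamma$.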
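Dual form of the perceptron learning algorithm
The primal and dual forms of the perceptron learning algorithm correspond to the primal and dual forms of the support vector machine learning algorithm. The basic idea is to express the parameters $w, b$ as linear combinations of the training samples in $T$. Assuming the initial values $w_0, b_0$ are zero, iterating the update rule shows that the final parameters can be written as:

$w = \sum_{i=1}^{N} \alpha_i y_i x_i$

$b = \sum_{i=1}^{N} \alpha_i y_i$

where $\alpha_i = n_i \eta \geq 0$ and $n_i$ is the number of times the $i$-th sample triggered an update because it was misclassified. The more often a sample is updated, the closer it lies to the separating hyperplane and the more it influences the learned result. The dual form is described by the following algorithm: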

Algorithm 2.2
Input: training data T = {(x_1, y_1), (x_2, y_2), …, (x_N, y_N)}, learning rate η (0 < η ≤ 1)
Output: parameters α, b; model f(x) = sign(∑_{j=1}^{N} α_j y_j x_j⋅x + b)
Initialize: α ← 0, b ← 0
for randomly selected item (x_i, y_i) ∈ T, until no item is misclassified
    if y_i (∑_{j=1}^{N} α_j y_j x_j⋅x_i + b) ≤ 0
        α_i ← α_i + η
        b ← b + η y_i
    end if
end for
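To illustrate how the dual variables relate to the primal parameters, the sketch below runs the dual updates with $\eta = 1$ on the three example points and then recovers $w = \sum_i \alpha_i y_i x_i$. Visiting the samples in a fixed cyclic order rather than at random is a simplifying assumption, and the variable and function names are chosen for illustration.

```python
import numpy as np

# Sketch of the dual perceptron (Algorithm 2.2) with eta = 1 and cyclic sample order.
# Function and variable names are illustrative assumptions.
X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])
Y = np.array([1.0, 1.0, -1.0])


def train_dual(X, Y, eta=1.0, max_epochs=1000):
    """Return (alpha, b) once a full pass makes no update or max_epochs is reached."""
    n = len(Y)
    alpha, b = np.zeros(n), 0.0
    for _ in range(max_epochs):
        changed = False
        for i in range(n):
            # sum_j alpha_j * y_j * (x_j . x_i) + b
            activation = np.dot(alpha * Y, X @ X[i]) + b
            if Y[i] * activation <= 0:
                alpha[i] += eta
                b += eta * Y[i]
                changed = True
        if not changed:
            break
    return alpha, b


if __name__ == "__main__":
    alpha, b = train_dual(X, Y)
    w = (alpha * Y) @ X      # recover the primal weight vector
    print(alpha, w, b)
```

With this visiting order the run ends with $\alpha = (2, 0, 5)$ and $b = -3$, which maps back to $w = (1, 1)$, the same hyperplane the primal form reaches on this data.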

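In the dual form, the training samples appear only through inner products. For convenience, the inner products between all pairs of training samples can be computed in advance and stored as a matrix, which simplifies much of the computation. This matrix is the Gram matrix:

$G = [x_i \cdot x_j]_{N \times N}$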
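For the three example points, the Gram matrix can be computed in one step with NumPy. This is a minimal sketch: the array name `X`, the helper name `gram_matrix`, and the use of `X @ X.T` are choices made here for illustration, not the approach taken in the post's own code.

```python
import numpy as np

# Sketch: Gram matrix G[i][j] = x_i . x_j for the three example points.
# The names X and gram_matrix are illustrative assumptions.
X = np.array([[3, 3], [4, 3], [1, 1]])


def gram_matrix(X):
    """Return the N x N matrix of pairwise inner products of the rows of X."""
    return X @ X.T


if __name__ == "__main__":
    print(gram_matrix(X))
    # [[18 21  6]
    #  [21 25  7]
    #  [ 6  7  2]]
```

In the dual update, row $i$ of $G$ supplies all the inner products $x_j \cdot x_i$ needed to evaluate sample $i$, so no inner product is recomputed during training.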
Code for the dual form of the perceptron

# Project: Machine Learning-Perceptron_dual
# Author: Lyndon
# Date: 2015/10/15
import numpy as np
from matplotlib import pyplot as plt
from matplotlib import animation

#initialize model f(x)=sign(sum(a_j*y_j*x_j*x_i+b))
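# dtype=object keeps the ragged (point, label) pairs; np.float was removed from NumPy, so use float
training_data=np.array([[(3,3),1],[(4,3),1],[(1,1),-1]],dtype=object)
a = np.zeros(len(training_data),float)
b = 0.0
Gram = None
x = np.empty((len(training_data),2), float)
y = np.array(training_data[:,1],dtype=float)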
for i in range(len(training_data)):
    x[i]=training_data[i][0]
history=[]

# calculate the Gram matrix for dual form
# parameter item: Null
# return: Gram matrix
def cal_Gram():
    g = np.empty((len(training_data),len(training_data)),float)
    for i in range(len(training_data)):
        for j in range(len(training_data)):
            g[i][j]=np.dot(training_data[i][0],training_data[j][0])
    return g 

# update parameters using stochastic gradient descent 
# parameter item: an item which is classified into wrong class
# return: NULL 
def update(i):
    global a,b,history
    a[i]+=1
    b+=y[i]
    print(a, b)
    history.append([np.dot(a*y,x),b])

# calculate the item classified result. y_i*(w*x_i+b)<=0 wrong
# parameter item: training_data
# return: classified result 
def calculate(i):
    global a,b
    res = np.dot(a*y,Gram[i])
    res = (res+b)*y[i]
    return res

# check if the classifier can correctly classify all data
# parameter item: Null
# return: correct or not
def check():
    flag=False
    for i in range(len(training_data)):
        if calculate(i)<=0:
            update(i)
            flag=True
    if not flag:
        w=np.dot(a*y,x)
        print("Result: w:" + str(w) + " b:" + str(b))
    return flag

# main function
if __name__=="__main__":
    Gram = cal_Gram()
    for i in range(1000):
        if not check():
            break

# set up the figure
    fig = plt.figure()
    ax = plt.axes(xlim=(0,6),ylim=(0,6))
    line, = ax.plot([],[],lw=2)
    label = ax.text(0, 0, '')

# initialization function for base frame
    def init():
        line.set_data([],[])
        x1, y1, x2, y2=[], [], [], []  #initialize the training data,(x1,y1) is positive
        for item in training_data:
            if item[1]>0:
                x1.append(item[0][0])
                y1.append(item[0][1])
            else:
                x2.append(item[0][0])
                y2.append(item[0][1])      
        plt.plot(x1,y1,'bo',x2,y2,'rx')
        plt.axis([-6, 6, -6, 6])
        plt.grid(True)
        plt.xlabel('x')
        plt.ylabel('y')
        plt.title('Machine Learning - perceptron_dual')
        return line, label

# animation function 
    def animate(i):
        global ax, line, label
        w = history [i][0]
        b = history [i][1]
        # hyperplane: w[0]*x_i+w[1]*y_i+b=0; y_i=-(w[0]*x_i+b)/w[1]
        if w[1]==0:
            return line,label
        x1 = -6
        y1 = -(w[0]*x1+b)/w[1]
        x2 = 6
        y2 = -(w[0]*x2+b)/w[1]
        line.set_data([x1,x2],[y1,y2])
        x0=0
        y0 = -(w[0]*x0+b)/w[1]
        label.set_text(str(history[i]))
        label.set_position([x0,y0])
        return line,label

# call the animator
    print(history)
    anim=animation.FuncAnimation(fig,animate,init_func=init,frames=len(history),interval=1000)
    plt.show()

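The figure below visualizes the training process of the dual form of the perceptron. It shows directly how the program trains on the given sample set.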

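PS:
This post is my summary note for Machine Learning (2). It implements the algorithms in Python to gain a deeper understanding of the concepts and principles. The theory mainly follows Li Hang's Statistical Learning Methods (《统计学习方法》), and the programs are adapted from a blog on 码农网. Link to the Python implementation for this post.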
