【ML from Beginner to the Grave Series 02】Linear Regression and Logistic Regression

Table of Contents

1 Linear Regression

2 Logistic Regression

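1 Linear Regression

1.1 Linear Model

A linear model learns a function that makes predictions from a linear combination of the input features.
Its mathematical form is: $f(\boldsymbol{x})=\boldsymbol{w}^{T} \boldsymbol{x}+b$, where $\boldsymbol{w}$ is the weight vector and $b$ is the bias.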

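1.2 Definition

Linear regression applies the linear model to regression problems, so its output is a continuous variable. It is essentially a linear mapping $f(\mathbf{x})=\theta^{T} \mathbf{x}$, where the bias $b$ is absorbed into $\theta$ as $\theta_{0}$ by fixing $x_{0}=1$.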
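The two notations above describe the same function. The following is a minimal sketch, assuming NumPy; the weights, bias, and input are made-up values chosen only for illustration, and `linear_model` is a hypothetical helper name.

```python
import numpy as np

def linear_model(x, w, b):
    """Return f(x) = w^T x + b for a single feature vector x."""
    return float(np.dot(w, x) + b)

# Hypothetical parameters and input, for illustration only.
w = np.array([2.0, -1.0])
b = 0.5
x = np.array([3.0, 4.0])

prediction = linear_model(x, w, b)  # 2*3 + (-1)*4 + 0.5

# Same model in theta notation: prepend x_0 = 1 and let theta_0 play the role of b.
theta = np.concatenate(([b], w))
x_aug = np.concatenate(([1.0], x))
prediction_theta = float(theta @ x_aug)

print(prediction, prediction_theta)
```

Both calls evaluate the same linear combination, which is why the rest of the article can work with $\theta$ alone.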

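1.3 Loss Function

Linear regression usually uses the mean squared error (MSE) loss, where $h_{\theta}(x)=\theta^{T} x$ is the prediction and $m$ is the number of samples (the factor $\frac{1}{2}$ only simplifies the derivative):
$J\left(\theta_{0}, \theta_{1}, \ldots, \theta_{n}\right)=\frac{1}{2 m} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right)^{2}$
The original figure shows this loss plotted for one parameter and for two parameters. Both are clearly convex, which makes the loss convenient to optimize with gradient descent.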
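To make the formula concrete, here is a small sketch that evaluates $J$ on toy data. It assumes NumPy; the data set (points on the line $y = 1 + 2x$) and the name `mse_cost` are invented for illustration.

```python
import numpy as np

def mse_cost(X, y, theta):
    """J(theta) = 1/(2m) * sum((X @ theta - y)^2)."""
    m = y.size
    residual = X @ theta - y
    return float(residual @ residual) / (2 * m)

# Toy data lying exactly on y = 1 + 2x; the first column of X is x_0 = 1.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])

cost_at_zero = mse_cost(X, y, np.zeros(2))             # (1 + 9 + 25) / (2 * 3)
cost_at_truth = mse_cost(X, y, np.array([1.0, 2.0]))   # perfect fit, so the cost is 0

print(cost_at_zero, cost_at_truth)
```

The cost is zero only at the parameters that reproduce the data, and grows quadratically as the parameters move away from them.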

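1.4 Gradient Descent

Gradient descent repeatedly adjusts the parameters in the direction opposite to the gradient of the loss, like walking downhill step by step until reaching the bottom of the valley.

One parameter
$\theta_{1}:=\theta_{1}-\alpha \frac{d}{d \theta_{1}} J\left(\theta_{1}\right)$
Two parameters
$\begin{array}{l} \theta_{0}:=\theta_{0}-\alpha \frac{1}{m} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right) \\ \theta_{1}:=\theta_{1}-\alpha \frac{1}{m} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right) \cdot x^{(i)} \end{array}$
Here $\alpha$ is the learning rate, and choosing it well matters: if it is too large the updates oscillate or even fail to converge, and if it is too small convergence is very slow.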
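The effect of the learning rate can be seen on a toy problem. This is a sketch under stated assumptions: NumPy, an invented data set on the line $y = 1 + 2x$, and three learning rates picked by hand for this data only (they are not recommendations for other data).

```python
import numpy as np

def gradient_descent(X, y, alpha, num_iters):
    """Batch gradient descent on the MSE cost; returns theta and the cost history."""
    m = y.size
    theta = np.zeros(X.shape[1])
    history = []
    for _ in range(num_iters):
        residual = X @ theta - y
        history.append(float(residual @ residual) / (2 * m))  # cost before this update
        theta = theta - alpha * (X.T @ residual) / m           # simultaneous update of all theta_j
    return theta, history

# Toy data on y = 1 + 2x; the first column of X is x_0 = 1.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

theta_good, history_good = gradient_descent(X, y, alpha=0.1, num_iters=2000)
theta_slow, history_slow = gradient_descent(X, y, alpha=0.001, num_iters=2000)
theta_bad, history_bad = gradient_descent(X, y, alpha=1.0, num_iters=20)

print("alpha=0.1  :", theta_good, history_good[-1])
print("alpha=0.001:", theta_slow, history_slow[-1])
print("alpha=1.0  :", theta_bad, history_bad[-1])
```

On this data the middle rate recovers the true parameters, the small rate is still short of them after the same number of steps, and the large rate makes the cost grow instead of shrink.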

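1.5 Overfitting and Regularization

Training often runs into overfitting: like a student who shines in mock exams but fails the real one, the model fits the training data well but generalizes poorly. The usual remedy is regularization, which adds a penalty term weighted by $\lambda$ to the loss:
$J(\theta)=\frac{1}{2 m}\left[\sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right)^{2}+\lambda \sum_{j=1}^{n} \theta_{j}^{2}\right]$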
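The sketch below shows how the penalty shrinks the slope on toy data. It assumes NumPy; the data, the function names, and the values of $\lambda$, the learning rate, and the iteration count are illustrative choices, not taken from the article. Note that the sum in the penalty starts at $j=1$, so $\theta_0$ is left unpenalized.

```python
import numpy as np

def ridge_cost(X, y, theta, lam):
    """Regularized MSE cost; theta[0] (the intercept) is not penalized."""
    m = y.size
    residual = X @ theta - y
    penalty = lam * np.sum(theta[1:] ** 2)
    return float(residual @ residual + penalty) / (2 * m)

def ridge_gradient(X, y, theta, lam):
    """Gradient of ridge_cost with respect to theta."""
    m = y.size
    grad = X.T @ (X @ theta - y) / m
    grad[1:] += (lam / m) * theta[1:]
    return grad

def fit(X, y, lam, alpha=0.1, num_iters=5000):
    """Plain batch gradient descent on the regularized cost."""
    theta = np.zeros(X.shape[1])
    for _ in range(num_iters):
        theta = theta - alpha * ridge_gradient(X, y, theta, lam)
    return theta

# Toy data on y = 1 + 2x; the first column of X is x_0 = 1.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

theta_plain = fit(X, y, lam=0.0)   # no penalty: recovers intercept 1 and slope 2
theta_reg = fit(X, y, lam=10.0)    # strong penalty: the slope is pulled toward 0

print(theta_plain, theta_reg)
```

With the penalty switched on, the fit trades some training error for a smaller slope, which is the mechanism that limits overfitting in higher-dimensional models.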

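1.6 Case Study

Predicting a city's income with linear regression. In the code below the input is the city's population and the target is labeled as profit, both in units of 10,000.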

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from sklearn.linear_model import LinearRegression
from mpl_toolkits.mplot3d import axes3d

pd.set_option('display.notebook_repr_html', False)
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 150)
pd.set_option('display.max_seq_items', None)
 
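# Load the data: column 0 is plotted as population, column 1 as profit
data = np.loadtxt('linear_regression_data1.txt', delimiter=',')

# Prepend a column of ones so that theta[0] acts as the intercept
X = np.c_[np.ones(data.shape[0]),data[:,0]]
y = np.c_[data[:,1]]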

# Visualize the data
plt.scatter(X[:,1], y, s=30, c='r', marker='x', linewidths=1)
plt.xlim(4,24)
plt.xlabel('Population of City in 10,000s')
plt.ylabel('Profit in $10,000s');

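# Loss
def computeCost(X, y, theta=None):
    # Start from all-zero parameters unless theta is given (avoids a mutable default argument)
    if theta is None:
        theta = np.zeros((X.shape[1], 1))
    m = y.size
    h = X.dot(theta)  # predictions, shape (m, 1)
    J = 1.0/(2*m)*(np.sum(np.square(h-y))) # MSE
    return J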

# Gradient descent
def gradientDescent(X, y, theta=None, alpha=0.01, num_iters=1500):
    # Start from all-zero parameters unless theta is given
    if theta is None:
        theta = np.zeros((X.shape[1], 1))
    m = y.size
    J_history = np.zeros(num_iters)
    for i in range(num_iters):
        h = X.dot(theta)
        theta = theta - alpha*(1.0/m)*(X.T.dot(h-y)) # parameter update
        J_history[i] = computeCost(X, y, theta)  # cost after this update
    return theta, J_history

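# Fit theta with gradient descent, then evaluate the fitted line on a grid of x values
theta, J_history = gradientDescent(X, y)
xx = np.arange(5,23)
yy = theta[0]+theta[1]*xx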

# Fitted line (our gradient-descent implementation)
plt.scatter(X[:,1], y, s=30, c='r', marker='x', linewidths=1)
plt.plot(xx,yy, label='Linear regression (Gradient descent)')

# Fitted line (Scikit-learn), for comparison
regr = LinearRegression()
regr.fit(X[:,1].reshape(-1,1), y.ravel())
plt.plot(xx, regr.intercept_+regr.coef_*xx, label='Linear regression (Scikit-learn GLM)')

plt.xlim(4,24)
plt.xlabel('Population of City in 10,000s')
plt.ylabel('Profit in $10,000s')
plt.legend(loc=4);

# Predict the result for a city with a population of 70,000 (x = 7, in units of 10,000)
print(theta.T.dot([1, 7])*10000)

[ 45342.45012945]

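2 Logistic Regression

2.1 Definition

Linear regression is not robust for classification: its output is unbounded, and the linear decision boundary obtained by thresholding it is too rigid. Logistic regression was introduced to address this.

2.2 Decision Boundary

The essence of logistic regression is finding the decision boundary, which can be linear or non-linear (the original figure shows both cases). It relies on the sigmoid function, which squashes the score $z$ into the range between 0 and 1: $y=\frac{1}{1+e^{-z}}$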
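As a rough illustration of how the sigmoid and the decision boundary fit together, the sketch below classifies three points with a linear score $z = \theta^{T} x$. It assumes NumPy; the parameters, the sample points, and the 0.5 threshold are assumptions made for this example.

```python
import numpy as np

def sigmoid(z):
    """Map any real score z to a value between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

def predict(theta, X, threshold=0.5):
    """Label a sample 1 when the sigmoid output reaches the threshold, else 0."""
    return (sigmoid(X @ theta) >= threshold).astype(int)

# Hypothetical parameters: z = -3 + x1 + x2, so the boundary is the line x1 + x2 = 3.
theta = np.array([-3.0, 1.0, 1.0])

# Each row is [x_0 = 1, x1, x2].
X = np.array([[1.0, 1.0, 1.0],    # z = -1, below the boundary
              [1.0, 2.0, 2.0],    # z = +1, above the boundary
              [1.0, 1.5, 1.5]])   # z = 0, exactly on the boundary

probabilities = sigmoid(X @ theta)
labels = predict(theta, X)
print(probabilities, labels)
```

The boundary is the set of points where $z = 0$, i.e. where the sigmoid outputs exactly 0.5. Replacing the raw features with polynomial features, as the case study does with `PolynomialFeatures`, bends that set into a non-linear curve in the original feature space.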

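2.3 Loss Function

The MSE loss from linear regression is generally not used here, because combined with the sigmoid it is no longer convex in $\theta$. Binary cross-entropy is used instead:
$\operatorname{cost}\left(h_{\theta}(x), y\right)=\left\{\begin{aligned} -\log \left(h_{\theta}(x)\right) & \text { if } y=1 \\ -\log \left(1-h_{\theta}(x)\right) & \text { if } y=0 \end{aligned}\right.$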
To prevent overfitting, a regularization term is added, giving the final loss function:
$J(\theta)=-\frac{1}{m} \sum_{i=1}^{m}\left[y^{(i)} \log h_{\theta}\left(x^{(i)}\right)+\left(1-y^{(i)}\right) \log \left(1-h_{\theta}\left(x^{(i)}\right)\right)\right]+\frac{\lambda}{2 m} \sum_{j=1}^{n} \theta_{j}^{2}$
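The regularized loss can be evaluated directly. The following sketch assumes NumPy and a four-point toy data set invented for illustration; the clipping constant `eps` is an added safeguard against `log(0)` and is not part of the formula above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_cost(theta, X, y, lam=0.0, eps=1e-12):
    """Binary cross-entropy plus an L2 penalty on theta[1:] (theta[0] is not penalized)."""
    m = y.size
    h = np.clip(sigmoid(X @ theta), eps, 1.0 - eps)  # keep log() finite
    data_term = -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))
    penalty = lam / (2 * m) * np.sum(theta[1:] ** 2)
    return float(data_term + penalty)

# Toy data: negative x belongs to class 0, positive x to class 1.
X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])

cost_zero = bce_cost(np.zeros(2), X, y)                        # h = 0.5 everywhere, so the cost is log(2)
cost_fit = bce_cost(np.array([0.0, 2.0]), X, y)                # a theta that separates the classes
cost_fit_reg = bce_cost(np.array([0.0, 2.0]), X, y, lam=1.0)   # adds 1/(2*4) * 2^2 = 0.5

print(cost_zero, cost_fit, cost_fit_reg)
```

The data term rewards confident correct predictions, while the penalty charges for large weights, so the two terms pull in opposite directions as $\lambda$ grows.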

2.4 Gradient Descent

As with linear regression, because the loss is convex, it can be optimized with gradient descent. The update rule is:
$\theta_{j}:=\theta_{j}-\alpha \frac{\partial}{\partial \theta_{j}} J(\theta)$
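For the regularized cross-entropy loss, the partial derivative works out to $\frac{1}{m} \sum_{i}\left(h_{\theta}(x^{(i)})-y^{(i)}\right) x_{j}^{(i)}$, plus $\frac{\lambda}{m} \theta_{j}$ for $j \geq 1$. The sketch below checks that expression against a finite-difference estimate and then runs a few update steps. It assumes NumPy; the toy data, the test point, the learning rate of 0.5, and the step count are arbitrary illustrative choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y, lam):
    """Regularized binary cross-entropy; theta[0] is not penalized."""
    m = y.size
    h = sigmoid(X @ theta)
    data_term = -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))
    return data_term + lam / (2 * m) * np.sum(theta[1:] ** 2)

def gradient(theta, X, y, lam):
    """Analytic gradient of cost()."""
    m = y.size
    grad = X.T @ (sigmoid(X @ theta) - y) / m
    grad[1:] += (lam / m) * theta[1:]
    return grad

def numerical_gradient(theta, X, y, lam, step=1e-6):
    """Central finite differences, used only to sanity-check gradient()."""
    grad = np.zeros_like(theta)
    for j in range(theta.size):
        shift = np.zeros_like(theta)
        shift[j] = step
        grad[j] = (cost(theta + shift, X, y, lam) - cost(theta - shift, X, y, lam)) / (2 * step)
    return grad

# Toy data that is not perfectly separable.
X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 0.5], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 1.0, 0.0, 1.0, 1.0])
lam = 1.0

# 1) Compare the analytic and numerical gradients at an arbitrary point.
point = np.array([0.3, -0.7])
gap = np.max(np.abs(gradient(point, X, y, lam) - numerical_gradient(point, X, y, lam)))

# 2) Apply the update rule theta_j := theta_j - alpha * dJ/dtheta_j.
theta = np.zeros(2)
costs = []
for _ in range(200):
    costs.append(cost(theta, X, y, lam))
    theta = theta - 0.5 * gradient(theta, X, y, lam)

print(gap, costs[0], costs[-1], theta)
```

A gradient check like this is a cheap way to catch sign or indexing mistakes before handing the gradient to an optimizer.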

2.5 Case Study

import pandas as pd
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
from scipy.optimize import minimize
from sklearn.preprocessing import PolynomialFeatures

pd.set_option('display.notebook_repr_html', False)
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 150)
pd.set_option('display.max_seq_items', None)

# Plotting helper
def plotData(data, label_x, label_y, label_pos, label_neg, axes=None):
    # Boolean masks selecting the negative and positive samples
    neg = data[:,2] == 0
    pos = data[:,2] == 1

    if axes is None:
        axes = plt.gca()
    axes.scatter(data[pos][:,0], data[pos][:,1], marker='+', c='k', s=60, linewidth=2, label=label_pos)
    axes.scatter(data[neg][:,0], data[neg][:,1], c='y', s=60, label=label_neg)
    axes.set_xlabel(label_x)
    axes.set_ylabel(label_y)
    axes.legend(frameon=True, fancybox=True)
     
# Load the data: two feature columns followed by a 0/1 label column
data2 = np.loadtxt('data2.txt', delimiter=',')
X = data2[:,0:2]
y = np.c_[data2[:,2]]

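# Helpers that the code below calls but the original post does not show (assumed definitions)
def sigmoid(z):
    return 1.0/(1.0+np.exp(-z))

def predict(theta, X, threshold=0.5):
    return (sigmoid(X.dot(theta)) >= threshold).astype(int)

# Loss (reads the module-level XX and y; *args is accepted but ignored)
def costFunctionReg(theta, reg, *args):
    m = y.size
    h = sigmoid(XX.dot(theta))
    J = -1.0*(1.0/m)*(np.log(h).T.dot(y)+np.log(1-h).T.dot(1-y)) + (reg/(2.0*m))*np.sum(np.square(theta[1:])) # binary cross-entropy loss
    if np.isnan(J[0]):
        return np.inf
    return J[0]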

# Gradient of the regularized loss (theta[0] is not penalized)
def gradientReg(theta, reg, *args):
    m = y.size
    h = sigmoid(XX.dot(theta.reshape(-1,1))) # activation function
    grad = (1.0/m)*XX.T.dot(h-y) + (reg/m)*np.r_[[[0]],theta[1:].reshape(-1,1)]
    return grad.flatten()

# Polynomial features up to degree 6
poly = PolynomialFeatures(6)
XX = poly.fit_transform(data2[:,0:2])
initial_theta = np.zeros(XX.shape[1])
costFunctionReg(initial_theta, 1, XX, y)

fig, axes = plt.subplots(1,3, sharey = True, figsize=(17,5))
# Try three regularization strengths (lambda): 0.0, 1.0, 100.0
for i, C in enumerate([0.0, 1.0, 100.0]):
    res2 = minimize(costFunctionReg, initial_theta, args=(C, XX, y), jac=gradientReg, options={'maxiter':3000})
    # Training accuracy
    accuracy = 100.0*sum(predict(res2.x, XX) == y.ravel())/y.size
    # Plot the data
    plotData(data2, 'Microchip Test 1', 'Microchip Test 2', 'y = 1', 'y = 0', axes.flatten()[i])
    # Draw the decision boundary: the contour where the predicted probability equals 0.5
    x1_min, x1_max = X[:,0].min(), X[:,0].max()
    x2_min, x2_max = X[:,1].min(), X[:,1].max()
    xx1, xx2 = np.meshgrid(np.linspace(x1_min, x1_max), np.linspace(x2_min, x2_max))
    # poly is already fitted above, so transform() is enough for the grid points
    h = sigmoid(poly.transform(np.c_[xx1.ravel(), xx2.ravel()]).dot(res2.x))
    h = h.reshape(xx1.shape)
    axes.flatten()[i].contour(xx1, xx2, h, [0.5], linewidths=1, colors='g')
    axes.flatten()[i].set_title('Train accuracy {}% with Lambda = {}'.format(np.round(accuracy, decimals=2), C))

