Generalized Linear Model (Generalized_Linear_Model)

1. Linear Regression

1.1 The Multiple Linear Regression Model

Given a training dataset
$$D = \left\{ \left( \mathbf{x}_{1}, y_{1} \right), \left( \mathbf{x}_{2}, y_{2} \right), \cdots, \left(\mathbf{x}_i,y_i\right), \cdots, \left( \mathbf{x}_{N}, y_{N} \right) \right\}$$
where $\mathbf{x}_{i} \in \mathcal{X}\subseteq\mathbb{R}^{n}$ and $y_{i} \in \mathcal{Y}\subseteq\mathbb{R}$.

The multiple linear regression model is
$$f\left(\mathbf{x}\right)=\mathbf{w}\cdot\mathbf{x}+b=\sum_{i=1}^n w^{\left(i\right)}\cdot x^{\left(i\right)}+b$$
where $\mathbf{x} \in \mathcal{X}\subseteq \mathbb{R}^{n}$ is the input record; $\mathbf{w}=\left(w^{\left(1\right)},w^{\left(2\right)},\dots,w^{\left(n\right)}\right)^\top \in \mathbb{R}^{n}$ and $b \in \mathbb{R}$ are the model parameters; $\mathbf{w}$ is called the weight vector, $b$ is called the bias, and $\mathbf{w} \cdot \mathbf{x}$ is the inner product of $\mathbf{w}$ and $\mathbf{x}$.

When $n=1$, the model reduces to the simple (single-feature) linear regression model:
$$f\left(x\right)=w\cdot x+b$$
where $w\in\mathbb{R}$ and $b\in\mathbb{R}$ are the model parameters.

Let
$$\hat{\mathbf{w}}=\left(\mathbf{w}^\top,b\right)^\top, \qquad \hat{\mathbf{x}}=\left(\mathbf{x}^\top,1\right)^\top$$
Then the multiple linear regression model simplifies to
$$f\left(\hat{\mathbf{x}}\right)=\hat{\mathbf{w}}\cdot\hat{\mathbf{x}}$$
where $\hat{\mathbf{x}}$ is the augmented feature vector and $\hat{\mathbf{w}}$ is the augmented weight vector.
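As a quick check of the augmented form, the following minimal sketch compares $\mathbf{w}\cdot\mathbf{x}+b$ with $\hat{\mathbf{w}}\cdot\hat{\mathbf{x}}$ in NumPy. The numeric values are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical example values, chosen only for illustration.
w = np.array([2.0, -1.0, 0.5])   # weight vector w
b = 4.0                          # bias b
x = np.array([1.0, 3.0, 2.0])    # one input record x

# Original form: f(x) = w . x + b
f_plain = np.dot(w, x) + b

# Augmented form: append b to w and a constant 1 to x
w_hat = np.append(w, b)
x_hat = np.append(x, 1.0)
f_aug = np.dot(w_hat, x_hat)

print(f_plain, f_aug)  # both are 4.0
```

Folding the bias into the weight vector is what lets the later derivations treat all parameters as a single vector.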

1.2 Parameter Learning for Multiple Linear Regression: Empirical Risk Minimization and Structural Risk Minimization

Loss function: the squared loss
$$L\left(y,f\left(\mathbf{x}\right)\right)=\left(y-f\left(\mathbf{x}\right)\right)^2$$

Empirical risk:
$$R_{emp} \left( f \right) = \dfrac{1}{N} \sum_{i=1}^{N} L \left(y_{i}, f \left( \mathbf{x}_{i} \right) \right)$$
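Under the squared loss, the empirical risk is simply the mean squared error over the training set. A minimal sketch follows; the function name `empirical_risk` and the sample values are assumptions made for this illustration.

```python
import numpy as np

def empirical_risk(y_true, y_pred):
    """Average squared loss over N samples: (1/N) * sum_i (y_i - f(x_i))^2."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)

# Illustrative targets and predictions: squared errors are 1, 0 and 4.
y_true = [3.0, 5.0, 7.0]
y_pred = [2.0, 5.0, 9.0]
risk = empirical_risk(y_true, y_pred)
print(risk)
```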

Optimal model parameters (the constant factor $1/N$ does not change the minimizer and is dropped):
$$\begin{aligned} \hat{\mathbf{w}}^*&=\mathop{\arg\min}_{\hat{\mathbf{w}}}\sum_{i=1}^N \left(y_i-f\left(\hat{\mathbf{x}}_i\right)\right)^2 \\ &=\mathop{\arg\min}_{\hat{\mathbf{w}}}\sum_{i=1}^N \left(y_i-\hat{\mathbf{w}}\cdot\hat{\mathbf{x}}_i\right)^2 \end{aligned}$$

Solving for the model by minimizing the mean squared error is known as the least squares method.

Equivalently, in matrix form, the optimal model parameters are
$$\hat{\mathbf{w}}^*=\mathop{\arg\min}_{\hat{\mathbf{w}}} \left(\mathbf{y}-\mathbf{X}\hat{\mathbf{w}}\right)^\top\left(\mathbf{y}-\mathbf{X}\hat{\mathbf{w}}\right)$$
where
$$\mathbf{X}=\begin{pmatrix} \mathbf{x}_1^\top & 1 \\ \mathbf{x}_2^\top & 1 \\ \vdots & \vdots \\ \mathbf{x}_N^\top & 1\end{pmatrix} =\begin{pmatrix} \hat{\mathbf{x}}_1^\top \\ \hat{\mathbf{x}}_2^\top \\ \vdots \\ \hat{\mathbf{x}}_N^\top \end{pmatrix}, \qquad \mathbf{y}=\left(y_1,y_2,\dots,y_N\right)^\top$$

Let $E_{\hat{\mathbf{w}}}=\left(\mathbf{y}-\mathbf{X}\hat{\mathbf{w}}\right)^\top\left(\mathbf{y}-\mathbf{X}\hat{\mathbf{w}}\right)$. Differentiating with respect to $\hat{\mathbf{w}}$ gives
$$\frac{\partial E_{\hat{\mathbf{w}}}}{\partial \hat{\mathbf{w}}}=2\mathbf{X}^\top\left(\mathbf{X}\hat{\mathbf{w}}-\mathbf{y}\right)$$
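The gradient follows from expanding the quadratic form. A short derivation, using only the symbols already defined, can be sketched as:

```latex
\begin{aligned}
E_{\hat{\mathbf{w}}}
&=\left(\mathbf{y}-\mathbf{X}\hat{\mathbf{w}}\right)^\top\left(\mathbf{y}-\mathbf{X}\hat{\mathbf{w}}\right) \\
&=\mathbf{y}^\top\mathbf{y}-2\hat{\mathbf{w}}^\top\mathbf{X}^\top\mathbf{y}+\hat{\mathbf{w}}^\top\mathbf{X}^\top\mathbf{X}\hat{\mathbf{w}} \\
\frac{\partial E_{\hat{\mathbf{w}}}}{\partial \hat{\mathbf{w}}}
&=-2\mathbf{X}^\top\mathbf{y}+2\mathbf{X}^\top\mathbf{X}\hat{\mathbf{w}}
=2\mathbf{X}^\top\left(\mathbf{X}\hat{\mathbf{w}}-\mathbf{y}\right)
\end{aligned}
```

The middle step uses the fact that $\mathbf{y}^\top\mathbf{X}\hat{\mathbf{w}}$ is a scalar and therefore equals its own transpose $\hat{\mathbf{w}}^\top\mathbf{X}^\top\mathbf{y}$.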

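When $\mathbf{X}^\top\mathbf{X}$ is a full-rank (equivalently, positive definite) matrix, setting the derivative above to zero yields the optimal closed-form solution
$$\hat{\mathbf{w}}^*=\left(\mathbf{X}^\top\mathbf{X}\right)^{-1}\mathbf{X}^\top\mathbf{y}$$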
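The closed-form solution can be tried out directly in NumPy. The sketch below uses synthetic data (an assumption for illustration) and solves the normal equations $\mathbf{X}^\top\mathbf{X}\hat{\mathbf{w}}=\mathbf{X}^\top\mathbf{y}$ with `np.linalg.solve` rather than forming the inverse explicitly, then compares against `np.linalg.lstsq`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (an assumption for illustration): y = 3*x1 - 2*x2 + 1 + small noise
N, n = 50, 2
X_raw = rng.normal(size=(N, n))
y = X_raw @ np.array([3.0, -2.0]) + 1.0 + 0.01 * rng.normal(size=N)

# Augmented design matrix: each row is (x_i^T, 1)
X = np.hstack([X_raw, np.ones((N, 1))])

# Normal equations: solve (X^T X) w_hat = X^T y
w_closed = np.linalg.solve(X.T @ X, X.T @ y)

# Reference solution from NumPy's least-squares routine
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(w_closed)   # close to (3, -2, 1); the last entry is the bias b
```

Solving the linear system is generally preferred to computing $\left(\mathbf{X}^\top\mathbf{X}\right)^{-1}$ because it is cheaper and numerically more stable.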

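When this condition does not hold, methods such as principal component analysis (PCA) can be used to remove the linear dependence among features before applying least squares.
Alternatively, use gradient descent: initialize $\hat{\mathbf{w}}_0=\mathbf{0}$ and iterate
$$\hat{\mathbf{w}}\gets\hat{\mathbf{w}}-\alpha\mathbf{X}^\top\left(\mathbf{X}\hat{\mathbf{w}}-\mathbf{y}\right)$$
where $\alpha$ is the learning rate (the constant factor 2 of the gradient is absorbed into $\alpha$).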
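The iteration can be sketched in a few lines of NumPy. The function name `least_squares_gd`, the default step size, and the toy data are assumptions for this illustration; the step size has to be small relative to the largest eigenvalue of $\mathbf{X}^\top\mathbf{X}$ for the iteration to converge.

```python
import numpy as np

def least_squares_gd(X, y, alpha=0.01, n_iters=5000):
    """Batch gradient descent for least squares, starting from w_hat = 0."""
    w_hat = np.zeros(X.shape[1])
    for _ in range(n_iters):
        grad = X.T @ (X @ w_hat - y)   # gradient up to the constant factor 2
        w_hat = w_hat - alpha * grad
    return w_hat

# Tiny noiseless dataset generated by y = 2*x + 1 (illustrative values)
x = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([x, np.ones_like(x)])   # augmented rows (x_i, 1)
y = 2.0 * x + 1.0

w_gd = least_squares_gd(X, y)
print(w_gd)   # approaches (2, 1): slope w and bias b
```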

Structural risk, where $J\left(f\right)$ measures model complexity and $\lambda\geq0$ weights the penalty:
$$R_{str}= \dfrac{1}{N} \sum_{i=1}^{N} L \left(y_{i}, f \left( \mathbf{x}_{i} \right) \right) + \lambda J \left(f\right)$$

In the three variants below, $\alpha$ denotes the regularization strength, not the learning rate used above.

Ridge regression (Ridge Regression)
$$R_{str}= \dfrac{1}{N} \sum_{i=1}^{N} L \left(y_{i}, f \left( \mathbf{x}_{i} \right) \right) + \alpha\|\mathbf{w}\|_2^2,\quad \alpha\geq0$$

Lasso regression (Lasso Regression)
$$R_{str}= \dfrac{1}{N} \sum_{i=1}^{N} L \left(y_{i}, f \left( \mathbf{x}_{i} \right) \right) + \alpha\|\mathbf{w}\|_1,\quad \alpha\geq0$$

Elastic net regression (Elastic Net)
$$R_{str}= \dfrac{1}{N} \sum_{i=1}^{N} L \left(y_{i}, f \left( \mathbf{x}_{i} \right) \right) + \alpha\rho\|\mathbf{w}\|_1+\frac{\alpha\left(1-\rho\right)}{2}\|\mathbf{w}\|_2^2,\quad \alpha\geq0,\ 0\leq \rho\leq1$$
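To make the three penalty terms concrete, the following sketch evaluates each of them for one weight vector. The function names and the sample values are assumptions for illustration; only the penalty terms are computed, not the full structural risk.

```python
import numpy as np

def ridge_penalty(w, alpha):
    """alpha * ||w||_2^2"""
    return alpha * np.sum(w ** 2)

def lasso_penalty(w, alpha):
    """alpha * ||w||_1"""
    return alpha * np.sum(np.abs(w))

def elastic_net_penalty(w, alpha, rho):
    """alpha * rho * ||w||_1 + alpha * (1 - rho) / 2 * ||w||_2^2"""
    return alpha * rho * np.sum(np.abs(w)) + 0.5 * alpha * (1.0 - rho) * np.sum(w ** 2)

w = np.array([3.0, -4.0, 0.0])   # ||w||_1 = 7, ||w||_2^2 = 25
ridge = ridge_penalty(w, alpha=0.1)
lasso = lasso_penalty(w, alpha=0.1)
enet = elastic_net_penalty(w, alpha=0.1, rho=0.5)
print(ridge, lasso, enet)
```

With $\rho=1$ the elastic net penalty coincides with the lasso penalty, and with $\rho=0$ it is half the ridge penalty.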

1.3 Applying the Multiple Linear Regression Model

%matplotlib inline

import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets, linear_model, model_selection

def load_data():
    diabetes = datasets.load_diabetes()
    return model_selection.train_test_split(diabetes.data,diabetes.target,test_size=0.25,random_state=0) 

def test_LinearRegression(*data):
    X_train,X_test,y_train,y_test=data
    regr = linear_model.LinearRegression()
    regr.fit(X_train, y_train)
    print('Coefficients:%s, intercept %.2f'%(regr.coef_,regr.intercept_))
    # np.mean of the squared residuals is the mean squared error, not the residual sum of squares
    print("Mean squared error: %.2f"% np.mean((regr.predict(X_test) - y_test) ** 2))
    print('Score: %.2f' % regr.score(X_test, y_test))

if __name__=='__main__':
    X_train,X_test,y_train,y_test=load_data() 
    test_LinearRegression(X_train,X_test,y_train,y_test) 

# Lasso regression on the same diabetes split; reuses the imports and load_data() defined above.
def test_Lasso(*data):
    X_train,X_test,y_train,y_test=data
    regr = linear_model.Lasso()
    regr.fit(X_train, y_train)
    print('Coefficients:%s, intercept %.2f'%(regr.coef_,regr.intercept_))
    # np.mean of the squared residuals is the mean squared error, not the residual sum of squares
    print("Mean squared error: %.2f"% np.mean((regr.predict(X_test) - y_test) ** 2))
    print('Score: %.2f' % regr.score(X_test, y_test))
    
def test_Lasso_alpha(*data):
    X_train,X_test,y_train,y_test=data
    alphas=[0.01,0.02,0.05,0.1,0.2,0.5,1,2,5,10,20,50,100,200,500,1000]
    scores=[]
    for i,alpha in enumerate(alphas):
        regr = linear_model.Lasso(alpha=alpha)
        regr.fit(X_train, y_train)
        scores.append(regr.score(X_test, y_test))
    
    fig=plt.figure()
    ax=fig.add_subplot(1,1,1)
    ax.plot(alphas,scores)
    ax.set_xlabel(r"$\alpha$")
    ax.set_ylabel(r"score")
    ax.set_xscale('log')
    ax.set_title("Lasso")
    plt.show()
    
if __name__=='__main__':
    X_train,X_test,y_train,y_test=load_data() 
    test_Lasso(X_train,X_test,y_train,y_test) 
    test_Lasso_alpha(X_train,X_test,y_train,y_test) 

# Ridge regression on the same diabetes split; reuses the imports and load_data() defined above.

def test_Ridge(*data):
    X_train,X_test,y_train,y_test=data
    regr = linear_model.Ridge()
    regr.fit(X_train, y_train)
    print('Coefficients:%s, intercept %.2f'%(regr.coef_,regr.intercept_))
    # np.mean of the squared residuals is the mean squared error, not the residual sum of squares
    print("Mean squared error: %.2f"% np.mean((regr.predict(X_test) - y_test) ** 2))
    print('Score: %.2f' % regr.score(X_test, y_test))
    
def test_Ridge_alpha(*data):
    X_train,X_test,y_train,y_test=data
    alphas=[0.01,0.02,0.05,0.1,0.2,0.5,1,2,5,10,20,50,100,200,500,1000]
    scores=[]
    for i,alpha in enumerate(alphas):
        regr = linear_model.Ridge(alpha=alpha)
        regr.fit(X_train, y_train)
        scores.append(regr.score(X_test, y_test))
    
    fig=plt.figure()
    ax=fig.add_subplot(1,1,1)
    ax.plot(alphas,scores)
    ax.set_xlabel(r"$\alpha$")
    ax.set_ylabel(r"score")
    ax.set_xscale('log')
    ax.set_title("Ridge")
    plt.show()
    
if __name__=='__main__':
    X_train,X_test,y_train,y_test=load_data() 
    test_Ridge(X_train,X_test,y_train,y_test) 
    test_Ridge_alpha(X_train,X_test,y_train,y_test) 

2. Logistic Regression

2.1 The Sigmoid Function and the Binary Logistic Regression Model

The sigmoid function:
$$\operatorname{sigmoid}\left(z\right)=\sigma\left(z\right)=\frac{1}{1+e^{-z}}$$
where $z\in\mathbb{R}$ and $\operatorname{sigmoid}\left(z\right)\in\left(0,1\right)$.


The derivative of the sigmoid function:
$$\sigma'\left(z\right)=\sigma\left(z\right)\left(1-\sigma\left(z\right)\right)$$
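The derivative identity is easy to verify numerically. The sketch below compares $\sigma\left(z\right)\left(1-\sigma\left(z\right)\right)$ with a central finite difference; the grid of test points and the step `h` are arbitrary choices made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    """Analytic derivative: sigma(z) * (1 - sigma(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)

z = np.linspace(-5.0, 5.0, 11)
h = 1e-6

# Central finite difference approximation of the derivative
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)
max_err = np.max(np.abs(numeric - sigmoid_grad(z)))

print(sigmoid(0.0), sigmoid_grad(0.0))   # 0.5 0.25
```

The derivative peaks at $z=0$ with value $0.25$ and decays towards zero on both sides.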

The binary logistic regression model is the following conditional probability distribution:
$$\begin{aligned} P \left( y = 1 | \mathbf{x} \right) &=\sigma\left(\mathbf{w}\cdot\mathbf{x}+b\right) \\ &= \dfrac{1}{1+\exp{\left(-\left(\mathbf{w} \cdot \mathbf{x} + b \right)\right)}} \\ &= \dfrac{\exp{\left(\mathbf{w} \cdot \mathbf{x} + b \right)}}{1+\exp{\left( \mathbf{w} \cdot \mathbf{x} + b \right)}}\\ P \left( y = 0 | \mathbf{x} \right) &= 1- \sigma\left(\mathbf{w}\cdot\mathbf{x}+b\right) \\ &=\dfrac{1}{1+\exp{\left( \mathbf{w} \cdot \mathbf{x} + b \right)}}\end{aligned}$$
where $\mathbf{x} \in \mathbb{R}^{n}$, $y \in \left\{ 0, 1 \right\}$, $\mathbf{w} \in \mathbb{R}^{n}$ is the weight vector, $b \in \mathbb{R}$ is the bias, and $\mathbf{w} \cdot \mathbf{x}$ is the inner product of the two vectors.

The weight vector and the feature vector can both be extended: with the augmented weight vector $\hat{\mathbf{w}} = \left( w^{\left(1\right)},w^{\left(2\right)},\cdots,w^{\left(n\right)},b \right)^\top$ and the augmented feature vector $\hat{\mathbf{x}} = \left( x^{\left(1\right)},x^{\left(2\right)},\cdots,x^{\left(n\right)},1 \right)^\top$, the logistic regression model becomes
$$\begin{aligned} P \left( y = 1 | \hat{\mathbf{x}} \right) &= \dfrac{\exp{\left(\hat{\mathbf{w}} \cdot \hat{\mathbf{x}} \right)}}{1+\exp{\left( \hat{\mathbf{w}} \cdot \hat{\mathbf{x}} \right)}}\\ P \left( y = 0 | \hat{\mathbf{x}} \right) &=\dfrac{1}{1+\exp{\left( \hat{\mathbf{w}} \cdot \hat{\mathbf{x}} \right)}}\end{aligned}$$
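A minimal sketch of the augmented model follows, evaluating both class probabilities for one record. The function name `predict_proba` and the numeric values are assumptions for illustration. It also checks that the log-odds $\log\frac{P\left(y=1|\hat{\mathbf{x}}\right)}{P\left(y=0|\hat{\mathbf{x}}\right)}$ equals the linear score $\hat{\mathbf{w}}\cdot\hat{\mathbf{x}}$, which follows directly from the two formulas above.

```python
import numpy as np

def predict_proba(w_hat, x_hat):
    """Return (P(y=1|x), P(y=0|x)) for augmented vectors w_hat and x_hat."""
    z = np.dot(w_hat, x_hat)
    p1 = np.exp(z) / (1.0 + np.exp(z))
    p0 = 1.0 / (1.0 + np.exp(z))
    return p1, p0

w_hat = np.array([1.0, -2.0, 0.5])   # (w1, w2, b), illustrative values
x_hat = np.array([2.0, 1.0, 1.0])    # (x1, x2, 1)

p1, p0 = predict_proba(w_hat, x_hat)   # linear score is 2 - 2 + 0.5 = 0.5
log_odds = np.log(p1 / p0)
print(p1, p0, log_odds)
```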

2.2 Parameter Learning for Binary Logistic Regression: Maximum Likelihood Estimation

Given a training dataset
$$D = \left\{ \left( \hat{\mathbf{x}}_{1}, y_{1} \right), \left( \hat{\mathbf{x}}_{2}, y_{2} \right), \cdots, \left( \hat{\mathbf{x}}_{N}, y_{N} \right) \right\}$$
where $\hat{\mathbf{x}}_{i} \in \mathbb{R}^{n+1}$, $y_{i} \in \left\{ 0, 1 \right\}$, $i = 1, 2, \cdots, N$.

Let
$$P \left( y =1 | \hat{\mathbf{x}} \right) =\sigma \left( \hat{\mathbf{w}}\cdot\hat{\mathbf{x}} \right) ,\quad P \left( y =0 | \hat{\mathbf{x}} \right) = 1 - \sigma \left( \hat{\mathbf{w}}\cdot\hat{\mathbf{x}} \right)$$
The likelihood function is
$$\begin{aligned} L \left( \hat{\mathbf{w}} \right) &= \prod_{i=1}^N P\left(y_i|\hat{\mathbf{x}}_i\right) \\ &= \prod_{i=1}^{N} \left[ \sigma \left( \hat{\mathbf{w}}\cdot\hat{\mathbf{x}}_{i} \right) \right]^{y_{i}}\left[ 1 - \sigma \left( \hat{\mathbf{w}}\cdot\hat{\mathbf{x}}_{i} \right) \right]^{1 - y_{i}}\end{aligned}$$

Because a product of many probabilities can underflow numerically, the likelihood is converted to the log-likelihood (a sum):
$$\begin{aligned} l \left( \hat{\mathbf{w}} \right) &= \log L \left( \hat{\mathbf{w}} \right) \\ & = \sum_{i=1}^{N} \left[ y_{i} \log \sigma \left( \hat{\mathbf{w}}\cdot\hat{\mathbf{x}}_{i} \right) + \left( 1 - y_{i} \right) \log \left( 1 - \sigma \left( \hat{\mathbf{w}}\cdot\hat{\mathbf{x}}_{i} \right) \right) \right]\end{aligned}$$
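The underflow problem is easy to reproduce. In the sketch below (random data and sizes are assumptions for illustration), every predicted probability is $0.5$ at $\hat{\mathbf{w}}=\mathbf{0}$, so the likelihood of $N=2000$ samples is $0.5^{2000}$, which is smaller than the smallest positive double and evaluates to `0.0`. The log-likelihood stays finite.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(w_hat, X_hat, y):
    """Sum over samples of y*log(p) + (1-y)*log(1-p)."""
    p = sigmoid(X_hat @ w_hat)
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

rng = np.random.default_rng(0)
N = 2000
X_hat = np.hstack([rng.normal(size=(N, 2)), np.ones((N, 1))])  # augmented rows
y = rng.integers(0, 2, size=N).astype(float)                   # random 0/1 labels
w_hat = np.zeros(3)

p = sigmoid(X_hat @ w_hat)                           # every p_i is 0.5 at w_hat = 0
likelihood = np.prod(p ** y * (1 - p) ** (1 - y))    # 0.5**2000 underflows to 0.0
ll = log_likelihood(w_hat, X_hat, y)                 # N * log(0.5), finite

print(likelihood, ll)
```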

Maximum likelihood estimation:
$$\hat{\mathbf{w}}^*=\mathop{\arg\max}_{\hat{\mathbf{w}}} l\left(\hat{\mathbf{w}}\right)$$

Equivalently, minimize the negative log-likelihood loss:
$$\hat{\mathbf{w}}^*=\mathop{\arg\min}_{\hat{\mathbf{w}}}-l\left(\hat{\mathbf{w}}\right)$$

Let $\hat{y}_i=\sigma\left(\hat{\mathbf{w}}\cdot\hat{\mathbf{x}}_i\right)$. Using $\sigma'=\sigma\left(1-\sigma\right)$, the partial derivative of the negative log-likelihood $-l\left(\hat{\mathbf{w}}\right)$ with respect to $\hat{\mathbf{w}}$ is
$$\begin{aligned}\frac{\partial \left(-l\left(\hat{\mathbf{w}}\right)\right)}{\partial \hat{\mathbf{w}}} &=-\sum_{i=1}^N\left(y_i\frac{\hat{y}_i\left(1-\hat{y}_i\right)}{\hat{y}_i}\hat{\mathbf{x}}_i-\left(1-y_i\right)\frac{\hat{y}_i\left(1-\hat{y}_i\right)}{1-\hat{y}_i}\hat{\mathbf{x}}_i\right)\\ &=-\sum_{i=1}^N\left(y_i\left(1-\hat{y}_i\right)\hat{\mathbf{x}}_i-\left(1-y_i\right)\hat{y}_i\hat{\mathbf{x}}_i\right) \\ &=-\sum_{i=1}^N\hat{\mathbf{x}}_i\left(y_i-\hat{y}_i\right)\end{aligned}$$
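For the negative log-likelihood $-l\left(\hat{\mathbf{w}}\right)$, the gradient is $-\sum_{i}\hat{\mathbf{x}}_i\left(y_i-\hat{y}_i\right)$. The sketch below checks this numerically against central finite differences; the random data, the function names, and the step `h` are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neg_log_likelihood(w_hat, X_hat, y):
    p = sigmoid(X_hat @ w_hat)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def nll_gradient(w_hat, X_hat, y):
    """Analytic gradient of -l(w_hat): -sum_i x_hat_i * (y_i - y_hat_i)."""
    y_hat = sigmoid(X_hat @ w_hat)
    return -X_hat.T @ (y - y_hat)

rng = np.random.default_rng(1)
X_hat = np.hstack([rng.normal(size=(20, 2)), np.ones((20, 1))])  # augmented rows
y = rng.integers(0, 2, size=20).astype(float)
w_hat = rng.normal(size=3)

# Central finite differences, one coordinate at a time
h = 1e-6
numeric = np.zeros(3)
for j in range(3):
    e = np.zeros(3)
    e[j] = h
    numeric[j] = (neg_log_likelihood(w_hat + e, X_hat, y)
                  - neg_log_likelihood(w_hat - e, X_hat, y)) / (2 * h)

analytic = nll_gradient(w_hat, X_hat, y)
print(numeric, analytic)
```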

Applying gradient descent to $-l\left(\hat{\mathbf{w}}\right)$: initialize $\hat{\mathbf{w}}_0=\mathbf{0}$ and iterate
$$\hat{\mathbf{w}}_{t+1}\gets\hat{\mathbf{w}}_t+\alpha\sum_{i=1}^N\hat{\mathbf{x}}_i\left(y_i-\hat{y}_i^{\hat{\mathbf{w}}_t}\right)$$
where $\alpha$ is the learning rate and $\hat{y}_i^{\hat{\mathbf{w}}_t}$ is the model's predicted output when the parameters are $\hat{\mathbf{w}}_t$.
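The update rule can be sketched as a batch loop in NumPy. The function name `fit_logistic`, the hyperparameters, and the six-point toy dataset are assumptions for illustration. The data are deliberately not linearly separable, so the weights converge to finite values.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X_hat, y, alpha=0.1, n_iters=1000):
    """Batch update w <- w + alpha * sum_i x_hat_i * (y_i - y_hat_i), from w = 0."""
    w_hat = np.zeros(X_hat.shape[1])
    for _ in range(n_iters):
        y_hat = sigmoid(X_hat @ w_hat)
        w_hat = w_hat + alpha * X_hat.T @ (y - y_hat)
    return w_hat

# One feature plus the constant 1; labels overlap around x = 0
x = np.array([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0])
y = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0])
X_hat = np.column_stack([x, np.ones_like(x)])

w_fit = fit_logistic(X_hat, y)
grad_norm = np.linalg.norm(X_hat.T @ (y - sigmoid(X_hat @ w_fit)))
accuracy = np.mean((sigmoid(X_hat @ w_fit) >= 0.5) == (y == 1.0))
print(w_fit, grad_norm, accuracy)
```

At convergence the summed residual term vanishes, which is the stationarity condition of the log-likelihood.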

2.3 Applying the Logistic Regression Model

from math import exp
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
def create_data():
    iris = load_iris()
    df = pd.DataFrame(iris.data, columns=iris.feature_names)
    df['label'] = iris.target
    df.columns = ['sepal length', 'sepal width', 'petal length', 'petal width', 'label']
    data = np.array(df.iloc[:100, [0,1,-1]])
    return data[:, :2], data[:, -1]
X, y = create_data()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
class LogisticRegressionClassifier:
    
    def __init__(self, max_iter=200, learning_rate=0.01):
        self.max_iter = max_iter
        self.learning_rate = learning_rate
        
    def sigmoid(self, x):
        # np.exp handles the 1-element array that np.dot returns in fit(); math.exp expects a scalar
        return 1 / (1 + np.exp(-x))
    
    def data_matrix(self, X):
        data_mat = []
        for d in X:
            data_mat.append([1.0, *d])
        return data_mat
    
    def fit(self, X, y):
        data_mat = self.data_matrix(X)
        self.weights = np.zeros((len(data_mat[0]), 1), dtype=np.float32)
        
        for iter_ in range(self.max_iter):
            for i in range(len(X)):
                result = self.sigmoid(np.dot(data_mat[i], self.weights))
                error = y[i] - result
                self.weights += self.learning_rate * error * np.transpose([data_mat[i]])
        print('LogisticRegression Model(learning_rate={}, max_iter={})'.
              format(self.learning_rate, self.max_iter))
    
    def score(self, X_test, y_test):
        right = 0
        X_test = self.data_matrix(X_test)
        for x, y in zip(X_test, y_test):
            result = np.dot(x, self.weights)
            if (result > 0 and y == 1) or (result < 0 and y == 0):
                right += 1
        return right / len(X_test)
lr_clf = LogisticRegressionClassifier()
lr_clf.fit(X_train, y_train)
lr_clf.score(X_test, y_test)
x_points = np.arange(4, 8)
y_ = -(lr_clf.weights[1] * x_points + lr_clf.weights[0]) / lr_clf.weights[2]

plt.plot(x_points, y_)
plt.scatter(X[:50, 0], X[:50, 1], label='0')
plt.scatter(X[50:, 0], X[50:, 1], label='1')
plt.legend()


import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets, linear_model
from sklearn import model_selection

def load_data():
    iris=datasets.load_iris() 
    X_train=iris.data
    y_train=iris.target
    return model_selection.train_test_split(X_train, y_train,test_size=0.25,random_state=0,stratify=y_train)

def test_LogisticRegression(*data):
    X_train,X_test,y_train,y_test=data
    regr = linear_model.LogisticRegression()
    regr.fit(X_train, y_train)
    print('Coefficients:%s, intercept %s'%(regr.coef_,regr.intercept_))
    print('Score: %.2f' % regr.score(X_test, y_test))
    
def test_LogisticRegression_multinomial(*data):
    X_train,X_test,y_train,y_test=data
    regr = linear_model.LogisticRegression(multi_class='multinomial',solver='lbfgs')
    regr.fit(X_train, y_train)
    print('Coefficients:%s, intercept %s'%(regr.coef_,regr.intercept_))
    print('Score: %.2f' % regr.score(X_test, y_test))
    
def test_LogisticRegression_C(*data):
    X_train,X_test,y_train,y_test=data
    Cs=np.logspace(-2,4,num=100)
    scores=[]
    for C in Cs:
        regr = linear_model.LogisticRegression(C=C)
        regr.fit(X_train, y_train)
        scores.append(regr.score(X_test, y_test))
    
    fig=plt.figure()
    ax=fig.add_subplot(1,1,1)
    ax.plot(Cs,scores)
    ax.set_xlabel(r"C")
    ax.set_ylabel(r"score")
    ax.set_xscale('log')
    ax.set_title("LogisticRegression")
    plt.show()

if __name__=='__main__':
    X_train,X_test,y_train,y_test=load_data() 
    test_LogisticRegression(X_train,X_test,y_train,y_test) 
    test_LogisticRegression_multinomial(X_train,X_test,y_train,y_test) 
    test_LogisticRegression_C(X_train,X_test,y_train,y_test) 