A summary of regression analysis in Python: linear models and ridge regression

1. Overview of regression analysis

The target (dependent variable) is continuous-valued; regression finds a functional relationship between the dependent variable and the independent variables, which is then used to predict the target.

Common kinds of regression: linear regression, ridge regression, non-linear regression.

Goal of regression fitting: estimate the parameters of the function relating the independent variables to the dependent variable.

Fitting iteratively narrows the gap between the predicted and true values; ideally the remaining gap (the error term) ends up behaving like random noise with mean 0 and constant variance.

2. Loss function

For linear regression the usual loss is the least-squares cost: J(w) = Σᵢ (h_w(xᵢ) − yᵢ)², the sum of squared differences between the predictions and the true values.

3. Optimization algorithms

Methods that drive the loss function to its minimum.

Methods:

- normal equation
- gradient descent

Normal equation: w = (XᵀX)⁻¹ Xᵀ y, a closed-form solution. Gradient descent: repeatedly update w := w − α · ∂J(w)/∂w, with learning rate α.

4. Python APIs

4.1.1 statsmodels.formula.api.OLS(): ordinary least squares model fitting - commonly used
4.1.2 scipy.stats.linregress(): linear fit
4.1.3 scipy.optimize.curve_fit(): fit an arbitrary regression function
Usage reference: https://blog.csdn.net/weixin_41685388/article/details/104346268
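As a quick illustration of the second of these, scipy.stats.linregress fits a straight line to paired data. A minimal sketch on exact (made-up) data, so the fitted slope and intercept are recovered exactly:

```python
import numpy as np
from scipy.stats import linregress

# Points that lie exactly on y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

result = linregress(x, y)
print(result.slope)      # fitted slope, ~2.0
print(result.intercept)  # fitted intercept, ~1.0
print(result.rvalue)     # correlation coefficient, ~1.0 for a perfect fit
```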

5. scikit-learn linear-model APIs

1. sklearn.linear_model.LinearRegression(fit_intercept=True)

- Optimized via the normal equation
- fit_intercept: whether to fit an intercept
- LinearRegression.coef_: the regression coefficients
- LinearRegression.intercept_: the intercept (bias)
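A minimal sketch of the attributes above, on synthetic data whose true relationship is y = 3·x1 − 2·x2 + 5 (data made up for illustration, so the fitted coefficients are recovered exactly):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Exact linear data: y = 3*x1 - 2*x2 + 5
X = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [2, 3]], dtype=float)
y = 3 * X[:, 0] - 2 * X[:, 1] + 5

model = LinearRegression(fit_intercept=True)
model.fit(X, y)

print(model.coef_)       # ~[ 3. -2.]
print(model.intercept_)  # ~5.0
```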

2. sklearn.linear_model.SGDRegressor(loss="squared_loss", fit_intercept=True, learning_rate='invscaling', eta0=0.01)

- SGDRegressor implements learning via stochastic gradient descent; it supports different loss functions and regularization penalties for fitting linear regression models.
- loss: the loss type
  - loss="squared_loss": ordinary least squares (renamed to "squared_error" in scikit-learn 1.0+)
- fit_intercept: whether to fit an intercept (True/False)
- eta0=0.01: the initial learning rate
- learning_rate: how the learning rate evolves over the iterations
  - "constant": eta = eta0
  - "optimal": eta = 1.0 / (alpha * (t + t0))
  - "invscaling": eta = eta0 / pow(t, power_t) [default]
    - power_t=0.25 (defined in the parent class)
  - For a constant learning rate, use learning_rate='constant' and set it with eta0.
- SGDRegressor.coef_: the regression coefficients
- SGDRegressor.intercept_: the intercept (bias)

6. Evaluating regression performance

The smaller the mean squared error, the better the model (relatively speaking).
Mean squared error: MSE = (1/m) · Σᵢ (ŷᵢ − yᵢ)², averaged over the m test samples.
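sklearn.metrics.mean_squared_error computes exactly this average of squared differences; a tiny worked example:

```python
from sklearn.metrics import mean_squared_error

y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5,  0.0, 2.0, 8.0]

# squared diffs: 0.25, 0.25, 0.0, 1.0 -> mean = 1.5 / 4
mse = mean_squared_error(y_true, y_pred)
print(mse)  # 0.375
```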

7. Underfitting and overfitting

Underfitting: the model fits poorly; add more data and more feature variables.

Overfitting: the model is too complex; accuracy is high on the training set, but performance drops sharply on the test set.

- Cause: too many original features, some of them noisy; the model becomes overly complex because it tries to accommodate every training data point.
- Remedies for linear models:
  - Regularization: L1 regularization, L2 regularization
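A small sketch contrasting the two penalties on synthetic data that contains irrelevant features (alpha values here are arbitrary): an L1 penalty (Lasso) tends to zero out useless coefficients, while an L2 penalty (Ridge) only shrinks them toward zero:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.RandomState(0)
X = rng.randn(100, 10)
# Only the first two features matter; the other eight are pure noise
y = 3 * X[:, 0] - 2 * X[:, 1] + 0.1 * rng.randn(100)

lasso = Lasso(alpha=0.1).fit(X, y)   # L1 penalty
ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty

print(np.sum(lasso.coef_ == 0))  # L1 sets irrelevant coefficients exactly to 0
print(np.sum(ridge.coef_ == 0))  # L2 leaves them small but non-zero
```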


8. An improvement on linear regression: ridge regression

Ridge regression is still linear regression; it simply adds an L2 regularization penalty when building the regression equation, which mitigates overfitting.

API: sklearn.linear_model.Ridge(alpha=1.0, fit_intercept=True, solver="auto", normalize=False)

- Linear regression with L2 regularization
- alpha: the regularization strength, i.e. the penalty coefficient λ. The stronger the regularization, the smaller the weight coefficients; the weaker it is, the larger they are.
  - Typical values: 0–1 or 1–10
- solver: automatically chooses an optimization method based on the data
  - sag: choose this stochastic average gradient solver when both the dataset and the feature count are large
- normalize: whether to standardize the data
  - with normalize=False, standardize the data yourself with preprocessing.StandardScaler before calling fit (the normalize parameter has been removed in newer scikit-learn)
- Ridge.coef_: the regression weights
- Ridge.intercept_: the regression intercept
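A sketch of the alpha behaviour described above, on synthetic data: as the regularization strength grows, the norm of the ridge coefficient vector shrinks:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.RandomState(42)
X = rng.randn(50, 5)
y = X @ np.array([1.0, 2.0, 3.0, 4.0, 5.0]) + rng.randn(50)

norms = []
for alpha in [0.01, 1.0, 100.0]:
    r = Ridge(alpha=alpha).fit(X, y)
    norms.append(np.linalg.norm(r.coef_))

print(norms)  # coefficient norm decreases as alpha grows
```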

9. Example code

from sklearn.datasets import load_boston  # Boston housing dataset (removed in scikit-learn 1.2)
from sklearn.model_selection import train_test_split  # train/test split
from sklearn.preprocessing import StandardScaler  # feature standardization
from sklearn.linear_model import LinearRegression, SGDRegressor, Ridge  # normal-equation, SGD, and ridge estimators
from sklearn.metrics import mean_squared_error  # mean squared error
from sklearn.externals import joblib  # model saving/loading (in newer scikit-learn: import joblib)


def linear1():
    """
    Predict Boston house prices with the normal-equation optimizer.
    :return:
    """
    # 1) Load the data
    boston = load_boston()  # Boston housing dataset

    # 2) Split into training and test sets
    x_train, x_test, y_train, y_test = train_test_split(boston.data, boston.target, random_state=22)

    # 3) Standardize
    transfer = StandardScaler()
    x_train = transfer.fit_transform(x_train)
    x_test = transfer.transform(x_test)

    # 4) Estimator
    estimator = LinearRegression()
    estimator.fit(x_train, y_train)

    # 5) Inspect the model
    print("Normal equation - weight coefficients:\n", estimator.coef_)
    print("Normal equation - intercept:\n", estimator.intercept_)

    # 6) Evaluate the model
    y_predict = estimator.predict(x_test)
    print("Predicted house prices:\n", y_predict)
    error = mean_squared_error(y_test, y_predict)
    print("Normal equation - mean squared error:\n", error)

    return None


def linear2():
    """
    Predict Boston house prices with the gradient-descent optimizer.
    :return:
    """
    # 1) Load the data
    boston = load_boston()
    print("Number of features:\n", boston.data.shape)

    # 2) Split into training and test sets
    x_train, x_test, y_train, y_test = train_test_split(boston.data, boston.target, random_state=22)

    # 3) Standardize
    transfer = StandardScaler()
    x_train = transfer.fit_transform(x_train)
    x_test = transfer.transform(x_test)

    # 4) Estimator
    estimator = SGDRegressor(learning_rate="constant", eta0=0.01, max_iter=10000, penalty="l1")
    estimator.fit(x_train, y_train)

    # 5) Inspect the model
    print("Gradient descent - weight coefficients:\n", estimator.coef_)
    print("Gradient descent - intercept:\n", estimator.intercept_)

    # 6) Evaluate the model
    y_predict = estimator.predict(x_test)
    print("Predicted house prices:\n", y_predict)
    error = mean_squared_error(y_test, y_predict)
    print("Gradient descent - mean squared error:\n", error)

    return None


def linear3():
    """
    Predict Boston house prices with ridge regression.
    :return:
    """
    # 1) Load the data
    boston = load_boston()
    print("Number of features:\n", boston.data.shape)

    # 2) Split into training and test sets
    x_train, x_test, y_train, y_test = train_test_split(boston.data, boston.target, random_state=22)

    # 3) Standardize
    transfer = StandardScaler()
    x_train = transfer.fit_transform(x_train)
    x_test = transfer.transform(x_test)

    # 4) Estimator
    estimator = Ridge(alpha=0.5, max_iter=10000)
    estimator.fit(x_train, y_train)

    # Save the model
    joblib.dump(estimator, "my_ridge.pkl")

    # Load the model (when using this, comment out step 4 and the save above)
#     estimator = joblib.load("my_ridge.pkl")

    # 5) Inspect the model
    print("Ridge regression - weight coefficients:\n", estimator.coef_)
    print("Ridge regression - intercept:\n", estimator.intercept_)

    # 6) Evaluate the model
    y_predict = estimator.predict(x_test)
    print("Predicted house prices:\n", y_predict)
    error = mean_squared_error(y_test, y_predict)
    print("Ridge regression - mean squared error:\n", error)

    return None

if __name__ == "__main__":
    # Example 1: predict Boston house prices with the normal-equation optimizer
    linear1()
    # Example 2: predict Boston house prices with the gradient-descent optimizer
    linear2()
    # Example 3: predict Boston house prices with ridge regression
    linear3()

Normal equation - weight coefficients:
[-0.63330277 1.14524456 -0.05645213 0.74282329 -1.95823403 2.70614818
-0.07544614 -3.29771933 2.49437742 -1.85578218 -1.7518438 0.8816005
-3.92011059]
Normal equation - intercept:
22.62137203166228
Predicted house prices:
[28.23494214 31.51307591 21.11158648 32.66626323 20.00183117 19.06699551
21.0961119 19.61374904 19.61770489 32.88592905 20.9786404 27.52841267
15.54828312 19.78740662 36.89507874 18.81564352 9.34846191 18.49591496
30.67162831 24.30515001 19.06869647 34.10872969 29.82133504 17.52652164
34.90809099 26.5518049 34.71029597 27.42733357 19.096319 14.92856162
30.86006302 15.8783044 37.1757242 7.80943257 16.23745554 17.17366271
7.46619503 20.00428873 40.58796715 28.93648294 25.25640752 17.73215197
38.74782311 6.87753104 21.79892653 25.2879307 20.43140241 20.47297067
17.25472052 26.14086662 8.47995047 27.51138229 30.58418801 16.57906517
9.35431527 35.54126306 32.29698317 21.81396457 17.60000884 22.07940501
23.49673392 24.10792657 20.13898247 38.52731389 24.58425972 19.7678374
13.90105731 6.77759905 42.04821253 21.92454718 16.8868124 22.58439325
40.75850574 21.40493055 36.89550591 27.19933607 20.98475235 20.35089273
25.35827725 22.19234062 31.13660054 20.39576992 23.99395511 31.54664956
26.74584297 20.89907127 29.08389387 21.98344006 26.29122253 20.1757307
25.49308523 24.08473351 19.89049624 16.50220723 15.21335458 18.38992582
24.83578855 16.59840245 20.88232963 26.7138003 20.75135414 17.87670216
24.2990126 23.37979066 21.6475525 36.8205059 15.86479489 21.42514368
32.81282808 33.74331087 20.62139404 26.88700445 22.65319133 17.34888735
21.67595777 21.65498295 27.66634446 25.05030923 23.74424639 14.65940118
15.19817822 3.8188746 29.18611337 20.67170992 22.3295488 28.01966146
28.59358258]
Normal equation - mean squared error:
20.630254348291196
Number of features:
(506, 13)

Gradient descent - weight coefficients:
[-0.18032203 1.13156337 0. 0.53527539 -1.98019369 2.31577395
0. -3.16472059 2.5717045 -1.80879957 -1.47919941 0.84244604
-3.6687529 ]
Gradient descent - intercept:
[22.93172969]
Predicted house prices:
[27.90852137 30.73533552 21.37117576 31.30919582 20.39907435 20.23369773
21.44665125 20.06044518 19.95900751 31.96641058 21.26970571 28.06166368
16.50815474 20.25008639 35.45153789 19.60300052 10.21564488 19.17587013
29.84210678 24.26185529 20.09879751 32.77148848 28.67052105 19.46612627
33.75579242 26.39603334 33.80980471 26.95023173 20.77382329 15.43737743
29.79702013 16.73273481 35.59696223 13.63071412 17.02373162 18.52857957
11.49951798 21.15629995 38.19107541 28.27722831 25.17917274 18.94408519
36.95511701 9.72267754 22.79788979 25.25897212 19.86829513 20.88146848
17.43209885 27.34946393 9.83760308 27.2421653 29.39622146 18.91506767
11.56747614 34.13073256 31.84853002 21.65448141 18.10864237 22.30039092
23.61171872 24.28836203 20.5987606 36.94854241 24.17762474 20.82746777
15.44126342 10.28894764 39.4179656 22.14764366 18.63845997 22.62668211
38.63039022 21.61091557 35.28731204 27.02976161 20.9912817 20.71607518
25.08804361 21.70828681 30.30797708 20.65632595 23.38139007 30.5529622
26.40554838 21.98453023 28.63517581 22.0662799 26.17296457 20.70494038
25.44215375 23.52914762 20.78856897 22.41964858 16.62951378 19.4470062
24.76931554 18.28384239 21.85787669 26.53102025 21.90177616 19.20800037
24.29724135 23.54203217 22.04699928 35.16568535 16.71146682 21.38738079
31.88119591 33.21219461 20.98633043 26.89005626 21.63023398 17.87273172
21.95797063 21.65649016 27.17987661 24.78220419 23.85997069 16.48811907
18.34503687 7.31727711 28.70887538 21.47951722 22.45958368 27.62175362
28.23593409]
Gradient descent - mean squared error:
24.848731722199563
Number of features:
(506, 13)

Ridge regression - weight coefficients:
[-0.62710135 1.13221555 -0.07373898 0.74492864 -1.93983515 2.71141843
-0.07982198 -3.27753496 2.44876703 -1.81107644 -1.74796456 0.88083243
-3.91211699]
Ridge regression - intercept:
22.62137203166228
Predicted house prices:
[28.23082349 31.50636545 21.12739377 32.65793823 20.02076945 19.06632771
21.106687 19.61624365 19.63161548 32.86596512 20.9946695 27.50329913
15.55414648 19.79639417 36.88392371 18.80672342 9.38096 18.50907253
30.67484295 24.30753141 19.0666843 34.09564382 29.80095002 17.51949727
34.8916544 26.5394645 34.68264723 27.42856108 19.09405963 14.98997618
30.8505874 15.81996969 37.18247113 7.85916465 16.25653448 17.15490009
7.48867279 19.99147768 40.57329959 28.95128807 25.25723034 17.73738109
38.75700749 6.87711291 21.78043375 25.27159224 20.45456114 20.48220948
17.25258857 26.1375367 8.5448374 27.49204889 30.58183066 16.58438621
9.37182303 35.52269097 32.24958654 21.87431027 17.60876103 22.08124517
23.50114904 24.09591554 20.15605099 38.49857046 24.64026646 19.75933465
13.91713858 6.78030217 42.04984214 21.92558236 16.8702938 22.59592875
40.74980559 21.4284924 36.88064128 27.18855416 21.04326386 20.36536628
25.36109432 22.27869444 31.14592486 20.39487869 23.99757481 31.54428168
26.76210157 20.89486664 29.07215993 21.99603204 26.30599891 20.11183257
25.47912071 24.0792631 19.89111149 16.56247916 15.22770226 18.38342191
24.82070397 16.60156656 20.86675004 26.71162923 20.74443479 17.8825254
24.28515984 23.37007961 21.58413976 36.79386382 15.88357121 21.47915185
32.79931234 33.71603437 20.62134398 26.83678658 22.68850452 17.37312422
21.67296898 21.67559608 27.66601539 25.0712154 23.73692967 14.64799906
15.21577315 3.82030283 29.17847194 20.66853036 22.33184243 28.0180608
28.56771983]
Ridge regression - mean squared error:
20.644810227653515
