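1. Introduction to Gradient Descent

Characteristics

Not a machine learning algorithm in itself
A search-based optimization method
Purpose: minimize a loss function
Gradient ascent: maximize a utility function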

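Graphical intuition

Suppose there is only one parameter; the plot looks like this:

Horizontal axis: parameter value; vertical axis: loss function value

In one dimension the derivative gives a direction: the direction in which J increases. So, to find the minimum:

1. Direction: adjust the parameter value in the negative direction of the derivative,

2. Step size: multiply the derivative by the learning rate $\eta$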
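
As a compact statement of the two rules above, writing $\theta$ for the parameter (the symbol is an assumption here, chosen to match the code in section 2), one update step can be sketched as:

$$\theta \leftarrow \theta - \eta \frac{dJ}{d\theta}$$

For example, with the loss $J(\theta) = (\theta - 2.5)^2 - 1$ used in section 2, $\theta = 0$ and $\eta = 0.1$: the derivative is $2(0 - 2.5) = -5$, so the new value is $0 - 0.1 \times (-5) = 0.5$, one step closer to the minimum at $\theta = 2.5$.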

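Why is it called a gradient?

A derivative applies when there is only one variable. In the multivariate case, take the partial derivative of the loss function with respect to each variable; the vector made up of these partial derivatives is called the gradient.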
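
To make the definition concrete, here is a minimal sketch that approximates each partial derivative with a central difference and stacks the results into a vector. The function name `numerical_gradient`, the step `h`, and the two-variable `loss` are all assumptions made for illustration; they do not come from the original notes.

```python
import numpy as np


def numerical_gradient(f, theta, h=1e-6):
    """Approximate the gradient of f at theta with central differences."""
    theta = np.asarray(theta, dtype=float)
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = h
        # Partial derivative with respect to the i-th variable only
        grad[i] = (f(theta + step) - f(theta - step)) / (2 * h)
    return grad


def loss(theta):
    # Hypothetical two-variable loss: J = theta_0^2 + 3 * theta_1
    return theta[0] ** 2 + 3 * theta[1]


grad = numerical_gradient(loss, [1.0, 2.0])
print(grad)  # close to the analytic partial derivatives (2, 3) at the point (1, 2)
```

Each component of the result is one partial derivative, so the vector has as many entries as the loss has parameters.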

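Problems:

If $\eta$ is too large, the iteration may fail to converge
Not every function has a unique extremum (this kind of search may fail to find the global optimum, so the starting point matters)

Solutions:

Choose a suitable learning rate
Run the search several times with randomized initial points
The initial point of gradient descent is therefore also a hyperparameter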
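
The random-restart idea can be sketched as follows. The loss below is a hypothetical function with two local minima, chosen only for illustration, and the names `descend` and `best_of_restarts`, the learning rate, and the iteration count are likewise assumptions.

```python
import numpy as np


def loss(theta):
    # Hypothetical loss with two local minima (near -1.30 and near 1.13)
    return theta ** 4 - 3 * theta ** 2 + theta


def d_loss(theta):
    # Derivative of the hypothetical loss above
    return 4 * theta ** 3 - 6 * theta + 1


def descend(initial_theta, eta=0.01, n_iters=1000):
    """Plain gradient descent with a fixed number of steps."""
    theta = initial_theta
    for _ in range(n_iters):
        theta = theta - eta * d_loss(theta)
    return theta


def best_of_restarts(starts):
    """Run gradient descent from each start and keep the lowest-loss result."""
    results = [descend(s) for s in starts]
    return min(results, key=loss)


rng = np.random.default_rng(0)
starts = rng.uniform(-2, 2, size=10)  # random initial points in [-2, 2]
print(best_of_restarts(starts))
```

A single run started at 2.0 settles in the shallower minimum near 1.13, while a run started at -2.0 reaches the deeper one near -1.30; comparing the final loss across runs is what lets the restarts pick the better of the two.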

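The loss function of linear regression is convex and has no local minima other than the global one, so when gradient descent converges on it, the solution it finds is the global optimum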
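
A small experiment can illustrate this claim. The sketch below assumes a mean squared error loss, synthetic data, and hand-picked values for the learning rate and iteration count; none of these come from the original notes. It runs gradient descent from two very different starting points and compares both results with the closed-form least-squares fit from `np.polyfit`.

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.uniform(0, 2, size=100)
y = 3.0 * x + 4.0 + rng.normal(0, 1, size=100)  # synthetic data, assumed for illustration

X_b = np.column_stack([np.ones_like(x), x])  # column of ones for the intercept


def dJ(theta):
    # Gradient of the mean squared error J = mean((X_b @ theta - y) ** 2)
    return 2.0 / len(y) * X_b.T @ (X_b @ theta - y)


def fit(initial_theta, eta=0.1, n_iters=5000):
    """Gradient descent on the linear regression loss; theta = (intercept, slope)."""
    theta = np.asarray(initial_theta, dtype=float)
    for _ in range(n_iters):
        theta = theta - eta * dJ(theta)
    return theta


theta_a = fit([0.0, 0.0])
theta_b = fit([50.0, -50.0])  # a very different starting point
slope, intercept = np.polyfit(x, y, 1)  # closed-form least squares for comparison
print(theta_a, theta_b, (intercept, slope))
```

Both runs should end at the same parameters as the closed-form solution, which is what a single global minimum predicts.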

2. Simulating Gradient Descent

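import numpy as np
import matplotlib.pyplot as plt

# Sample points used to draw the loss curve
plot_x = np.linspace(-1, 6, 141)


def dJ(theta):
    """Derivative of the loss function J with respect to theta."""
    return 2 * (theta - 2.5)


def J(theta):
    """Loss function: a parabola with its minimum at theta = 2.5."""
    try:
        return (theta - 2.5) ** 2 - 1.
    except OverflowError:
        # A diverging theta can overflow a Python float; treat the loss as infinite
        return float('inf')


eta = 1.1  # deliberately too large, so the iteration diverges; try 0.1 to see it converge
theta_history = []  # reset to [] before each new run, otherwise the paths accumulate
initial_theta = 0.0


def gradient_descent(initial_theta, eta, n_iters=10, epsilon=1e-8):
    theta = initial_theta
    theta_history.append(initial_theta)
    i_iter = 0

    while i_iter < n_iters:
        gradient = dJ(theta)
        last_theta = theta
        theta = theta - eta * gradient  # step against the derivative
        theta_history.append(theta)
        print(J(theta) - J(last_theta))
        # Stop once the loss barely changes. If both losses are inf, the
        # difference is nan, the test is False, and the loop runs to n_iters.
        if abs(J(theta) - J(last_theta)) < epsilon:
            break
        i_iter += 1
    print("iterations:", i_iter)
    return theta


def plot_theta_history():
    # Blue curve: the loss function; red markers: theta values visited
    plt.plot(plot_x, J(plot_x))
    plt.plot(np.array(theta_history), J(np.array(theta_history)), color="r", marker="+")
    plt.show()


gradient_descent(initial_theta, eta)
plot_theta_history()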

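With eta = 0.1, the path of gradient descent is shown in the figure below:

If the learning rate eta is too large (for example eta = 1.1, as in the code above), gradient descent does not converge (the number of iterations is set to 10; the blue line is the plot of the loss function, and the red line shows the values theta takes during gradient descent)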
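
Since the figures are not reproduced here, the same behaviour can be checked numerically. For this particular loss, one update gives $\theta_{new} - 2.5 = (1 - 2\eta)(\theta - 2.5)$, so the distance to the minimum shrinks only when $|1 - 2\eta| < 1$, that is, for $0 < \eta < 1$. The sketch below uses a hypothetical helper `distance_after` (not part of the original code) and omits the epsilon stopping test for brevity.

```python
def dJ(theta):
    # Derivative of J(theta) = (theta - 2.5) ** 2 - 1
    return 2 * (theta - 2.5)


def distance_after(eta, initial_theta=0.0, n_iters=10):
    """Distance from the minimum at 2.5 after n_iters plain update steps."""
    theta = initial_theta
    for _ in range(n_iters):
        theta = theta - eta * dJ(theta)
    return abs(theta - 2.5)


for eta in (0.1, 0.5, 1.1):
    print(eta, distance_after(eta))
```

With eta = 0.1 the distance falls below its starting value of 2.5, with eta = 0.5 the first step lands exactly on the minimum, and with eta = 1.1 every step overshoots and the distance grows.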

