台部落大眼呆萌君

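Question (155): When the training data set is very large, what problem does classical gradient descent run into, and how should it be improved? Question (158): Why stochastic gradient descent fails. Question (160): What modifications have researchers made to improve stochastic gradient descent? Which variants have been proposed, and what are the characteristics of each?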

2020-07-04 19:24:32

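Question (145): Among the optimization problems in machine learning, which are convex and which are non-convex? Give one example of each. Definition of convex optimization; convex optimization problems; non-convex optimization problems. Definition of convex optimization: formula, geometric insight. Convex optimization problem: logistic regression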

2020-07-04 18:42:44

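Question (164): What is the principle by which L1 regularization makes model parameters sparse? Angles for the answer: the geometric angle (the shape of the solution space); the calculus angle (differentiating the objective function under the L1 constraint); the Bayesian prior. Shape of the solution space. Step 1. Equivalence of the regularization term and the constraint. Step 2. L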

2020-07-04 18:42:44

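Question (152): How do you verify that the code computing the gradient of the objective function is correct? Topics tested: calculus, Taylor expansion. Approximation (calculus): by the definition of the partial derivative, ∂L(θ)/∂θ_i = L(θ_1, ⋯, θ_i + h, ⋯, θ_p) − L(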

2020-07-04 18:42:44
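The gradient check raised in Question (152) can be sketched roughly as follows. This is a minimal illustration, not the post's own code: it uses a central difference rather than the forward difference that the excerpt starts to write out, and the names `numerical_gradient`, `check_gradient`, `example_loss`, `example_grad`, the step `h`, and the tolerance `tol` are all assumptions chosen for the example.

```python
def numerical_gradient(loss, theta, h=1e-5):
    """Central-difference estimate of every partial derivative of loss at theta."""
    grad = []
    for i in range(len(theta)):
        plus = list(theta)
        minus = list(theta)
        plus[i] += h    # perturb only coordinate i upward
        minus[i] -= h   # and downward
        grad.append((loss(plus) - loss(minus)) / (2 * h))
    return grad


def check_gradient(loss, grad_fn, theta, h=1e-5, tol=1e-6):
    """Compare an analytic gradient with the numerical estimate.

    Returns (passed, max_abs_error). The tolerance is an assumed value.
    """
    numeric = numerical_gradient(loss, theta, h)
    analytic = grad_fn(theta)
    max_err = max(abs(a - n) for a, n in zip(analytic, numeric))
    return max_err <= tol, max_err


# Hypothetical objective used only to exercise the check.
def example_loss(theta):
    return theta[0] ** 2 + 3 * theta[0] * theta[1]


def example_grad(theta):
    return [2 * theta[0] + 3 * theta[1], 3 * theta[0]]


passed, err = check_gradient(example_loss, example_grad, [1.0, -2.0])
```

A deliberately wrong gradient (for instance, one that drops the `3 * theta[1]` term) should fail the same check, which is the point of running it before training.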

Epoch One Epoch is when an ENTIRE dataset is passed forward and backward through the neural network only ONCE [1

2020-07-04 18:42:44

Key points: an intuitive understanding of the Armijo condition. Background: In gradient descent algorithms, step size may be too large or too small, as shown in

2020-07-04 18:42:44
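One common way the Armijo condition is used to pick a step size is backtracking line search: start from a large step and shrink it until the sufficient-decrease test f(x + αp) ≤ f(x) + c·α·∇f(x)ᵀp holds. The sketch below assumes the steepest-descent direction p = −∇f(x); the function name `backtracking_step` and the constants `alpha0`, `c`, and `shrink` are illustrative assumptions, not values taken from the post.

```python
def backtracking_step(f, grad_f, x, alpha0=1.0, c=1e-4, shrink=0.5, max_iter=50):
    """Take one gradient-descent step whose size satisfies the Armijo condition."""
    g = grad_f(x)
    direction = [-gi for gi in g]                        # steepest descent, p = -grad
    slope = sum(gi * di for gi, di in zip(g, direction)) # grad^T p, which is <= 0
    fx = f(x)
    alpha = alpha0
    for _ in range(max_iter):
        candidate = [xi + alpha * di for xi, di in zip(x, direction)]
        # Armijo (sufficient decrease) test
        if f(candidate) <= fx + c * alpha * slope:
            return alpha, candidate
        alpha *= shrink                                  # step too large: shrink and retry
    raise RuntimeError("no step satisfying the Armijo condition was found")


# Hypothetical badly scaled quadratic, where a step of 1.0 overshoots.
def quad(x):
    return x[0] ** 2 + 10 * x[1] ** 2


def quad_grad(x):
    return [2 * x[0], 20 * x[1]]


alpha, new_x = backtracking_step(quad, quad_grad, [1.0, 1.0])
```

On this example the initial step is rejected several times before a smaller one is accepted, which illustrates the "too large" case mentioned in the excerpt.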

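Question (148): What optimization methods are there for unconstrained optimization problems? Review points: the relationship between first-order algorithms, second-order algorithms, and the Taylor expansion. Direct solution; iterative solution; first-order algorithms; second-order algorithms. Direct solution: convex objective function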

2020-07-04 18:42:44

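Typesetting. Insert a blank line [2]: <br> (placed at the end of a paragraph, not at the start); or <br/> followed by a line break. Center text [3]: <center>text</center>. Image size: <img src = "https://....png" width="10%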

2020-07-04 18:42:44

Notations: input $v$, output $r$, weight parameter $W \in \mathbb{R}^{d \times m}$, activation function $a$

2020-07-04 18:42:44

Weakness of adversarial training: overfit to the attack in use and hence does not generalize to test data Curriculu

2020-07-04 18:42:44

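Learning rate: I have come across optimal values at every order of magnitude from 1e-4 to 1e-1; the rule of thumb is roughly that the simpler the model, the larger the learning rate can be. [https://blog.csdn.net/weixin_44070747/article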

2020-04-15 14:04:39

Group lasso: $\hat{\bm \beta}_\lambda = \arg \min_{\bm \beta} \| \bm Y - \bm X \bm \beta \|_2^2 + \lambda \sum_{g=1}^{G} \| \bm \beta_{I_g} \|_2$,

2020-03-02 09:50:20
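To make the group lasso objective in the excerpt above concrete, here is a small sketch that evaluates it for a given coefficient vector: the squared residual norm plus λ times the sum of the Euclidean norms of each coefficient group. The function name `group_lasso_objective`, the list-of-index-lists encoding of the groups I_g, and the toy numbers are assumptions made for illustration only.

```python
import math


def group_lasso_objective(X, y, beta, groups, lam):
    """Evaluate ||y - X beta||_2^2 + lam * sum_g ||beta_{I_g}||_2.

    X is a list of rows, and groups is a list of index lists, one per group I_g.
    """
    residual_sq = 0.0
    for row, yi in zip(X, y):
        prediction = sum(xij * bj for xij, bj in zip(row, beta))
        residual_sq += (yi - prediction) ** 2
    # The per-group norm is NOT squared; this is what lets whole groups shrink to zero.
    penalty = sum(math.sqrt(sum(beta[j] ** 2 for j in idx)) for idx in groups)
    return residual_sq + lam * penalty


# Toy data: two observations, three coefficients split into two groups.
X = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
y = [1.0, 2.0]
beta = [3.0, 4.0, 0.0]
groups = [[0, 1], [2]]
value = group_lasso_objective(X, y, beta, groups, lam=2.0)
```

Here the residuals are −2 and −2 (squared sum 8), the group norms are 5 and 0, so the objective is 8 + 2·5 = 18.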

Derivation (method of Lagrange multipliers). First step: find $\bm \alpha'_k \bm x$ that maxim

2020-02-23 11:19:05

Big picture on why we need randomness in stochastic algorithms randomness during initialization: as the structure

2020-02-23 11:19:05

Motivation: a limitation of the (supervised) ML framework

2020-02-23 11:19:05