Adversarial Robustness

Motivation: a limitation of the (supervised) ML framework

The standard workflow is to train an ML model on samples drawn from some distribution and to test it on other samples drawn independently from the same distribution. A crucial assumption here is that the distribution used to train the model is exactly the distribution the model will encounter at deployment. In reality this is not the case: various forms of covariate shift occur. Because this assumption fails, ML (and particularly DL) predictions are brittle despite being (mostly) accurate.

ML and adversarially robust ML

learning objective

In fact, the lack of adversarial robustness is not at odds with what we currently ask our ML models to achieve.

  • standard generalization (average-case performance): $\mathbb{E}_{(x,y)\sim D} [L(\theta,x,y)]$
  • adversarially robust generalization (a worst-case notion; adversarial inputs can be a measure-zero event under $D$): $\mathbb{E}_{(x,y)\sim D} [\max_{\delta \in \Delta} L(\theta,x+\delta,y)]$; a small evaluation sketch follows this list.
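Concretely, the two quantities can be estimated empirically. Below is a minimal sketch (assuming a PyTorch classifier `model` and a hypothetical `attack(model, x, y)` routine that approximately solves the inner maximization over $\Delta$):

```python
import torch
import torch.nn.functional as F

def standard_and_robust_loss(model, loader, attack):
    """Estimate E[L(theta,x,y)] and E[max_delta L(theta,x+delta,y)] on a dataset.

    `attack(model, x, y)` is assumed to return a perturbation delta inside the
    allowed set Delta; the robust estimate is only a lower bound on the true
    inner maximum unless the attack is exact.
    """
    std_loss, rob_loss, n = 0.0, 0.0, 0
    for x, y in loader:
        delta = attack(model, x, y)                    # approximate inner maximizer
        with torch.no_grad():
            std_loss += F.cross_entropy(model(x), y, reduction="sum").item()
            rob_loss += F.cross_entropy(model(x + delta), y, reduction="sum").item()
        n += x.size(0)
    return std_loss / n, rob_loss / n
```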
generalization of standard and robust deep networks

Not only does the optimization problem become harder and the required model larger (training); adversarial robustness also has a significantly larger sample complexity, so robust generalization requires more data (testing).

No ‘free lunch’: there can exist a trade-off between accuracy and robustness. Intuitively,

  • in standard training, any feature correlated with the label is a useful feature, including weakly correlated ones (the so-called non-robust features of [3]);
  • if we want robustness, we must avoid relying on such weakly correlated features.
interpretability

Adversarial Examples and Verification (inner maximization)

$$\min_\theta \sum_{(x,y)\in S} \underbrace{\max_{\delta \in \Delta} L(\theta,x+\delta,y)}_{\text{Part I: create AEs or verify that none exists}}$$

In the linear case with a norm-ball perturbation region $\{\delta : \|\delta\| \le \epsilon\}$, the maximization has an exact solution based on the dual norm; this is a simple instance of robust optimization. Since the loss is a decreasing function of the margin $\theta^T(x+\delta) \cdot y$,
$$\begin{aligned} \max_{\|\delta\| \le \epsilon} L(\theta^T (x + \delta) \cdot y) &= L\Big(\min_{\|\delta\| \le \epsilon} \theta^T (x + \delta) \cdot y\Big)\\ &= L(\theta^T x \cdot y - \epsilon \| \theta \|_\ast). \end{aligned}$$
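A quick numerical sanity check of this closed form, as a NumPy sketch (assuming a binary linear classifier with labels $y \in \{-1,+1\}$, an $\ell_\infty$ ball of radius $\epsilon$ whose dual norm is $\ell_1$, and the logistic loss as the decreasing margin loss):

```python
import numpy as np

rng = np.random.default_rng(0)
theta = rng.normal(size=5)                 # linear classifier h(x) = theta^T x
x, y, eps = rng.normal(size=5), 1.0, 0.1   # label y in {-1, +1}

logistic = lambda z: np.log1p(np.exp(-z))  # decreasing function of the margin z

# Closed form: L(theta^T x * y - eps * ||theta||_1)  (dual norm of l_inf is l_1)
closed_form = logistic(theta @ x * y - eps * np.linalg.norm(theta, 1))

# Explicit worst-case perturbation: delta* = -eps * y * sign(theta)
delta_star = -eps * y * np.sign(theta)
explicit = logistic(theta @ (x + delta_star) * y)

print(closed_form, explicit)               # the two values coincide
```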

1. Local search

Various forms of gradient descent
  • projected gradient descent:
    $$\delta := \mathcal{P}_\Delta [\delta + \alpha \nabla_\delta L (x+\delta,y;\theta)]$$
  • fast gradient sign method (FGSM): take $\Delta$ to be the $\ell_\infty$ ball of radius $\epsilon$, so the projection takes the form $\mathcal{P}_\Delta(\delta)=\text{Clip}(\delta, [-\epsilon,\epsilon])$. As $\alpha \rightarrow \infty$, the projected step becomes
    $$\delta = \epsilon \cdot \text{sign}(\nabla_\delta L (x+\delta,y;\theta))$$

projected gradient descent applied to the $\ell_\infty$ ball:
$$\delta := \text{Clip}_{[-\epsilon,\epsilon]} [\delta + \alpha \nabla_\delta L (x+\delta,y;\theta)]$$
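Before moving to steepest descent, here is a minimal PyTorch-style sketch of the two updates above (the model is assumed to output logits; `eps`, `alpha`, and `num_steps` are hypothetical hyperparameters):

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps):
    """FGSM: a single eps-sized step in the sign of the gradient at delta = 0."""
    delta = torch.zeros_like(x, requires_grad=True)
    F.cross_entropy(model(x + delta), y).backward()
    return eps * delta.grad.sign()

def pgd_linf(model, x, y, eps, alpha, num_steps):
    """PGD on the l_inf ball: gradient ascent steps, each clipped to [-eps, eps]."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(num_steps):
        F.cross_entropy(model(x + delta), y).backward()
        delta.data = (delta.data + alpha * delta.grad).clamp(-eps, eps)
        delta.grad.zero_()
    return delta.detach()
```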

  • (normalized) steepest descent
    Motivation: at the original example, the gradient of the loss is often very small, so a large step size is needed to move anywhere; but once a large step takes the iterate out of that flat region, it hits a corner of the ball and the update reduces to FGSM.
    Method (writing $J(\delta) = L(x+\delta,y;\theta)$; a one-step sketch follows below):
    $$\delta := \mathcal{P}_\Delta \Big[\delta + \alpha \arg\max_{\|v\| \leq r} v^T \nabla_\delta J(\delta)\Big]$$
    It is typical to choose the inner norm to match the norm defining $\Delta$.
    $\arg\max_{\|v\|_2 \leq r} v^T \nabla_\delta J(\delta) = r \frac{\nabla_\delta J(\delta)}{\| \nabla_\delta J(\delta)\|_2}$: the direction is the $\ell_2$-normalized gradient.
    $\arg\max_{\|v\|_\infty \leq r} v^T \nabla_\delta J(\delta) = r \cdot \text{sign}(\nabla_\delta J(\delta))$: taking such steps corresponds to many “mini-FGSM” steps.
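A one-step sketch of the normalized steepest-descent update for both inner norms (hypothetical `model`, constraint radius `eps` for an $\ell_\infty$ set $\Delta$, step size `alpha`, and inner radius `r`):

```python
import torch
import torch.nn.functional as F

def steepest_descent_step(model, x, y, delta, eps, alpha, r, inner_norm="linf"):
    """delta := P_Delta[delta + alpha * argmax_{||v|| <= r} v^T grad_delta J(delta)],
    with the projection taken onto the l_inf ball of radius eps."""
    delta = delta.clone().detach().requires_grad_(True)
    F.cross_entropy(model(x + delta), y).backward()
    g = delta.grad
    if inner_norm == "l2":
        v = r * g / g.norm()        # l2-normalized gradient direction
    else:
        v = r * g.sign()            # l_inf case: sign of the gradient ("mini-FGSM")
    return (delta.detach() + alpha * v).clamp(-eps, eps)
```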
Targeted attacks

$$\max_{\delta \in \Delta}\,[L(x+\delta,y;\theta) - L(x+\delta,y_\text{target};\theta)]$$

Considering the $k$-class cross-entropy loss
$$L(x+\delta,y;\theta) = \log \sum_{j=1}^k \exp(h_\theta(x+\delta)_j) - h_\theta(x+\delta)_y,$$

the log-sum-exp terms in the two losses cancel, so the above problem simplifies to
$$\max_{\delta \in \Delta} \big(h_\theta(x+\delta)_{y_\text{target}} - h_\theta(x+\delta)_y\big).$$
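A sketch of a targeted attack that climbs this logit-difference objective directly with $\ell_\infty$ steepest-descent steps (class-index tensors `y` and `y_target` and the hyperparameters are hypothetical):

```python
import torch

def targeted_attack_linf(model, x, y, y_target, eps, alpha, num_steps):
    """Maximize h(x+delta)[y_target] - h(x+delta)[y] over the l_inf ball."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(num_steps):
        logits = model(x + delta)
        obj = (logits.gather(1, y_target[:, None]) - logits.gather(1, y[:, None])).sum()
        obj.backward()
        delta.data = (delta.data + alpha * delta.grad.sign()).clamp(-eps, eps)
        delta.grad.zero_()
    return delta.detach()
```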

Remark: one way to categorize papers on adversarial attacks is to describe each attack in terms of 1) the allowable perturbation set $\Delta$, and 2) the optimization procedure used to perform the maximization.

Training Adversarially Robust Models (outer minimization)

$$\underbrace{\min_\theta \sum_{(x,y)\in S} \max_{\delta \in \Delta} L(\theta,x+\delta,y)}_{\text{Part II: training a robust classifier}}$$

1. Adversarial training

Danskin’s theorem:
$$\nabla_\theta \max_{\delta \in \Delta} L(x+\delta,y;\theta) = \nabla_\theta L(x+\delta^\ast,y;\theta),$$
where $\delta^\ast = \arg\max_{\delta \in \Delta} L(x+\delta,y;\theta)$. Note that it only applies when the inner maximization is performed exactly.

Repeat:
  1. Select a minibatch $B$.
  2. For each $(x,y) \in B$, compute (an approximation of) $\delta^\ast(x)$.
  3. Update the parameters: $\theta := \theta - \frac{\alpha}{|B|} \sum_{(x,y) \in B} \nabla_\theta L(x+\delta^\ast(x),y;\theta)$.
It is common to also mix robust and standard updates.
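A minimal sketch of the resulting training loop (assuming an attack routine such as the `pgd_linf` sketch above; by Danskin's theorem, $\delta^\ast$ is computed first and then held fixed while differentiating with respect to $\theta$):

```python
import torch
import torch.nn.functional as F

def adversarial_training_epoch(model, loader, opt, attack, mix_standard=False):
    """One epoch of adversarial training: find delta*(x) for each minibatch,
    then take a gradient step on L(x + delta*(x), y; theta)."""
    for x, y in loader:
        delta = attack(model, x, y)                   # approximate inner maximizer
        loss = F.cross_entropy(model(x + delta), y)   # delta* treated as a constant
        if mix_standard:                              # optionally mix in a standard-loss term
            loss = 0.5 * loss + 0.5 * F.cross_entropy(model(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
```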

What makes the model robust? Standard training gives a very steep loss surface with many directions of sharp increase; robust training gives a flat surface, which is therefore less susceptible to adversarial attacks.


Current Challenges and Future Research

challenges
  • on large-scale problems, we are nowhere close to building robust models that match standard models in terms of performance;
  • there is a large generalization gap for adversarially robust deep networks.
future research
  • Algorithm: faster robust training and verification; smaller (more compact) models; new architectures to induce more robustness in the prior
  • Theory: (better) adversarially robust generalization bounds; theory that can guide new regularization techniques
  • Data: more comprehensive set of perturbations

Related Topics

  • data poisoning: training phase (linked with traditional robust statistics);
    adversarial examples: inference/test phase
  • data augmentation
    adversarial training = an “ultimate” version of data augmentation
  • robustness and model interpretability, fairness
  • universal perturbation/input-free adversarial examples (studied) and robust optimization against universal perturbation (less studied)

References

  1. J. Z. Kolter and A. Madry: Adversarial Robustness - Theory and Practice (NeurIPS 2018 Tutorial)
    https://www.youtube.com/watch?v=TwP-gKBQyic&t=4467s
  2. https://adversarial-ml-tutorial.org/adversarial_examples/
  3. Ilyas, Andrew, et al. “Adversarial examples are not bugs, they are features.” Advances in Neural Information Processing Systems. 2019.