Adversarial Robustness

Motivation: a limitation of the (supervised) ML framework

The standard workflow is to train an ML model on samples drawn from some distribution and to test it on other samples drawn independently from the same distribution. A crucial assumption here is that the distribution used to train the model is exactly the distribution the model will encounter at deployment. In reality this is not the case: various forms of covariate shift occur. Because this assumption fails, ML (and particularly DL) predictions are brittle despite being (mostly) accurate.

ML and adversarially robust ML

learning objective

In fact, the lack of adversarial robustness is not at odds with what we currently ask our ML models to achieve.

  • standard generalization (average-case performance): $\mathbb{E}_{(x,y)\sim D} [L(\theta,x,y)]$
  • adversarially robust generalization (a worst-case notion; adversarial inputs can be a measure-zero event under $D$): $\mathbb{E}_{(x,y)\sim D} [\max_{\delta \in \Delta} L(\theta,x+\delta,y)]$; a small evaluation sketch follows this list.
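Concretely, the two quantities can be estimated empirically. Below is a minimal sketch (assuming a PyTorch classifier `model` and a hypothetical `attack(model, x, y)` routine that approximately solves the inner maximization over $\Delta$):

```python
import torch
import torch.nn.functional as F

def standard_and_robust_loss(model, loader, attack):
    """Estimate E[L(theta,x,y)] and E[max_delta L(theta,x+delta,y)] on a dataset.

    `attack(model, x, y)` is assumed to return a perturbation delta inside the
    allowed set Delta; the robust estimate is only a lower bound on the true
    inner maximum unless the attack is exact.
    """
    std_loss, rob_loss, n = 0.0, 0.0, 0
    for x, y in loader:
        delta = attack(model, x, y)                    # approximate inner maximizer
        with torch.no_grad():
            std_loss += F.cross_entropy(model(x), y, reduction="sum").item()
            rob_loss += F.cross_entropy(model(x + delta), y, reduction="sum").item()
        n += x.size(0)
    return std_loss / n, rob_loss / n
```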
generalization of standard and robust deep networks

Not only does the optimization problem become harder and the required model larger (training); adversarial robustness also has a significantly larger sample complexity, so robust generalization requires more data (testing).

No ‘free lunch’: there can exist a trade-off between accuracy and robustness. Intuitively,

  • in standard training, any feature correlated with the label is a useful feature, including weakly correlated ones (the so-called non-robust features of [3]);
  • if we want robustness, we must avoid relying on such weakly correlated features.
interpretability

Adversarial Examples and Verification (inner maximization)

$$\min_\theta \sum_{(x,y)\in S} \underbrace{\max_{\delta \in \Delta} L(\theta,x+\delta,y)}_{\text{Part I: create AEs or verify that none exists}}$$

In the linear case with a norm-ball perturbation region $\{\delta : \|\delta\| \le \epsilon\}$, the maximization has an exact solution based on the dual norm; this is a simple instance of robust optimization. Since the loss is a decreasing function of the margin $\theta^T(x+\delta) \cdot y$,
$$\begin{aligned} \max_{\|\delta\| \le \epsilon} L(\theta^T (x + \delta) \cdot y) &= L\Big(\min_{\|\delta\| \le \epsilon} \theta^T (x + \delta) \cdot y\Big)\\ &= L(\theta^T x \cdot y - \epsilon \| \theta \|_\ast). \end{aligned}$$
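A quick numerical sanity check of this closed form, as a NumPy sketch (assuming a binary linear classifier with labels $y \in \{-1,+1\}$, an $\ell_\infty$ ball of radius $\epsilon$ whose dual norm is $\ell_1$, and the logistic loss as the decreasing margin loss):

```python
import numpy as np

rng = np.random.default_rng(0)
theta = rng.normal(size=5)                 # linear classifier h(x) = theta^T x
x, y, eps = rng.normal(size=5), 1.0, 0.1   # label y in {-1, +1}

logistic = lambda z: np.log1p(np.exp(-z))  # decreasing function of the margin z

# Closed form: L(theta^T x * y - eps * ||theta||_1)  (dual norm of l_inf is l_1)
closed_form = logistic(theta @ x * y - eps * np.linalg.norm(theta, 1))

# Explicit worst-case perturbation: delta* = -eps * y * sign(theta)
delta_star = -eps * y * np.sign(theta)
explicit = logistic(theta @ (x + delta_star) * y)

print(closed_form, explicit)               # the two values coincide
```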

1. Local search

Various forms of gradient descent
  • projected gradient descent:
    $$\delta := \mathcal{P}_\Delta [\delta + \alpha \nabla_\delta L (x+\delta,y;\theta)]$$
  • fast gradient sign method (FGSM): take $\Delta$ to be the $\ell_\infty$ ball of radius $\epsilon$, so the projection takes the form $\mathcal{P}_\Delta(\delta)=\text{Clip}(\delta, [-\epsilon,\epsilon])$. As $\alpha \rightarrow \infty$, the projected step becomes
    $$\delta = \epsilon \cdot \text{sign}(\nabla_\delta L (x+\delta,y;\theta))$$

projected gradient descent applied to the $\ell_\infty$ ball:
$$\delta := \text{Clip}_{[-\epsilon,\epsilon]} [\delta + \alpha \nabla_\delta L (x+\delta,y;\theta)]$$
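Before moving to steepest descent, here is a minimal PyTorch-style sketch of the two updates above (the model is assumed to output logits; `eps`, `alpha`, and `num_steps` are hypothetical hyperparameters):

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps):
    """FGSM: a single eps-sized step in the sign of the gradient at delta = 0."""
    delta = torch.zeros_like(x, requires_grad=True)
    F.cross_entropy(model(x + delta), y).backward()
    return eps * delta.grad.sign()

def pgd_linf(model, x, y, eps, alpha, num_steps):
    """PGD on the l_inf ball: gradient ascent steps, each clipped to [-eps, eps]."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(num_steps):
        F.cross_entropy(model(x + delta), y).backward()
        delta.data = (delta.data + alpha * delta.grad).clamp(-eps, eps)
        delta.grad.zero_()
    return delta.detach()
```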

  • (normalized) steepest descent
    Motivation: at the original example, the gradient of the loss is often very small, so a large step size is needed to move anywhere; but once a large step takes the iterate out of that flat region, it hits a corner of the ball and the update reduces to FGSM.
    Method (writing $J(\delta) = L(x+\delta,y;\theta)$; a one-step sketch follows below):
    $$\delta := \mathcal{P}_\Delta \Big[\delta + \alpha \arg\max_{\|v\| \leq r} v^T \nabla_\delta J(\delta)\Big]$$
    It is typical to choose the inner norm to match the norm defining $\Delta$.
    $\arg\max_{\|v\|_2 \leq r} v^T \nabla_\delta J(\delta) = r \frac{\nabla_\delta J(\delta)}{\| \nabla_\delta J(\delta)\|_2}$: the direction is the $\ell_2$-normalized gradient.
    $\arg\max_{\|v\|_\infty \leq r} v^T \nabla_\delta J(\delta) = r \cdot \text{sign}(\nabla_\delta J(\delta))$: taking such steps corresponds to many “mini-FGSM” steps.
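A one-step sketch of the normalized steepest-descent update for both inner norms (hypothetical `model`, constraint radius `eps` for an $\ell_\infty$ set $\Delta$, step size `alpha`, and inner radius `r`):

```python
import torch
import torch.nn.functional as F

def steepest_descent_step(model, x, y, delta, eps, alpha, r, inner_norm="linf"):
    """delta := P_Delta[delta + alpha * argmax_{||v|| <= r} v^T grad_delta J(delta)],
    with the projection taken onto the l_inf ball of radius eps."""
    delta = delta.clone().detach().requires_grad_(True)
    F.cross_entropy(model(x + delta), y).backward()
    g = delta.grad
    if inner_norm == "l2":
        v = r * g / g.norm()        # l2-normalized gradient direction
    else:
        v = r * g.sign()            # l_inf case: sign of the gradient ("mini-FGSM")
    return (delta.detach() + alpha * v).clamp(-eps, eps)
```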
Targeted attacks

$$\max_{\delta \in \Delta}\,[L(x+\delta,y;\theta) - L(x+\delta,y_\text{target};\theta)]$$

Considering the $k$-class cross-entropy loss
$$L(x+\delta,y;\theta) = \log \sum_{j=1}^k \exp(h_\theta(x+\delta)_j) - h_\theta(x+\delta)_y,$$

the log-sum-exp terms in the two losses cancel, so the above problem simplifies to
$$\max_{\delta \in \Delta} \big(h_\theta(x+\delta)_{y_\text{target}} - h_\theta(x+\delta)_y\big).$$
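A sketch of a targeted attack that climbs this logit-difference objective directly with $\ell_\infty$ steepest-descent steps (class-index tensors `y` and `y_target` and the hyperparameters are hypothetical):

```python
import torch

def targeted_attack_linf(model, x, y, y_target, eps, alpha, num_steps):
    """Maximize h(x+delta)[y_target] - h(x+delta)[y] over the l_inf ball."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(num_steps):
        logits = model(x + delta)
        obj = (logits.gather(1, y_target[:, None]) - logits.gather(1, y[:, None])).sum()
        obj.backward()
        delta.data = (delta.data + alpha * delta.grad.sign()).clamp(-eps, eps)
        delta.grad.zero_()
    return delta.detach()
```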

Remark: one way to categorize papers on adversarial attacks is to describe each attack in terms of 1) the allowable perturbation set $\Delta$, and 2) the optimization procedure used to perform the maximization.

Training Adversarially Robust Models (outer minimization)

$$\underbrace{\min_\theta \sum_{(x,y)\in S} \max_{\delta \in \Delta} L(\theta,x+\delta,y)}_{\text{Part II: training a robust classifier}}$$

1. Adversarial training

Danskin’s theorem:
$$\nabla_\theta \max_{\delta \in \Delta} L(x+\delta,y;\theta) = \nabla_\theta L(x+\delta^\ast,y;\theta),$$
where $\delta^\ast = \arg\max_{\delta \in \Delta} L(x+\delta,y;\theta)$. Note that it only applies when the inner maximization is performed exactly.

Repeat:
  1. Select a minibatch $B$.
  2. For each $(x,y) \in B$, compute (an approximation of) $\delta^\ast(x)$.
  3. Update the parameters: $\theta := \theta - \frac{\alpha}{|B|} \sum_{(x,y) \in B} \nabla_\theta L(x+\delta^\ast(x),y;\theta)$.
It is common to also mix robust and standard updates.
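A minimal sketch of the resulting training loop (assuming an attack routine such as the `pgd_linf` sketch above; by Danskin's theorem, $\delta^\ast$ is computed first and then held fixed while differentiating with respect to $\theta$):

```python
import torch
import torch.nn.functional as F

def adversarial_training_epoch(model, loader, opt, attack, mix_standard=False):
    """One epoch of adversarial training: find delta*(x) for each minibatch,
    then take a gradient step on L(x + delta*(x), y; theta)."""
    for x, y in loader:
        delta = attack(model, x, y)                   # approximate inner maximizer
        loss = F.cross_entropy(model(x + delta), y)   # delta* treated as a constant
        if mix_standard:                              # optionally mix in a standard-loss term
            loss = 0.5 * loss + 0.5 * F.cross_entropy(model(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
```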

What makes the model robust? Standard training gives a very steep loss surface with many directions of sharp increase; robust training gives a flat surface, which is therefore less susceptible to adversarial attacks.


Current Challenges and Future Research

challenges
  • on large-scale problems, we are nowhere close to building robust models that match standard models in terms of performance;
  • there is a large generalization gap for adversarially robust deep networks.
future research
  • Algorithm: faster robust training and verification; smaller (more compact) models; new architectures to induce more robustness in the prior
  • Theory: (better) adversarially robust generalization bounds; theory that can guide new regularization techniques
  • Data: more comprehensive set of perturbations

Related Topics

  • data poisoning: training phase (linked with traditional robust statistics);
    adversarial examples: inference/test phase
  • data augmentation
    adversarial training = an “ultimate” version of data augmentation
  • robustness and model interpretability, fairness
  • universal perturbation/input-free adversarial examples (studied) and robust optimization against universal perturbation (less studied)

References

  1. J. Z. Kolter and A. Madry: Adversarial Robustness - Theory and Practice (NeurIPS 2018 Tutorial)
    https://www.youtube.com/watch?v=TwP-gKBQyic&t=4467s
  2. https://adversarial-ml-tutorial.org/adversarial_examples/
  3. Ilyas, Andrew, et al. “Adversarial examples are not bugs, they are features.” Advances in Neural Information Processing Systems. 2019.