Motivation
The standard way of training and evaluating an ML model is to train on randomly drawn samples and test on other independently drawn samples. A crucial assumption behind this is that the distribution used to train the model is exactly the distribution the model will encounter at deployment. In reality this is rarely the case, as various forms of covariate shift occur. When this assumption fails, ML (particularly DL) predictions are brittle despite being (mostly) accurate.
ML and adversarially robust ML
learning objective
In fact, the lack of adversarial robustness is not at odds with what we currently train our ML models to achieve: the standard objective never asks for worst-case performance.
- standard generalization (average-case performance):
  $$\min_\theta \; \mathbb{E}_{(x,y)\sim\mathcal{D}} \big[ \ell(h_\theta(x), y) \big]$$
- adversarially robust generalization (a worst-case notion; the adversarial perturbations form a measure-zero event that the average-case objective can ignore):
  $$\min_\theta \; \mathbb{E}_{(x,y)\sim\mathcal{D}} \Big[ \max_{\delta \in \Delta} \ell(h_\theta(x+\delta), y) \Big]$$
generalization of standard and robust deep networks
Not only does the optimization problem become harder and the required model larger (training); adversarial robustness also has a significantly larger sample complexity, so robust generalization requires more data (testing).
No ‘free lunch’: there can exist a trade-off between accuracy and robustness. Intuitively,
- in standard training, any correlation between features and the label is a useful correlation, including the so-called non-robust features of [3];
- if we want robustness, we must avoid relying on such weakly correlated features.
interpretability
Adversarial Examples and Verification (inner minimization)
In the linear case with a norm-ball perturbation region, the maximization has an exact solution based on the dual norm; this is a simple instance of robust optimization. That is, for a linear classifier $h_\theta(x) = w^\top x + b$ with labels $y \in \{-1,+1\}$ and a monotonically decreasing margin loss,
$$\max_{\|\delta\| \le \epsilon} \ell\big(w^\top(x+\delta) + b,\, y\big) = \ell\big(w^\top x + b - y\,\epsilon\,\|w\|_*,\, y\big),$$
where $\|\cdot\|_*$ denotes the dual norm.
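To make the dual-norm reduction concrete, here is a minimal numpy sketch (the function name is illustrative, not from the tutorial) for binary logistic regression with labels $y \in \{-1,+1\}$ and an $\ell_\infty$ budget $\epsilon$, whose dual norm is $\ell_1$:

```python
import numpy as np

def worst_case_logistic_loss(w, b, x, y, eps):
    """Exact inner maximization for a linear binary classifier.

    For y in {-1, +1} and an l_inf ball of radius eps, the dual norm is
    l_1, so the worst-case margin is y * (w @ x + b) - eps * ||w||_1.
    """
    margin = y * (np.dot(w, x) + b)                      # clean margin
    worst_margin = margin - eps * np.linalg.norm(w, 1)   # adversary removes eps * ||w||_*
    return np.log1p(np.exp(-worst_margin))               # logistic loss at the worst case

# The maximizing perturbation itself is delta* = -y * eps * sign(w).
```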
1. Local search
Various forms of gradient descent
- projected gradient descent: repeat $\delta := \mathcal{P}_{\Delta}\big(\delta + \alpha \nabla_\delta \ell(h_\theta(x+\delta), y)\big)$, where $\mathcal{P}_{\Delta}$ denotes projection onto the perturbation set $\Delta$;
- fast gradient sign method (FGSM): take $\Delta$ to be the $\ell_\infty$ ball $\{\delta : \|\delta\|_\infty \le \epsilon\}$, so the projection takes the form $\mathcal{P}(\delta) = \mathrm{clip}(\delta, [-\epsilon, \epsilon])$. When $\alpha \to \infty$, a single step reduces to
  $$\delta := \epsilon \cdot \mathrm{sign}\big(\nabla_\delta \ell(h_\theta(x), y)\big),$$
  i.e., FGSM is one step of projected gradient descent applied to the $\ell_\infty$ ball with a very large step size.
- (normalized) steepest descent
Motivation: at the actual example $x$, the gradient of the loss is often very small, so you need a big step size to get anywhere. But once the step size is large enough to escape that flat region, the iterate hits the corner of the $\ell_\infty$ ball and the method reduces to FGSM.
Method: repeat $\delta := \mathcal{P}_{\Delta}\big(\delta + \arg\max_{\|v\| \le \alpha} v^\top \nabla_\delta \ell(h_\theta(x+\delta), y)\big)$.
It is typical to choose the inner norm to be the same as the constraint norm.
- For the $\ell_2$ norm, the steepest-descent direction is $v = \alpha \, \nabla_\delta \ell / \|\nabla_\delta \ell\|_2$, i.e., the $\ell_2$-normalized gradient.
- For the $\ell_\infty$ norm, the direction is $v = \alpha \cdot \mathrm{sign}(\nabla_\delta \ell)$, which corresponds to many "mini-FGSM" steps (see the sketch after this list).
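The following PyTorch sketch, in the spirit of the tutorial's code, implements FGSM and $\ell_\infty$ PGD with the sign (steepest-ascent) step; it assumes `model` returns logits and `x`, `y` are a batch of inputs and labels:

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps):
    """Fast gradient sign method: one maximal step inside the l_inf ball."""
    delta = torch.zeros_like(x, requires_grad=True)
    loss = F.cross_entropy(model(x + delta), y)
    loss.backward()
    return eps * delta.grad.sign()

def pgd_linf(model, x, y, eps, alpha, num_iter):
    """l_inf PGD: normalized steepest-ascent steps, projected onto the eps ball."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(num_iter):
        loss = F.cross_entropy(model(x + delta), y)
        loss.backward()
        # sign step (steepest ascent for l_inf), then clip back onto the ball
        delta.data = (delta + alpha * delta.grad.sign()).clamp(-eps, eps)
        delta.grad.zero_()
    return delta.detach()
```

Note that `pgd_linf` with `num_iter=1` and `alpha >= eps` recovers FGSM.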
Targeted attacks
Considering the $k$-class cross-entropy loss $\ell(h_\theta(x), y) = \log \sum_{j=1}^{k} \exp\big(h_\theta(x)_j\big) - h_\theta(x)_y$, a targeted attack toward class $y_{\mathrm{targ}}$ maximizes $\ell(h_\theta(x+\delta), y) - \ell(h_\theta(x+\delta), y_{\mathrm{targ}})$. The log-sum-exp terms cancel, so the above problem simplifies to
$$\max_{\delta \in \Delta} \; \Big( h_\theta(x+\delta)_{y_{\mathrm{targ}}} - h_\theta(x+\delta)_y \Big).$$
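A sketch of the targeted variant, reusing the imports and loop structure from the PGD sketch above (`y_targ` is a single attacker-chosen class index, assumed here for illustration):

```python
def targeted_pgd_linf(model, x, y, y_targ, eps, alpha, num_iter):
    """Targeted l_inf PGD: the log-sum-exp terms cancel, so we ascend the
    difference of logits (push the target class up, the true class down)."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(num_iter):
        logits = model(x + delta)
        loss = (logits[:, y_targ] - logits.gather(1, y[:, None])[:, 0]).sum()
        loss.backward()
        delta.data = (delta + alpha * delta.grad.sign()).clamp(-eps, eps)
        delta.grad.zero_()
    return delta.detach()
```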
Remark: one way to categorize papers on adversarial attacks is to describe the attack in terms of 1) the allowable perturbation set $\Delta$; 2) the optimization procedure used to perform the maximization.
Training Adversarially Robust Models (outer minimization)
1. Adversarial training
Danskin's theorem: $\nabla_\theta \max_{\delta \in \Delta} \ell(h_\theta(x+\delta), y) = \nabla_\theta \ell(h_\theta(x+\delta^*), y)$,
where $\delta^* = \arg\max_{\delta \in \Delta} \ell(h_\theta(x+\delta), y)$. Note that it only applies when the max is performed exactly.
Repeat:
1. Select a minibatch $B$.
2. For each $(x, y) \in B$, compute the (approximate) adversarial perturbation $\delta^*(x) = \arg\max_{\delta \in \Delta} \ell(h_\theta(x+\delta), y)$, e.g., via PGD.
3. Update parameters: $\theta := \theta - \frac{\eta}{|B|} \sum_{(x,y) \in B} \nabla_\theta \ell\big(h_\theta(x + \delta^*(x)), y\big)$.
It is common to also mix robust and standard updates (i.e., train on a weighted combination of adversarial and clean examples); see the sketch below.
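A minimal PyTorch sketch of the adversarial training loop, reusing `pgd_linf` from above; `loader`, `opt`, and the `mix` weighting are illustrative assumptions:

```python
import torch.nn.functional as F

def adversarial_training_epoch(model, loader, opt, eps, alpha, num_iter, mix=0.5):
    """One epoch of adversarial training: solve the inner max with PGD, then
    take the outer gradient step at delta* (justified by Danskin's theorem).
    mix controls the blend of robust and standard (clean) updates."""
    for x, y in loader:
        delta = pgd_linf(model, x, y, eps, alpha, num_iter)   # inner maximization
        robust_loss = F.cross_entropy(model(x + delta), y)
        clean_loss = F.cross_entropy(model(x), y)
        loss = mix * robust_loss + (1 - mix) * clean_loss     # mix robust/standard
        opt.zero_grad()
        loss.backward()
        opt.step()
```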
What makes the model robust? Standard training gives a very steep loss surface with many directions of sharp increase; robust training gives a flat surface, which is hence less susceptible to adversarial attacks.
Current Challenges and Future Research
challenges
- on large-scale problems, we are nowhere close to building robust models that can match standard models in terms of their performance.
- large generalization gap of adversarially robust deep networks
future research
- Algorithm: faster robust training and verification; smaller (more compact) models; new architectures to induce more robustness in the prior
- Theory: (better) adversarially robust generalization bounds; theories that can guide new regularization techniques
- Data: more comprehensive set of perturbations
Related Topics
- data poisoning: training phase (linked with traditional robust statistics); adversarial examples: inference/test phase
- data augmentation: adversarial training = an "ultimate" version of data augmentation
- robustness and model interpretability, fairness
- universal perturbation/input-free adversarial examples (studied) and robust optimization against universal perturbation (less studied)
References:
- J. Z. Kolter and A. Madry. Adversarial Robustness: Theory and Practice (NeurIPS 2018 tutorial). https://www.youtube.com/watch?v=TwP-gKBQyic&t=4467s
- Tutorial notes: https://adversarial-ml-tutorial.org/adversarial_examples/
- Ilyas, Andrew, et al. “Adversarial examples are not bugs, they are features.” Advances in Neural Information Processing Systems. 2019.