Weakness of adversarial training: the model overfits to the attack used during training and hence does not generalize to test data
Curriculum adversarial training (CAT)
Idea: train the model from weak attacks to strong attacks
Method
Let $l$ denote the attack strength and $L$ denote the maximal attack strength. $A(l)$ denotes an attack class parameterized with $l$.
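The paper instantiates $A(l)$ with PGD, taking the number of PGD iterations as the attack strength $l$. A minimal PyTorch sketch under that reading; the model interface, inputs in $[0, 1]$, and the step sizes eps and alpha are illustrative assumptions, not the paper's settings:

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, l, eps=8/255, alpha=2/255):
    # A(l): PGD whose strength l is its number of iterations;
    # l = 0 returns the clean input (no attack).
    x_adv = x.clone().detach()
    for _ in range(l):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # One signed-gradient ascent step, then project back into
        # the L-infinity ball of radius eps around x.
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.clamp(x_adv, x - eps, x + eps).clamp(0.0, 1.0)
    return x_adv.detach()
```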
Basic curriculum learning
i). start from no attack, i.e., $l = 0$;
ii). train the model for one epoch and, once finished, calculate the $l$-accuracy (the accuracy on adversarial examples generated with attack strength $l$);
iii-a). if the $l$-accuracy increases at least once over the last 10 epochs, continue training;
iii-b). if the $l$-accuracy does not increase over the last 10 epochs, set the parameters of the model to be the best ones (i.e., those from 10 epochs ago), and increase $l$ by 1;
iv). stop when $l$ reaches the maximal attack strength $L$ (a code sketch of this loop follows).
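A minimal sketch of the basic curriculum loop. The helpers train_one_epoch and l_accuracy are hypothetical stand-ins (not from the paper) for one epoch of adversarial training with $A(l)$ and for evaluating the $l$-accuracy:

```python
import copy

def curriculum_training(model, train_loader, val_loader, L, patience=10):
    l = 0
    while l <= L:
        best_acc, best_state, stall = -1.0, None, 0
        while True:
            # Hypothetical helpers: one epoch of adversarial training with
            # A(l), then l-accuracy measured on held-out data.
            train_one_epoch(model, train_loader, attack_strength=l)
            acc = l_accuracy(model, val_loader, attack_strength=l)
            if acc > best_acc:
                best_acc = acc
                best_state = copy.deepcopy(model.state_dict())
                stall = 0
            else:
                stall += 1
            if stall >= patience:  # no improvement over the last 10 epochs
                model.load_state_dict(best_state)  # roll back to the best parameters
                break
        l += 1  # move on to the next attack strength
```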
Benefit: Training efficiency
Additional optimization technique: batch mixing
Motivation: Although basic curriculum training significantly reduces the training time, it does not increase robustness. One issue is forgetting: when the model is trained with a larger $l$, it forgets the adversarial examples generated for a smaller $l$.
Solution: Generate adversarial examples using $A(l')$ for each $l' \in \{0, 1, \dots, l\}$, and combine them to form a batch. The loss function is updated accordingly as:
$\mathcal{L}(\theta) = \sum_{l'=0}^{l} \lambda_{l'} \, \mathbb{E}_{(x, y)} \big[ L\big(\theta, A(l')(x), y\big) \big]$
where the $\lambda_{l'}$'s are hyperparameters such that $\sum_{l'=0}^{l} \lambda_{l'} = 1$. The authors set $\lambda_{l'} = \frac{1}{l+1}$ and generate the same amount of adversarial examples for each attack strength.
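A sketch of the mixed loss with the authors' uniform weights $\lambda_{l'} = 1/(l+1)$, reusing the pgd_attack sketch above (the helper name is illustrative):

```python
def batch_mixing_loss(model, x, y, l):
    # Uniform weights lambda_{l'} = 1/(l+1), as chosen by the authors.
    lam = 1.0 / (l + 1)
    loss = 0.0
    for lp in range(l + 1):
        x_adv = pgd_attack(model, x, y, lp)  # A(l') examples; lp = 0 is clean
        loss = loss + lam * F.cross_entropy(model(x_adv), y)
    return loss
```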
Additional optimization technique: quantization
Motivation: The model trained with CAT may not defend against attacks that are stronger than the strongest attack used during training.
Solution: Employ quantization, i.e., restrict each component of the input to a $b$-bit integer.
Rationale: Quantization reduces the space of adversarial examples. Specifically, let $x'$ denote the adversarial example for an input $x$. The difference $x' - x$ takes values from an infinite space if $x'$ is real-valued; in contrast, it takes values from a finite space if $x'$ is quantized to an integer vector.
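A minimal sketch of $b$-bit input quantization, assuming inputs normalized to $[0, 1]$; the bit width below is illustrative, not the paper's setting:

```python
def quantize(x, bits=4):
    # Restrict each input value (assumed in [0, 1]) to a b-bit integer
    # grid, shrinking the space of realizable adversarial perturbations.
    levels = 2 ** bits - 1
    return torch.round(x * levels) / levels
```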
Remark: Quantization is a generic inference-time defense technique. This technique alone is not shown to provide resilience against strong white-box attacks. However, it is effective when used together with CAT, since the model remembers adversarial examples generated by weak attacks. Although a stronger attack can better optimize the loss function, the adversarial examples it generates are highly likely to coincide with those generated by a weaker attack, because the entire adversarial-example space is small.
Experiments: CAT improves both efficiency and empirical worst-case accuracy against adversarial examples (termed resilience)
References:
Cai, Qi-Zhi, Chang Liu, and Dawn Song. “Curriculum adversarial training.” In Proceedings of the 27th International Joint Conference on Artificial Intelligence, pp. 3740-3747. 2018.