Value of this paper: understand the limitations of u.c. (uniform convergence) based bounds / cast doubt on the power of u.c. bounds to fully explain generalization in DL
- highlight that explaining the training-set-size dependence of the generalization error is apparently just as non-trivial as explaining its parameter-count dependence.
- show that there are scenarios where all uniform convergence bounds, however cleverly applied, become vacuous.
Development of generalisation bounds
stage 1: conventional u.c. bounds
con: representational complexity grows with depth × width (i.e., parameter count) – far too vacuous for deep networks
stage 2: refined u.c. bounds
con: 1. many are too large or even grow with parameter count; 2. the ones that are small hold only for modified networks (e.g., after compression, explicit regularisation, or randomisation); 3. (this paper) some generalisation bounds increase with training set size; 4. (this paper) in certain deep-learning scenarios, any refined u.c. bound becomes vacuous, in the following sense:
Proof of point 4
The paper proposes the notion of the 'tightest uniform convergence' bound and proves that even this bound becomes vacuous.
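A hedged sketch of what "tightest" means here (notation loosely follows the paper; the exact definition differs in details): the hypothesis class is shrunk to exactly the outputs the algorithm produces on likely training sets, and the supremum is taken only over those:

```latex
% S_delta: any collection of training sets with probability mass >= 1 - delta
% H_delta: the hypotheses the algorithm actually outputs on those sets
\mathcal{H}_{\delta} \;=\; \{\, h_S : S \in \mathcal{S}_{\delta} \,\},
\qquad
\varepsilon_{\mathrm{unif}}(m,\delta)
 \;=\; \inf_{\mathcal{S}_{\delta}}\;
       \sup_{S \in \mathcal{S}_{\delta}}\;
       \sup_{h \in \mathcal{H}_{\delta}}
       \bigl|\, L_{\mathcal{D}}(h) - \hat{L}_{S}(h) \,\bigr|
```

Step 3 below lower-bounds this quantity: the projected dataset S' is as likely a draw from the distribution as S itself, yet h_S misclassifies S' entirely, so the supremum is at least about 1 minus the (small) test error.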
Step 1: create a new dataset S' by projecting each training point onto the opposite hypersphere and flipping its label.
Step 2: even though the test error and the training error are both very small, the trained network misclassifies S' completely, indicating that the decision boundary is complex enough to memorise skews in the locations of the training datapoints.
Step 3: show mathematically that the tightest u.c. bound is lower-bounded by this inherent complexity and is therefore vacuous.
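Steps 1–2 can be sketched on the paper's concentric-hypersphere setting. This is a minimal illustration, not the paper's code: the radii 1.0 / 1.1 and all function names are my assumptions.

```python
import numpy as np

def sample_hyperspheres(n, d, r_inner=1.0, r_outer=1.1, seed=None):
    """Sample n points from one of two concentric hyperspheres in R^d,
    chosen uniformly; label +1 for the outer sphere, -1 for the inner.
    (Illustrative radii; the paper uses a similar two-sphere distribution.)"""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n, d))
    x /= np.linalg.norm(x, axis=1, keepdims=True)  # directions on the unit sphere
    y = rng.choice([-1, 1], size=n)
    radius = np.where(y == 1, r_outer, r_inner)
    return x * radius[:, None], y

def project_and_flip(x, y, r_inner=1.0, r_outer=1.1):
    """Step 1 of the argument: rescale each training point onto the
    *opposite* hypersphere and flip its label. The resulting S' is
    distributed like a fresh draw from D, but sits adversarially close
    to the original training set S."""
    new_radius = np.where(y == 1, r_inner, r_outer)
    x_proj = x / np.linalg.norm(x, axis=1, keepdims=True) * new_radius[:, None]
    return x_proj, -y
```

The paper's empirical observation (Step 2) is that an over-parameterised network trained to fit S misclassifies essentially all of S', even though its error on genuinely fresh samples from D is small; that gap is exactly what makes the u.c. supremum vacuous.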
Reflections on generalisation bounds
- a generalisation bound should at least show satisfactory dependence on both parameter count and training set size.
References
- Nagarajan, Vaishnavh, and J. Zico Kolter. “Uniform convergence may be unable to explain generalization in deep learning.” Advances in Neural Information Processing Systems. 2019.
http://www.cs.cmu.edu/~vaishnan/talks/neurips19_uc_slides.pdf