Uniform convergence may be unable to explain generalization in deep learning

Value of this paper: understand the limitations of uniform-convergence (u.c.) based bounds / cast doubt on the power of u.c. bounds to fully explain generalization in deep learning.

  1. highlight that explaining the training-set-size dependence of the generalization error is apparently just as non-trivial as explaining its parameter-count dependence.
  2. show that there are scenarios where all uniform convergence bounds, however cleverly applied, become vacuous.
Development of generalisation bounds
stage 1: conventional u.c. bound

$$\text{generalisation gap} \leq O\Big(\sqrt{\frac{\text{representational complexity of whole hypothesis class}}{\text{training set size}}}\Big)$$
con: representational complexity $\propto$ depth $\times$ width – vacuous for overparameterised deep networks
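
As a rough back-of-the-envelope illustration (the numbers here are mine, not from the paper): with representational complexity on the order of the parameter count, even a modest network trained on a standard-sized dataset pushes the bound past the trivial value of 1:

$$\sqrt{\frac{\text{complexity} \approx 10^{7}\ \text{parameters}}{\text{training set size} \approx 5\times 10^{4}}} \;\approx\; 14 \;\gg\; 1$$

Any bound larger than 1 on a 0–1 generalisation gap says nothing.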

stage 2: refined u.c. bound – take into account the implicit bias of SGD (more broadly, algorithm-dependent bounds)

$$\text{generalisation gap} \leq O\Big(\sqrt{\frac{\text{representational complexity of 'relevant' subset of hypothesis class}}{\text{training set size}}}\Big)$$
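
Norm-based margin bounds are the typical instance of this stage. Schematically (the exact norms, margin definitions and extra polynomial factors differ from paper to paper; this is only the common shape, not any one specific result):

$$\text{generalisation gap} \;\lesssim\; \frac{\prod_{i=1}^{d}\lVert W_i\rVert \cdot \mathrm{poly}(d)}{\gamma\,\sqrt{m}}$$

where $W_1,\dots,W_d$ are the learned weight matrices, $\gamma$ is the training margin and $m$ the training set size; the hope is that the implicit bias of SGD keeps these norms small.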
con: 1. existing refined bounds are either too large or grow with the parameter count; 2. the ones that are numerically small hold only for modified networks (e.g. after compression, explicit regularisation, or randomisation); 3. (this paper) some of these bounds even increase with training set size; 4. (this paper) in certain deep-learning settings, any refined u.c. bound, however cleverly applied, is vacuous in the following sense:
$$\underset{\text{even though this is small } (\approx 0)}{\text{generalisation gap}} \;\leq\; \underset{\text{this will be vacuous } (\approx 1)}{\text{any refined u.c. bound}}$$

Proof of claim 4

The paper proposes a notion of the 'tightest (algorithm-dependent) uniform convergence bound' and proves that even this bound becomes vacuous. A rough formalisation is sketched below, followed by the three steps of the argument.
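
A rough formalisation (notation mine; see the paper for the precise definition): fix the learning algorithm $A$ and the data distribution $\mathcal{D}$. For confidence level $\delta$, choose any collection $\mathcal{S}_\delta$ of training sets with probability mass at least $1-\delta$, and let $\mathcal{H}_\delta = \{A(S) : S \in \mathcal{S}_\delta\}$ be exactly the hypotheses the algorithm outputs on those sets. The tightest algorithm-dependent u.c. bound is then

$$\epsilon_{\text{unif-alg}}(m,\delta) \;=\; \min_{\mathcal{S}_\delta}\;\sup_{S \in \mathcal{S}_\delta}\;\sup_{h \in \mathcal{H}_\delta}\;\big|\,\mathcal{L}_{\mathcal{D}}(h) - \hat{\mathcal{L}}_{S}(h)\,\big|$$

The supremum runs over all (hypothesis, dataset) pairs simultaneously, not just the matched pairs $(A(S), S)$; the construction in the steps below supplies a mismatched pair on which the gap is close to 1.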
Step 1: create a new dataset $S'$ by projecting each training point onto the opposite hypersphere and flipping its label (a toy version of this construction is sketched in code after these steps).
Step 2: even though the test error and the training error are both very small, the dataset $S'$ is completely misclassified, indicating that the decision boundary is complex enough to memorise skews in the locations of the training datapoints.
Step 3: mathematically show that the tightest u.c. bound is lower bounded by this inherent complexity and thus is vacuous.
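
A minimal sketch of Steps 1–2 in code, assuming the paper's hypersphere setup (two concentric hyperspheres with labels $\pm 1$). The classifier, dimensions and hyperparameters below are placeholders of my own, not the paper's; whether $S'$ actually ends up (almost) fully misclassified depends on the regime of dimension, width and training used in the paper.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

def sample_hyperspheres(n, dim, r_pos=1.0, r_neg=1.1):
    """Sample n points per class uniformly from two concentric hyperspheres:
    radius r_pos -> label +1, radius r_neg -> label -1."""
    def sphere(n, r):
        x = rng.normal(size=(n, dim))
        return r * x / np.linalg.norm(x, axis=1, keepdims=True)
    X = np.vstack([sphere(n, r_pos), sphere(n, r_neg)])
    y = np.concatenate([np.ones(n), -np.ones(n)])
    return X, y

def project_and_flip(X, y, r_pos=1.0, r_neg=1.1):
    """Build S': rescale every training point onto the *opposite* hypersphere
    and flip its label. Each new point is still a valid sample from the
    distribution, just placed where the other class's data would lie."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    target_r = np.where(y[:, None] > 0, r_neg, r_pos)  # opposite radius
    return X / norms * target_r, -y

dim, n_per_class = 500, 2000
X_tr, y_tr = sample_hyperspheres(n_per_class, dim)
X_te, y_te = sample_hyperspheres(n_per_class, dim)
X_sp, y_sp = project_and_flip(X_tr, y_tr)

# An overparameterised ReLU net fit to (near) zero training error.
clf = MLPClassifier(hidden_layer_sizes=(2048,), activation="relu",
                    alpha=0.0, max_iter=2000).fit(X_tr, y_tr)

print("train error :", 1 - clf.score(X_tr, y_tr))  # ~0
print("test  error :", 1 - clf.score(X_te, y_te))  # small
print("error on S' :", 1 - clf.score(X_sp, y_sp))  # ~1 in the paper's regime
```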

Reflections on generalisation bounds
  1. A generalisation bound should, at a minimum, show a satisfactory dependence on both parameter count and training set size: it should not blow up with the parameter count, and it should decrease as the training set grows.

References

  1. Nagarajan, Vaishnavh, and J. Zico Kolter. “Uniform convergence may be unable to explain generalization in deep learning.” Advances in Neural Information Processing Systems. 2019.
    http://www.cs.cmu.edu/~vaishnan/talks/neurips19_uc_slides.pdf