Uniform convergence may be unable to explain generalization in deep learning

Value of this paper: understand the limitations of uniform-convergence (u.c.) based bounds / cast doubt on the power of u.c. bounds to fully explain generalization in deep learning.

  1. highlight that explaining the training-set-size dependence of the generalization error is apparently just as non-trivial as explaining its parameter-count dependence.
  2. show that there are scenarios where all uniform convergence bounds, however cleverly applied, become vacuous.
Development of generalisation bounds
stage 1: conventional u.c. bound

$$\text{generalisation gap} \leq O\Big(\sqrt{\frac{\text{representational complexity of whole hypothesis class}}{\text{training set size}}}\Big)$$
con: representational complexity $\propto$ depth $\times$ width – vacuous for overparameterised deep networks
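
As a rough back-of-the-envelope illustration (the numbers here are mine, not from the paper): with representational complexity on the order of the parameter count, even a modest network trained on a standard-sized dataset pushes the bound past the trivial value of 1:

$$\sqrt{\frac{\text{complexity} \approx 10^{7}\ \text{parameters}}{\text{training set size} \approx 5\times 10^{4}}} \;\approx\; 14 \;\gg\; 1$$

Any bound larger than 1 on a 0–1 generalisation gap says nothing.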

stage 2: refined u.c. bound – take into account the implicit bias of SGD (more broadly, algorithm-dependent bounds)

$$\text{generalisation gap} \leq O\Big(\sqrt{\frac{\text{representational complexity of 'relevant' subset of hypothesis class}}{\text{training set size}}}\Big)$$
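
Norm-based margin bounds are the typical instance of this stage. Schematically (the exact norms, margin definitions and extra polynomial factors differ from paper to paper; this is only the common shape, not any one specific result):

$$\text{generalisation gap} \;\lesssim\; \frac{\prod_{i=1}^{d}\lVert W_i\rVert \cdot \mathrm{poly}(d)}{\gamma\,\sqrt{m}}$$

where $W_1,\dots,W_d$ are the learned weight matrices, $\gamma$ is the training margin and $m$ the training set size; the hope is that the implicit bias of SGD keeps these norms small.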
con: 1. existing refined bounds are either too large or grow with the parameter count; 2. the ones that are numerically small hold only for modified networks (e.g. after compression, explicit regularisation, or randomisation); 3. (this paper) some of these bounds even increase with training set size; 4. (this paper) in certain deep-learning settings, any refined u.c. bound, however cleverly applied, is vacuous in the following sense:
$$\underset{\text{even though this is small } (\approx 0)}{\text{generalisation gap}} \;\leq\; \underset{\text{this will be vacuous } (\approx 1)}{\text{any refined u.c. bound}}$$

Proof of claim 4

The paper proposes a notion of the 'tightest (algorithm-dependent) uniform convergence bound' and proves that even this bound becomes vacuous. A rough formalisation is sketched below, followed by the three steps of the argument.
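
A rough formalisation (notation mine; see the paper for the precise definition): fix the learning algorithm $A$ and the data distribution $\mathcal{D}$. For confidence level $\delta$, choose any collection $\mathcal{S}_\delta$ of training sets with probability mass at least $1-\delta$, and let $\mathcal{H}_\delta = \{A(S) : S \in \mathcal{S}_\delta\}$ be exactly the hypotheses the algorithm outputs on those sets. The tightest algorithm-dependent u.c. bound is then

$$\epsilon_{\text{unif-alg}}(m,\delta) \;=\; \min_{\mathcal{S}_\delta}\;\sup_{S \in \mathcal{S}_\delta}\;\sup_{h \in \mathcal{H}_\delta}\;\big|\,\mathcal{L}_{\mathcal{D}}(h) - \hat{\mathcal{L}}_{S}(h)\,\big|$$

The supremum runs over all (hypothesis, dataset) pairs simultaneously, not just the matched pairs $(A(S), S)$; the construction in the steps below supplies a mismatched pair on which the gap is close to 1.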
Step 1: create a new dataset $S'$ by projecting each training point onto the opposite hypersphere and flipping its label (a toy version of this construction is sketched in code after these steps).
Step 2: even though the test error and the training error are both very small, the dataset $S'$ is completely misclassified, indicating that the decision boundary is complex enough to memorise skews in the locations of the training datapoints.
Step 3: mathematically show that the tightest u.c. bound is lower bounded by this inherent complexity and thus is vacuous.
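
A minimal sketch of Steps 1–2 in code, assuming the paper's hypersphere setup (two concentric hyperspheres with labels $\pm 1$). The classifier, dimensions and hyperparameters below are placeholders of my own, not the paper's; whether $S'$ actually ends up (almost) fully misclassified depends on the regime of dimension, width and training used in the paper.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

def sample_hyperspheres(n, dim, r_pos=1.0, r_neg=1.1):
    """Sample n points per class uniformly from two concentric hyperspheres:
    radius r_pos -> label +1, radius r_neg -> label -1."""
    def sphere(n, r):
        x = rng.normal(size=(n, dim))
        return r * x / np.linalg.norm(x, axis=1, keepdims=True)
    X = np.vstack([sphere(n, r_pos), sphere(n, r_neg)])
    y = np.concatenate([np.ones(n), -np.ones(n)])
    return X, y

def project_and_flip(X, y, r_pos=1.0, r_neg=1.1):
    """Build S': rescale every training point onto the *opposite* hypersphere
    and flip its label. Each new point is still a valid sample from the
    distribution, just placed where the other class's data would lie."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    target_r = np.where(y[:, None] > 0, r_neg, r_pos)  # opposite radius
    return X / norms * target_r, -y

dim, n_per_class = 500, 2000
X_tr, y_tr = sample_hyperspheres(n_per_class, dim)
X_te, y_te = sample_hyperspheres(n_per_class, dim)
X_sp, y_sp = project_and_flip(X_tr, y_tr)

# An overparameterised ReLU net fit to (near) zero training error.
clf = MLPClassifier(hidden_layer_sizes=(2048,), activation="relu",
                    alpha=0.0, max_iter=2000).fit(X_tr, y_tr)

print("train error :", 1 - clf.score(X_tr, y_tr))  # ~0
print("test  error :", 1 - clf.score(X_te, y_te))  # small
print("error on S' :", 1 - clf.score(X_sp, y_sp))  # ~1 in the paper's regime
```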

Reflections on generalisation bounds
  1. A generalisation bound should, at a minimum, show a satisfactory dependence on both parameter count and training set size: it should not blow up with the parameter count, and it should decrease as the training set grows.

References

  1. Nagarajan, Vaishnavh, and J. Zico Kolter. “Uniform convergence may be unable to explain generalization in deep learning.” Advances in Neural Information Processing Systems. 2019.
    http://www.cs.cmu.edu/~vaishnan/talks/neurips19_uc_slides.pdf