Paper explanation: https://blog.csdn.net/u010352603/article/details/80590129#11-memorization-和-generalization
https://xueqiu.com/9217191040/110449554
Training method (mini-batch stochastic gradient descent): https://blog.csdn.net/xiang_freedom/article/details/78395145
AdaGrad: https://blog.csdn.net/program_developer/article/details/80756008