Deep Learning: Optimization for Training Deep Models (Part 0)

Of all of the many optimization problems involved in deep learning, the most difficult is neural network training.
It is quite common to invest days to months of time on hundreds of machines in order to solve even a single instance of the neural network training problem.
Because this problem is so important and so expensive, a specialized set of optimization techniques has been developed for solving it. This chapter presents these optimization techniques for neural network training.
This chapter focuses on one particular case of optimization: finding the parameters θ of a neural network that significantly reduce a cost function J(θ), which typically includes a performance measure evaluated on the entire training set as well as additional regularization terms.
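The cost function described above can be made concrete with a small sketch. This is an illustrative example, not the chapter's notation: it assumes a linear model as a stand-in for a network, mean squared error as the performance measure, and an L2 penalty as the regularization term.

```python
import numpy as np

def cost(theta, X, y, lam):
    # J(theta) = performance measure on the training set + regularization term
    preds = X @ theta                      # linear model standing in for a network
    data_loss = np.mean((preds - y) ** 2)  # mean squared error over the full training set
    reg = lam * np.sum(theta ** 2)         # L2 regularization, weighted by lam
    return data_loss + reg
```

With a perfect fit the data-loss term vanishes and only the regularization penalty remains, which is exactly the trade-off the chapter's J(θ) encodes.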

  • We begin with a description of how optimization used as a training algorithm for a machine learning task differs from pure optimization.
  • Next, we present several of the concrete challenges that make optimization of neural networks difficult.
  • We then define several practical algorithms, including both optimization algorithms themselves and strategies for initializing the parameters. More advanced algorithms adapt their learning rates during training or leverage information contained in the second derivatives of the cost function.
  • Finally, we conclude with a review of several optimization strategies that are formed by combining simple optimization algorithms into higher-level procedures.
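The simplest of the optimization algorithms the outline refers to is plain gradient descent, which the later sections build on. As a hedged sketch (the quadratic cost and the learning rate here are arbitrary choices for illustration):

```python
def gd_step(theta, grad, lr):
    # One plain gradient descent update: theta <- theta - lr * grad
    return theta - lr * grad

# Hypothetical one-dimensional cost J(theta) = theta**2, gradient 2 * theta
theta = 1.0
for _ in range(100):
    theta = gd_step(theta, 2 * theta, lr=0.1)
# theta shrinks geometrically toward the minimizer at 0
```

The more advanced methods mentioned above replace the fixed `lr` with per-parameter adaptive rates, or rescale the step using second-derivative (curvature) information.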