Neural Networks and Deep Learning

"An Enquiry Concerning Human Understanding" (1748). The problem of induction has been given a modern machine-learning form in the no-free-lunch theorem of David Wolpert and William Macready (1997).

http://yann.lecun.com/exdb/publis/pdf/lecun-01a.pdf

The dynamics of gradient descent learning have a self-regularizing effect.

13 In Gradient-Based Learning Applied to Document Recognition, by Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner (1998).

https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf

14 ImageNet Classification with Deep Convolutional Neural Networks, by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton (2012).

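"This technique reduces complex co-adaptations of neurons, since a neuron cannot rely on the presence of particular other neurons. It is, therefore, forced to learn more robust features that are useful in conjunction with many different random subsets of the other neurons."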

http://arxiv.org/pdf/1207.0580.pdf

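15 Improving neural networks by preventing co-adaptation of feature detectors by Geoffrey Hinton, Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever

The real measure of dropout is that it has been very successful in improving the performance of neural networks. The original paper 15 introduced the technique and applied it to many different tasks.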
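As a rough illustration of the dropout idea described above, here is a minimal Python sketch. The function names `dropout_train` and `dropout_test`, the drop probability of 0.5, and the use of plain lists for activations are assumptions made for illustration; none of this is taken from the paper's own code.

```python
import random


def dropout_train(activations, p_drop=0.5, rng=random):
    """Training time: zero each activation independently with probability p_drop."""
    return [0.0 if rng.random() < p_drop else a for a in activations]


def dropout_test(activations, p_drop=0.5):
    """Test time: keep every neuron, but scale by the keep probability.

    Scaling the activations this way has the same effect on the next layer
    as shrinking the outgoing weights by the same factor.
    """
    keep = 1.0 - p_drop
    return [a * keep for a in activations]


if __name__ == "__main__":
    rng = random.Random(0)          # fixed seed so runs are repeatable
    hidden = [0.2, 0.9, 0.5, 0.7]   # hypothetical hidden-layer activations
    print(dropout_train(hidden, rng=rng))
    print(dropout_test(hidden))
```

Because a different random subset of neurons is removed on every training pass, no neuron can depend on a particular partner being present, which is the effect the quotation describes.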

http://dx.doi.org/10.1109/ICDAR.2003.1227801

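7 Best Practices for Convolutional Neural Networks Applied to Visual Document Analysis, by Patrice Simard, Dave Steinkraus, and John Platt (2003).

In this paper, the authors applied several variations of this idea to MNIST. One of the network architectures they considered was similar to the ones we have been using: a feedforward network with 800 hidden neurons, using the cross-entropy cost function.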

http://scikit-learn.org/stable/

Replacing the neural network with the SVM provided by the scikit-learn library

http://dx.doi.org/10.3115/1073012.1073017

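The right response to the question "Is algorithm A better than algorithm B?" is really "What training data set are you using?"

For a striking example, see Scaling to very very large corpora for natural language disambiguation, by Michele Banko and Eric Brill (2001).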

http://arxiv.org/pdf/1206.5533v2.pdf

For weight initialization methods, see pages 14 and 15 of Yoshua Bengio's 2012 paper 22, and the references cited there.

Practical Recommendations for Gradient-Based Training of Deep Architectures, by Yoshua Bengio (2012)

http://papers.nips.cc/paper/4522-practical-bayesian-optimization-of-machine-learning-algorithms.pdf

https://github.com/jaberg/hyperopt

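The achievements and limitations of grid search (along with easily implemented variants) have been reviewed in a 2012 paper by James Bergstra and Yoshua Bengio. Many more sophisticated approaches have also been proposed. I won't review them here, but I do want to point out a 2012 paper that used a Bayesian approach to optimize hyper-parameters automatically. The code is available and has already been used by other researchers.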
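To make the contrast between grid search and one easily implemented variant (random search) concrete, here is a small Python sketch. The function `validation_error` is a hypothetical stand-in for "train a network with these hyper-parameters and measure validation error"; its formula, the hyper-parameter names `eta` and `lmbda`, and the search ranges are all assumptions chosen only so the example runs quickly.

```python
import itertools
import math
import random


def validation_error(eta, lmbda):
    """Hypothetical stand-in for training a network and returning its validation error.

    The toy surface is smallest near eta = 0.1 and lmbda = 0.01.
    """
    return (math.log10(eta) + 1.0) ** 2 + (math.log10(lmbda) + 2.0) ** 2


def grid_search(etas, lmbdas):
    """Evaluate every (eta, lmbda) pair on the grid and return the best pair."""
    return min(itertools.product(etas, lmbdas),
               key=lambda pair: validation_error(*pair))


def random_search(n_trials, rng):
    """Sample both hyper-parameters log-uniformly from [1e-4, 1e1]; return the best pair."""
    trials = [(10 ** rng.uniform(-4, 1), 10 ** rng.uniform(-4, 1))
              for _ in range(n_trials)]
    return min(trials, key=lambda pair: validation_error(*pair))


if __name__ == "__main__":
    grid = [0.001, 0.01, 0.1, 1.0]
    print("grid search:  ", grid_search(grid, grid))
    print("random search:", random_search(50, random.Random(0)))
```

Grid search spends its whole budget on a fixed lattice of values, while random search tries a new value of every hyper-parameter on each trial. In real use each call to `validation_error` would be a full training run, so the number of trials is the main cost.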

http://en.wikipedia.org/wiki/Limited-memory_BFGS

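Ways of optimizing the cost function: as you go deeper into neural networks, it is worth trying other optimization techniques and understanding how they work, their strengths and weaknesses, and how to apply them in practice. A paper I mentioned earlier 25 introduces and compares several of these techniques, including conjugate gradient descent and the BFGS method (see also limited-memory BFGS, known as L-BFGS). Another technique that has recently shown promising results 26 is Nesterov's accelerated gradient technique, which improves on the momentum technique.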
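The difference between momentum and Nesterov's accelerated gradient can be sketched in a few lines of Python. This is only an illustration on a toy quadratic cost, assuming the common formulation in which Nesterov's method evaluates the gradient at the look-ahead point w + mu * v; the cost function, the names `grad`, `momentum_step` and `minimize`, and the values of `eta` and `mu` are assumptions, not taken from the cited papers.

```python
def grad(w):
    """Gradient of the toy cost C(w) = 0.5 * (w[0]**2 + 25 * w[1]**2)."""
    return [w[0], 25.0 * w[1]]


def momentum_step(w, v, eta, mu, nesterov=False):
    """One update of the weights w and velocity v.

    Classical momentum: gradient taken at w.
    Nesterov variant:   gradient taken at the look-ahead point w + mu * v.
    """
    point = [wi + mu * vi for wi, vi in zip(w, v)] if nesterov else w
    g = grad(point)
    v = [mu * vi - eta * gi for vi, gi in zip(v, g)]
    w = [wi + vi for wi, vi in zip(w, v)]
    return w, v


def minimize(steps=200, eta=0.02, mu=0.9, nesterov=False):
    """Run the chosen update rule from a fixed starting point and return the final weights."""
    w, v = [1.0, 1.0], [0.0, 0.0]
    for _ in range(steps):
        w, v = momentum_step(w, v, eta, mu, nesterov)
    return w


if __name__ == "__main__":
    print("momentum:", minimize(nesterov=False))
    print("nesterov:", minimize(nesterov=True))
```

The two rules differ only in where the gradient is evaluated. Setting `mu=0` recovers plain gradient descent, which is a convenient baseline when experimenting with the step size.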

http://yann.lecun.com/exdb/publis/pdf/lecun-98b.pdf

Efficient BackProp, by Yann LeCun, Léon Bottou, Genevieve Orr and Klaus-Robert Müller (1998).

http://www.cs.toronto.edu/~hinton/absps/momentum.pdf

See, for example, On the importance of initialization and momentum in deep learning, by Ilya Sutskever, James Martens, George Dahl, and Geoffrey Hinton (2012).

http://www.reddit.com/r/MachineLearning/comments/25lnbt/ama_yann_lecun/chivdv7

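How do you feel about using and researching machine learning techniques that are supported entirely by empirical results (rather than mathematical guarantees)? Also, in what situations have you noticed these techniques fail?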

https://zh.wikipedia.org/wiki/哈恩－巴拿赫定理

http://www.dartmouth.edu/~gvc/Cybenko_MCSS.pdf

http://www.sciencedirect.com/science/article/pii/0893608089900208

Universality theorem

1 Approximation by superpositions of a sigmoidal function, by George Cybenko (1989). The result was very much in the air at the time, and several groups proved closely related results. Cybenko's paper contains a useful discussion of much of that work. Another important early paper is Multilayer feedforward networks are universal approximators, by Kurt Hornik, Maxwell Stinchcombe, and Halbert White (1989). This paper uses the Stone-Weierstrass theorem to arrive at similar results.

http://arxiv.org/pdf/1312.6098.pdf

http://www.iro.umontreal.ca/~bengioy/papers/ftml_book.pdf

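As we saw in the circuit example, there are theoretical results telling us that deep networks are intrinsically more powerful than shallow networks.

1 For certain problems and network architectures this is proved in On the number of response regions of deep feed forward networks with piece-wise linear activations, by Razvan Pascanu, Guido Montúfar, and Yoshua Bengio (2014). A more detailed discussion appears in section 2 of Learning deep architectures for AI, by Yoshua Bengio (2009).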

https://github.com/mnielsen/neural-networks-and-deep-learning/archive/master.zip

Example of a neural network with a single hidden layer; the vanishing gradient problem

http://en.wikipedia.org/wiki/Gabor_filter Gabor filter

http://arxiv.org/abs/1311.2901

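There is now a lot of work on better understanding the features learned by convolutional networks. If you are interested, I suggest starting with the paper Visualizing and Understanding Convolutional Networks, by Matthew Zeiler and Rob Fergus (2013).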

http://deeplearning.net/software/theano/

The Theano machine learning library

http://aws.amazon.com/ec2/instance-types/

Amazon Web Services EC2 G2 instance types (for when your own system has no GPU available)

http://dx.doi.org/10.1109/ICDAR.2003.1227801

Best Practices for Convolutional Neural Networks Applied to Visual Document Analysis, by Patrice Simard, Dave Steinkraus, and John Platt (2003).

In 2003, Simard, Steinkraus and Platt 19 improved their MNIST performance to 99.6 percent using a neural network that was otherwise very similar to ours: two convolutional-pooling layers followed by a fully connected hidden layer of 100 neurons.

http://rodrigob.github.io/are_we_there_yet/build/classification_datasets_results.html

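It is still possible to improve performance on MNIST. Rodrigo Benenson has compiled a summary page showing the progress made over the years, with links to the papers.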

http://arxiv.org/abs/1003.0358

Among the recent results is a 2010 paper 20 by Cireșan, Meier, Gambardella, and Schmidhuber. What I like about this paper is how simple it is. The network is a many-layer neural network using only fully connected layers (no convolutional layers).

Deep, Big, Simple Neural Nets Excel on Handwritten Digit Recognition, by Dan Claudiu Cireșan, Ueli Meier, Luca Maria Gambardella, and Jürgen Schmidhuber (2010).

http://www.image-net.org/

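The 2012 LRMD paper: let me start with a 2012 paper 22 from a group of researchers at Stanford and Google. I'll refer to this paper as LRMD, after the last names of the first four authors. LRMD used a neural network to classify images from ImageNet, a very challenging image recognition problem.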

http://www.image-net.org/synset?wnid=n03489162

Clearly a much more challenging image recognition task than MNIST.

http://www.cs.toronto.edu/~fritz/absps/imagenet.pdf

The 2012 KSH paper: the work of LRMD was followed by a 2012 paper by Krizhevsky, Sutskever and Hinton (KSH) 24. KSH trained and tested a deep convolutional neural network using a restricted subset of the ImageNet data.

24 ImageNet classification with deep convolutional neural networks, by Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton (2012).

https://code.google.com/p/cuda-convnet/

Alex Krizhevsky's cuda-convnet (and its successors), which contains code implementing many of these ideas. A Theano-based implementation 26 also exists, and its code is available.

http://caffe.berkeleyvision.org/model_zoo.html

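The Caffe neural network framework also includes a version of the KSH network; see the Model Zoo for details.

http://karpathy.github.io/2014/09/02/what-i-learned-from-competing-against-a-convnet-on-imagenet/

To do this, they built a system that lets humans classify ILSVRC images. One of the authors, Andrej Karpathy, explains this in a blog post.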

http://arxiv.org/abs/1412.2302

26 Theano-based large-scale visual recognition with multiple GPUs, by Weiguang Ding, Ruoyan Wang, Fei Mao, and Graham Taylor (2014).

http://arxiv.org/abs/1409.4842

27 Going deeper with convolutions, by Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich (2014).

http://arxiv.org/abs/1409.0575

28 ImageNet large scale visual recognition challenge, by Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, and Li Fei-Fei (2014).

http://arxiv.org/abs/1410.4615

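RNNs have been used to connect neural networks with traditional ways of thinking about algorithms, such as Turing machines and programming languages. This 2014 paper developed an RNN that takes as input a character-level representation of a Python program and uses that representation to predict the program's output. Informally, the network learns to understand certain Python programs.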

http://arxiv.org/abs/1410.5401

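A second paper, also from 2014, used RNNs to design a model called the neural Turing machine (NTM). This is a universal computer whose entire structure can be trained using gradient descent. The authors trained their NTM to infer algorithms for several simple problems, such as sorting and copying.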

http://research.google.com/pubs/VincentVanhoucke.html

A system based on deep networks has been deployed in Google's Android operating system (see Vincent Vanhoucke's 2012-2015 papers).

http://dx.doi.org/10.1162/neco.1997.9.8.1735

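Units known as long short-term memory (LSTM) units can be incorporated into RNNs. LSTMs were first introduced by Hochreiter and Schmidhuber in 1997, precisely to address this unstable gradient problem. LSTMs make training RNNs considerably easier, and many recent papers (including those I cited above) use LSTMs or related ideas.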

http://www.cs.toronto.edu/~hinton/absps/fastnc.pdf

http://www.sciencemag.org/content/313/5786/504.short

Deep belief networks, generative models, and Boltzmann machines:

See A fast learning algorithm for deep belief nets, by Geoffrey Hinton, Simon Osindero, and Yee-Whye Teh (2006), and Geoffrey H

http://www.scholarpedia.org/article/Deep_belief_networks

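An overview of DBNs

http://www.cs.toronto.edu/~hinton/absps/guideTR.pdf

Contains a lot of valuable information about restricted Boltzmann machines, the core component of DBNs.

Active areas include using neural networks for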

http://machinelearning.org/archive/icml2008/papers/391.pdf

natural language processing (see also this informative review paper),

http://papers.nips.cc/paper/5346-information-based-learning-by-agents-in-unbounded-state-spaces

machine translation,

http://yann.lecun.com/exdb/publis/pdf/humphrey-jiis-13.pdf

music informatics.

http://www.cs.toronto.edu/~vmnih/docs/dqn.pdf

Using reinforcement learning techniques to learn to play video games well

http://en.wikipedia.org/wiki/Conway%27s_law Conway's law

Mrchesian
