I recently read some material on RNNs and LSTMs and now have an introductory-level understanding of them, so here is a summary.

Contents:
1 RNN
1) RNN structure
2) RNN (essence)
3) The Problem of Long-Term Dependencies of RNN
2 LSTM
1) LSTM structure
2) LSTM structure in detail
3) Variants on Long Short Term Memory
3 Traditional NN vs. RNN & LSTM
4 Some simple applications of RNN & LSTM

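1 RNN

1.1 RNN structure

An RNN is a three-layer neural network (input, hidden, output). Compared with an ordinary neural network, the only difference is that the hidden layer also receives its own output from the previous time step as an input.
The formulas make this concrete:
1) Forward propagation

2) Backpropagation through time (BPTT) (BPTT is usually run after a time segment of t steps has been processed):
For the full derivation, see:
http://www.mamicode.com/info-detail-1547845.html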
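
As a rough illustration of the forward pass, here is a minimal NumPy sketch. It assumes the common textbook form h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h), y_t = W_hy h_t + b_y; the weight names, the tanh activation, and the layer sizes are my own assumptions and may differ from the notation in the linked derivation.

```python
import numpy as np

def rnn_forward(xs, W_xh, W_hh, W_hy, b_h, b_y):
    """Run a vanilla RNN over a sequence; return all hidden states and outputs."""
    h = np.zeros(W_hh.shape[0])           # h_0: initial hidden state
    hs, ys = [], []
    for x in xs:                          # one iteration per time step t
        # The hidden layer sees the current input AND its own previous output.
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)
        hs.append(h)
        ys.append(W_hy @ h + b_y)
    return np.stack(hs), np.stack(ys)

# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
n_in, n_hid, n_out, T = 3, 4, 2, 5
params = (
    rng.normal(scale=0.1, size=(n_hid, n_in)),   # W_xh
    rng.normal(scale=0.1, size=(n_hid, n_hid)),  # W_hh
    rng.normal(scale=0.1, size=(n_out, n_hid)),  # W_hy
    np.zeros(n_hid),                             # b_h
    np.zeros(n_out),                             # b_y
)
xs = rng.normal(size=(T, n_in))
hs, ys = rnn_forward(xs, *params)
print(hs.shape, ys.shape)  # (5, 4) (5, 2)
```

The term W_hh @ h is the only thing that separates this from a plain feed-forward layer applied to each x independently.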

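1.2 RNN (essence)
At its core, an RNN is a data inference machine: it looks for the relationship between two time series. Given enough data, it can learn the probability distribution that maps x(t) to y(t), which is what makes inference and prediction possible. This ties it closely to the HMM.
HMM (hidden Markov model, a Bayesian network):
h_t and h_{t-1} are linked through a transition matrix, and every node has a concrete meaning.
RNN:
h_t and h_{t-1} are linked through the connections between neurons, and the neurons are merely hubs through which information flows.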

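1.3 The Problem of Long-Term Dependencies of RNN

1) Recent information is enough to perform the present task

As the figure shows, if the current input x3 depends only on nearby, recent information such as x0 and x1, then h3 can represent that dependence well.

2) More context is needed (long-term dependencies)

As the figure shows, if the current input x_{t+1} depends on much earlier information such as x0 and x1, then h_{t+1} may struggle to reflect x0 and x1, because of the vanishing gradient problem.

For a detailed explanation, see the papers
Hochreiter (1991) [German]
http://people.idsia.ch/~juergen/SeppHochreiter1991ThesisAdvisorSchmidhuber.pdf
and
Bengio, et al. (1994)
http://www-dsi.ing.unifi.it/~paolo/ps/tnn-94-gradient.pdf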
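
To get a feel for the vanishing gradient, the sketch below multiplies the per-step Jacobians d h_t / d h_{t-1} = diag(1 - h_t^2) W_hh of a tanh RNN and tracks the norm of the product. The sizes, the random inputs, and the small weight scale are assumptions chosen for illustration; with large recurrent weights the same product can blow up instead.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, T = 3, 8, 50
W_xh = rng.normal(scale=0.1, size=(n_hid, n_in))
W_hh = rng.normal(scale=0.1, size=(n_hid, n_hid))  # small recurrent weights (assumed)

h = np.zeros(n_hid)
grad = np.eye(n_hid)          # d h_t / d h_0, starts as the identity
norms = []
for t in range(T):
    x = rng.normal(size=n_in)
    h = np.tanh(W_xh @ x + W_hh @ h)
    jac = (1.0 - h**2)[:, None] * W_hh   # d h_t / d h_{t-1} = diag(1 - h_t^2) W_hh
    grad = jac @ grad                    # chain rule across time steps
    norms.append(np.linalg.norm(grad))

# The influence of h_0 on h_t shrinks as t grows.
print(norms[0], norms[9], norms[-1])
```

This is the mechanism behind the problem described above: by the time the error signal has travelled back to x0 and x1, almost nothing is left of it.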

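2 LSTM (Long Short-Term Memory) (Neural Computation, 1997)

This is the structure of an RNN:

This is the structure of an LSTM:

As you can see, the only difference between an LSTM and an RNN is that the hidden layer contains some additional units.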

2.1 LSTM structure

2.2 LSTM structure in detail:
Code: https://github.com/nicodjimenez/lstm
Reference: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
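
For readers who want something concrete next to the diagrams, here is a minimal sketch of a single LSTM step in NumPy. It follows the standard formulation used in the reference post (forget gate, input gate, candidate values, output gate); stacking the four weight blocks into one matrix W, and all sizes, are my own choices for brevity, and this is not the code from the linked repository.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step. W has shape (4*n_hid, n_hid + n_in); b has shape (4*n_hid,)."""
    n_hid = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x]) + b
    f = sigmoid(z[0 * n_hid:1 * n_hid])        # forget gate: what to drop from c_prev
    i = sigmoid(z[1 * n_hid:2 * n_hid])        # input gate: how much new content to write
    c_tilde = np.tanh(z[2 * n_hid:3 * n_hid])  # candidate values
    o = sigmoid(z[3 * n_hid:4 * n_hid])        # output gate: what to expose as h
    c = f * c_prev + i * c_tilde               # new cell state
    h = o * np.tanh(c)                         # new hidden state
    return h, c

# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
W = rng.normal(scale=0.1, size=(4 * n_hid, n_hid + n_in))
b = np.zeros(4 * n_hid)
h, c = np.zeros(n_hid), np.zeros(n_hid)
for x in rng.normal(size=(5, n_in)):
    h, c = lstm_step(x, h, c, W, b)
print(h.shape, c.shape)  # (4,) (4,)
```

The cell state c is updated additively (f * c_prev + i * c_tilde), which is what lets information and gradients survive across many steps.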

2.3 Variants on Long Short Term Memory
1) Recurrent nets that time and count (IEEE 2000)
Feeds the cell state into all three gates.

Implementation code:
http://christianherta.de/lehre/dataScience/machineLearning/neuralNetworks/LSTM.php
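
Written out, this variant (often called peephole connections) might look as follows. The notation follows the reference post linked in section 2.2 and is an assumption about what the missing figure showed; note that the output gate looks at the already updated cell state C_t.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f \cdot [C_{t-1}, h_{t-1}, x_t] + b_f\right) \\
i_t &= \sigma\left(W_i \cdot [C_{t-1}, h_{t-1}, x_t] + b_i\right) \\
o_t &= \sigma\left(W_o \cdot [C_t, h_{t-1}, x_t] + b_o\right)
\end{aligned}
```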

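2) Combines the forget and input gates into a single "update gate."

The reason for the 1 - f_t term is that whatever part of the state is forgotten has to be refilled from the current input (curX).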
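
In equation form, and assuming the usual notation where \tilde{C}_t is the candidate computed from the current input, the standard update C_t = f_t * C_{t-1} + i_t * \tilde{C}_t becomes:

```latex
C_t = f_t \odot C_{t-1} + (1 - f_t) \odot \tilde{C}_t
```

The input gate i_t is simply replaced by 1 - f_t, so the cell writes new content exactly where it forgets old content.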

3) Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation (EMNLP 2014)
Introduces the Gated Recurrent Unit (GRU).

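This is the now well-known GRU model. It can be seen as combining ideas 1) and 2) above with some further changes: the cell state and hidden state are merged, so the gates read the state directly, and a single update gate takes over the roles of the forget and input gates.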
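
A minimal sketch of one GRU step in NumPy is shown below. It uses the convention h_t = (1 - z_t) * h_{t-1} + z_t * h~_t from the reference post linked in section 2.2 (some papers swap the roles of z_t and 1 - z_t); biases are omitted, and the weight names and sizes are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x, h_prev, W_z, W_r, W_h):
    """One GRU step; each weight matrix has shape (n_hid, n_hid + n_in)."""
    hx = np.concatenate([h_prev, x])
    z = sigmoid(W_z @ hx)                                     # update gate
    r = sigmoid(W_r @ hx)                                     # reset gate
    h_tilde = np.tanh(W_h @ np.concatenate([r * h_prev, x]))  # candidate state
    return (1.0 - z) * h_prev + z * h_tilde                   # blend old and new state

# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
W_z, W_r, W_h = (rng.normal(scale=0.1, size=(n_hid, n_hid + n_in)) for _ in range(3))
h = np.zeros(n_hid)
for x in rng.normal(size=(5, n_in)):
    h = gru_step(x, h, W_z, W_r, W_h)
print(h.shape)  # (4,)
```

There is no separate cell state here: the single vector h plays both roles, and the last line is the same "1 minus gate" blend as in variant 2).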

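In addition, the paper LSTM: A Search Space Odyssey (Klaus Greff et al., 2015) compares a range of LSTM variants, and An Empirical Exploration of Recurrent Network Architectures (Jozefowicz et al., 2015) tests many RNN and LSTM architectures. The results are quite interesting: on some tasks, other RNN architectures do better than the LSTM.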

3 Traditional NN vs. RNN & LSTM

The leftmost diagram is the traditional NN setup: one input maps to one output, as in image classification.
The possible configurations are:
One to one: from fixed-sized input to fixed-sized output (e.g. image classification)
One to many: sequence output (e.g. image captioning)
Many to one: sequence input (e.g. sentiment analysis)
Many to many:
1) Sequence input and sequence output (e.g. machine translation)
2) Synced sequence input and output (e.g. video classification that labels each frame of the video)
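
In code, several of these configurations differ only in which hidden states you keep. The sketch below runs one vanilla tanh RNN (random weights and sizes are assumptions for illustration) and reads it out in two ways.

```python
import numpy as np

rng = np.random.default_rng(0)
T, n_in, n_hid = 6, 3, 4
W_xh = rng.normal(scale=0.1, size=(n_hid, n_in))
W_hh = rng.normal(scale=0.1, size=(n_hid, n_hid))

xs = rng.normal(size=(T, n_in))        # a sequence input of length T
h, hs = np.zeros(n_hid), []
for x in xs:
    h = np.tanh(W_xh @ x + W_hh @ h)
    hs.append(h)
hs = np.stack(hs)

many_to_one = hs[-1]        # keep only the last state, e.g. for sentiment analysis
synced_many_to_many = hs    # keep one state per step, e.g. to label every video frame
print(many_to_one.shape, synced_many_to_many.shape)  # (4,) (6, 4)
```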

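4 Some simple applications of RNN & LSTM
4.1 Language Models
1) Input: hell
Predict the next char 'o'

2) Generate text
For example, given a seed text such as 'in palo alto', generate the next 100 words.
For example, train on a collection of Tang poems, then supply a starting character and generate a classical-style poem.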
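
The "hell" -> 'o' example boils down to shifting the text by one character to get the training targets. Here is a small sketch of that data preparation in plain Python; the one-hot encoding and the variable names are my own choices, and the model that consumes these pairs is left out.

```python
text = "hello"
vocab = sorted(set(text))                      # ['e', 'h', 'l', 'o']
char_to_ix = {ch: i for i, ch in enumerate(vocab)}

# At every position the target is simply the next character.
inputs = [char_to_ix[ch] for ch in text[:-1]]   # h, e, l, l
targets = [char_to_ix[ch] for ch in text[1:]]   # e, l, l, o

def one_hot(ix, size):
    """Encode a character index as a one-hot vector for the RNN input."""
    vec = [0.0] * size
    vec[ix] = 1.0
    return vec

xs = [one_hot(ix, len(vocab)) for ix in inputs]
print(inputs, targets)  # [1, 0, 2, 2] [0, 2, 2, 3]
```

Text generation then repeats the same step in a loop: sample a character from the predicted distribution, feed it back in as the next input, and continue for as many characters or words as you want.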

4.2 Machine Translation
Sequence to Sequence Learning with Neural Networks (Google, NIPS 2014)
● The challenge for traditional feed-forward neural networks is that source and target lengths vary
How it works:
An encoder-decoder framework

First, one LSTM encodes the source sequence into the vector h_t.

Then another LSTM takes the vector h_t as input and decodes it
(in effect a language model that generates the output sequence one token at a time)
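
The two steps above can be sketched roughly as follows. To keep it short, a plain tanh recurrence stands in for the LSTM cells, the weights are random and untrained, and every name here (E, W_enc, W_dec, W_out, the token ids) is hypothetical, so the output tokens are meaningless; the point is only the data flow.

```python
import numpy as np

rng = np.random.default_rng(0)
n_vocab, n_hid = 6, 8

E = rng.normal(scale=0.1, size=(n_vocab, n_hid))        # token embeddings (shared, for brevity)
W_enc = rng.normal(scale=0.1, size=(n_hid, 2 * n_hid))  # encoder recurrence
W_dec = rng.normal(scale=0.1, size=(n_hid, 2 * n_hid))  # decoder recurrence
W_out = rng.normal(scale=0.1, size=(n_vocab, n_hid))    # hidden state -> vocabulary scores

def step(W, h, token):
    # Plain tanh recurrence standing in for an LSTM cell.
    return np.tanh(W @ np.concatenate([h, E[token]]))

def encode(source):
    h = np.zeros(n_hid)
    for token in source:           # any source length collapses into one vector
        h = step(W_enc, h, token)
    return h

def decode(h, start_token, max_len):
    out, token = [], start_token
    for _ in range(max_len):       # generate one token at a time
        h = step(W_dec, h, token)
        token = int(np.argmax(W_out @ h))   # greedy choice of the next token
        out.append(token)
    return out

context = encode([1, 4, 2, 5, 3])  # source sequence of length 5
print(context.shape)               # (8,)
print(decode(context, start_token=0, max_len=3))
```

Source and target lengths are decoupled: the encoder accepts any length, and the decoder runs for as many steps as needed.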
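The drawback of this architecture:
● No matter how long the preceding context is or how much information it contains, it must ultimately be compressed into a single vector of a few hundred dimensions. The larger the context, the more information the final state vector loses.
● Solution: the attention mechanism: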
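
The core idea of attention is to let the decoder look at all encoder states instead of only the last one. The sketch below uses simple dot-product scoring, which is a simplification I chose for brevity (the alignment model in paper 2 below uses a small learned network to score each position); the shapes and random values are assumptions.

```python
import numpy as np

def attend(query, encoder_states):
    """Dot-product attention: weight every encoder state by its match with the query."""
    scores = encoder_states @ query            # one score per source position
    weights = np.exp(scores - scores.max())    # numerically stable softmax
    weights /= weights.sum()
    context = weights @ encoder_states         # weighted average of the states
    return context, weights

rng = np.random.default_rng(0)
encoder_states = rng.normal(size=(7, 8))   # 7 source positions, 8-dim hidden states
query = rng.normal(size=8)                 # current decoder state
context, weights = attend(query, encoder_states)
print(context.shape, weights.shape)        # (8,) (7,)
```

Because the context vector is recomputed at every decoding step, a long source sentence no longer has to fit into one fixed vector.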

Attention-related papers:
1) Reasoning about Entailment with Neural Attention (Google, ICLR 2016)
2) Neural Machine Translation by Jointly Learning to Align and Translate
3) A Neural Attention Model for Abstractive Sentence Summarization
4) Teaching Machines to Read and Comprehend

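4.3 Image Captioning
1) Long-term Recurrent Convolutional Networks for Visual Recognition and Description (CVPR 2015)
2) Show and Tell: A Neural Image Caption Generator (IEEE CVPR 2015, Google)

Put simply, the image is first run through a CNN, and a vector representing the image is taken from a hidden layer. That vector is then fed to an LSTM as input, and a language model generates a descriptive sentence.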

Some other applications:
● OCR
1) "Can we build language-independent OCR using LSTM networks?" (ACM 2013)
● Speech Recognition:
1) Towards End-to-End Speech Recognition with Recurrent Neural Networks (ICML 2014)
● Computer-composed Music:
1) Composing Music With Recurrent Neural Networks

● MNIST classification is also possible (the takeaway is that images can be processed this way so that an RNN can run on them)
Source code:
https://github.com/tgjeon/TensorFlow-Tutorials-for-Time-Series/blob/master/mnist-rnn.ipynb
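
One common way to do this, and my assumption about what the linked notebook does, is to treat each 28x28 image as a sequence of 28 rows with 28 pixels each. The NumPy sketch below shows only the reshaping and an untrained vanilla RNN readout; random data stands in for MNIST, and all sizes and weight names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
batch = rng.random((32, 784))          # stand-in for 32 flattened MNIST images
seqs = batch.reshape(32, 28, 28)       # 28 time steps, each step is one 28-pixel row

n_steps, n_in, n_hid, n_cls = 28, 28, 16, 10
W_xh = rng.normal(scale=0.1, size=(n_in, n_hid))
W_hh = rng.normal(scale=0.1, size=(n_hid, n_hid))
W_hy = rng.normal(scale=0.1, size=(n_hid, n_cls))

h = np.zeros((32, n_hid))
for t in range(n_steps):               # read the image row by row
    h = np.tanh(seqs[:, t, :] @ W_xh + h @ W_hh)
logits = h @ W_hy                      # classify from the last hidden state (many to one)
pred = logits.argmax(axis=1)
print(seqs.shape, logits.shape)        # (32, 28, 28) (32, 10)
```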

