LSTNet

代碼

https://github.com/laiguokun/LSTNet

論文

Modeling Long- and Short-Term Temporal Patterns with Deep Neural Networks.

LSTNet

在這裏插入圖片描述

部分參數解釋

參數 默認值 解釋
model(str) ‘LSTNet’
hidCNN(int) 100 number of CNN hidden units
hidRNN(int) 100 number of RNN hidden units
window(int) 24 * 7 window size
CNN_kernel(int) 6 the kernel size of the CNN layers
highway_window(int) 24 The window size of the highway component
clip(float) 10. gradient clipping
epochs(int) 100 upper epoch limit
batch_size(int) 32 batch size
dropout(float) 0.2 dropout applied to layers (0 = no dropout)
seed(int) 54321 random seed
gpu(int) None
log_interval(int) 2000 report interval
save(str) ‘model/model.pt’ path to save the final model
cuda(str) True
optim(str) ‘adam’
lr(float) 0.001
horizon(int) 12
skip(float) 24
hidSkip(int) 5
L1Loss(bool) True
normalize(int) 2
output_fun(str) ‘sigmoid’

model

這是作者提供的

import torch
import torch.nn as nn
import torch.nn.functional as F

class Model(nn.Module):
    def __init__(self, args, data):
        super(Model, self).__init__()
        self.use_cuda = args.cuda
        self.P = args.window  # 輸入窗口大小
        self.m = data.m  # 列數,變量數
        self.hidR = args.hidRNN
        self.hidC = args.hidCNN  # 卷積核數
        self.hidS = args.hidSkip
        self.Ck = args.CNN_kernel  # 卷積核大小
        self.skip = args.skip;
        self.pt = (self.P - self.Ck)//self.skip
        self.hw = args.highway_window
        self.conv1 = nn.Conv2d(1, self.hidC, kernel_size = (self.Ck, self.m));
        self.GRU1 = nn.GRU(self.hidC, self.hidR);
        self.dropout = nn.Dropout(p = args.dropout);
        if (self.skip > 0):
            self.GRUskip = nn.GRU(self.hidC, self.hidS);
            self.linear1 = nn.Linear(self.hidR + self.skip * self.hidS, self.m);
        else:
            self.linear1 = nn.Linear(self.hidR, self.m);
        if (self.hw > 0):
            self.highway = nn.Linear(self.hw, 1);
        self.output = None;
        if (args.output_fun == 'sigmoid'):
            self.output = F.sigmoid;
        if (args.output_fun == 'tanh'):
            self.output = F.tanh;
 
    def forward(self, x):
        batch_size = x.size(0);   # x: [batch, window, n_val]
        
        # CNN
        c = x.view(-1, 1, self.P, self.m)  # c: [batch, 1, window, n_val]
        c = F.relu(self.conv1(c))  # c: [batch, hidCNN, window-kernelsize+1, 1]
        c = self.dropout(c)
        c = torch.squeeze(c, 3)  # c: [batch, hidCNN, window-kernelsize+1]
        
        # RNN 
        r = c.permute(2, 0, 1).contiguous()  # c: [window-kernelsize+1, batch, hidCNN]
        _, r = self.GRU1(r)  # r: [1, batch, hidRNN]
        r = self.dropout(torch.squeeze(r,0))  # r: [batch, hidRNN]

        
        # skip-rnn
        
        if (self.skip > 0):
            s = c[:,:, int(-self.pt * self.skip):].contiguous()  # s: [batch, hidCNN, pt*skip]
            s = s.view(batch_size, self.hidC, self.pt, self.skip)  # s: [batch, hidCNN, pt, skip]
            s = s.permute(2,0,3,1).contiguous()  # s: [pt, batch, skip, hidCNN]
            s = s.view(self.pt, batch_size * self.skip, self.hidC)   # s: [pt, batch * skip, hidCNN]
            _, s = self.GRUskip(s)   # s: [1, batch * skip, hidSkip]
            s = s.view(batch_size, self.skip * self.hidS)   # s: [batch, skip * hidSkip]
            s = self.dropout(s)
            r = torch.cat((r,s),1)  # r: [batch, skip * hidSkip + hidRNN]
        
        res = self.linear1(r)  # res: [batch, n_val]
        
        # highway
        
        if (self.hw > 0):
            z = x[:, -self.hw:, :]  # z: [batch, hw, n_val]
            z = z.permute(0,2,1).contiguous().view(-1, self.hw)  # z: [batch*n_val, hw]
            z = self.highway(z)  # z: [batch*n_val, 1]
            z = z.view(-1,self.m) # z: [batch, n_val]
            res = res + z  # res: [batch, n_val]
            
        if (self.output):
            res = self.output(res)
        return res

代碼中用到 GRU 作爲 RNN 單元
r=σ(Wirx+bir+Whrh+bhr)z=σ(Wizx+biz+Whzh+bhz)n=tanh(Winx+bin+r(Whnh+bhn))h=(1z)n+zh \begin{array}{ll} r = \sigma(W_{ir} x + b_{ir} + W_{hr} h + b_{hr}) \\ z = \sigma(W_{iz} x + b_{iz} + W_{hz} h + b_{hz}) \\ n = \tanh(W_{in} x + b_{in} + r * (W_{hn} h + b_{hn})) \\ h' = (1 - z) * n + z * h \end{array}
Inputs: input, h0h_0

  • input of shape (seq_len, batch, input_size): tensor containing the features of the input sequence. The input can also be a packed variable length sequence.
  • h_0 of shape (num_layers * num_directions, batch, hidden_size): tensor containing the initial hidden state for each element in the batch. Defaults to zero if not provided. If the RNN is bidirectional, num_directions should be 2, else it should be 1.

Outputs: output, hnh_n

  • output of shape (seq_len, batch, num_directions * hidden_size): tensor containing the output features h_t from the last layer of the GRU, for each t. If a :class: torch.nn.utils.rnn.PackedSequence has been given as the input, the output will also be a packed sequence. For the unpacked case, the directions can be separated using output.view(seq_len, batch, num_directions, hidden_size), with forward and backward being direction 0 and 1 respectively.

Similarly, the directions can be separated in the packed case.

  • h_n of shape (num_layers * num_directions, batch, hidden_size): tensor containing the hidden state for t = seq_len

Like output, the layers can be separated using h_n.view(num_layers, num_directions, batch, hidden_size).

在 代碼中,只用到了輸出的狀態 hnh_n

發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章