DL Study Notes [20]: Simple layers in the nn package

From the tutorial:
https://github.com/torch/nn/blob/master/doc/simple.md
These layers are fairly easy to understand, so I'll only jot down brief notes.

Parameterized Modules

Linear 
Formula:
y = Ax + b
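
A minimal usage sketch (the sizes 10 and 5 are just for illustration):

require 'nn'

module = nn.Linear(10, 5)   -- A is a 5x10 weight matrix, b a 5-dimensional bias
x = torch.randn(10)
y = module:forward(x)       -- y = A*x + b
print(y:size())             -- 5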

SparseLinear 
A sparse Linear layer; its input x is different from the usual dense input:
x = torch.Tensor({ {1, 0.1}, {2, 0.3}, {10, 0.3}, {31, 0.2} })

 print(x)

  1.0000   0.1000
  2.0000   0.3000
 10.0000   0.3000
 31.0000   0.2000
In each row, the first entry is the index (position) and the second entry is the value at that position.
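
A short sketch of how this input is used (10000 and 2 are arbitrary sizes; only the indices listed in x contribute to the output):

require 'nn'

module = nn.SparseLinear(10000, 2)   -- 10000 sparse inputs, 2 outputs
x = torch.Tensor({ {1, 0.1}, {2, 0.3}, {10, 0.3}, {31, 0.2} })
print(module:forward(x))             -- a 2-dimensional output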

Bilinear 
Formula:
\forall k: y_k = x_1^T A_k x_2 + b_k
Example code:
 module = nn.Bilinear(10, 5, 3)  -- input sizes 10 and 5, output size 3
 input = {torch.randn(128, 10), torch.randn(128, 5)}  -- 128 input examples
 module:forward(input)

PartialLinear 
It lets you use only part of the input; for example, with a 5-dimensional input we can compute using only two of the indices, and later restore it to use all 5 again.
Example code:
module = nn.PartialLinear(5, 3)  -- 5 inputs, 3 outputs
module:setPartition(torch.Tensor({2,4})) -- only compute the 2nd and 4th indices out of a total of 5 indices

Add
Learns only a bias, which is added to the input.

CAdd
A component-wise (multi-dimensional) bias.

Mul
Learns only a single scalar weight w.

CMul
Component-wise (multi-dimensional) weights w.
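
A quick sketch of the four modules side by side (constructor arguments as I understand them from the nn docs; treat the exact signatures as assumptions):

require 'nn'

x = torch.randn(5)

add  = nn.Add(5)      -- learns a 5-dimensional bias; nn.Add(5, true) would learn a single scalar
mul  = nn.Mul()       -- learns a single scalar weight
cadd = nn.CAdd(5)     -- learns a component-wise bias of size 5
cmul = nn.CMul(5)     -- learns component-wise weights of size 5

print(add:forward(x))   -- x + add.bias
print(mul:forward(x))   -- x * mul.weight[1]
print(cadd:forward(x))  -- x + cadd.bias
print(cmul:forward(x))  -- element-wise product of x and cmul.weight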

Euclidean
Formula:
y_j = || w_j - x ||
Why can the weights and the input be subtracted? Because nn.Euclidean(inputSize, outputSize) keeps outputSize weight vectors w_j, each with the same dimension as the input x, so w_j - x is well defined; y_j is then the distance from x to the j-th weight vector. A small check is sketched below.
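
A minimal sketch of this, assuming the weights are stored as an inputSize x outputSize matrix whose columns are the w_j (treat the column access as an assumption):

require 'nn'

module = nn.Euclidean(5, 3)   -- 3 weight vectors w_j, each of dimension 5
x = torch.randn(5)
y = module:forward(x)         -- y[j] = || w_j - x ||, so y has 3 entries

print(y)
-- distance to the first weight vector, taken as the first column of module.weight
-- (assumption: weight is stored as inputSize x outputSize)
print(torch.dist(module.weight:select(2, 1), x))  -- should match y[1]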

WeightedEuclidean
Formula:
y_j = || c_j * (w_j - x) ||
where c_j is a learned per-dimension scaling (a diagonal metric) for the j-th weight vector.

Cosine
Formula:
y_j = (x · w_j) / ( || w_j || * || x || )


I didn't fully understand the Identity example code at first. nn.Identity() simply forwards its input unchanged; in the example below it lets the true label y pass through untouched, so the criterion at the end receives both the prediction (from pred_mlp) and the original y:
require 'nn'

pred_mlp = nn.Sequential()  -- A network that makes predictions given x.
pred_mlp:add(nn.Linear(5, 4))
pred_mlp:add(nn.Linear(4, 3))

xy_mlp = nn.ParallelTable() -- A network for predictions and for keeping the
xy_mlp:add(pred_mlp)        -- true label for comparison with a criterion
xy_mlp:add(nn.Identity())   -- by forwarding both x and y through the network.

mlp = nn.Sequential()       -- The main network that takes both x and y.
mlp:add(xy_mlp)             -- It feeds x and y to parallel networks;
cr = nn.MSECriterion()
cr_wrap = nn.CriterionTable(cr)
mlp:add(cr_wrap)            -- and then applies the criterion.

for i = 1, 100 do           -- Do a few training iterations
   x = torch.ones(5)        -- Make input features.
   y = torch.Tensor(3)
   y:copy(x:narrow(1,1,3))  -- Make output label.
   err = mlp:forward{x,y}   -- Forward both input and output.
   print(err)               -- Print error from criterion.

   mlp:zeroGradParameters() -- Do backprop...
   mlp:backward({x, y})
   mlp:updateParameters(0.05)
end


This is getting long... I don't feel like writing out every module, so I'll just copy and paste the rest of the summary list, haha.

Modules that adapt basic Tensor methods :

Copy : copy of the input with type casting ; from the description it copies the input to the output, optionally casting the tensor type, so the output has the same values as the input (possibly as a different tensor type); see the short sketch after this list.

Narrow : a narrow operation over a given dimension ;

Replicate : repeats input n times along its first dimension ;

Reshape : a reshape of the inputs ;

View : a view of the inputs ;

Contiguous : makes the input contiguous in memory (a no-op if it already is) ;

Select : a select over a given dimension ;

MaskedSelect : a masked select module performs the torch.maskedSelect operation ;

Index : an index operation over a given dimension ;

Squeeze : squeezes the input;

Unsqueeze : unsqueeze the input, i.e., insert singleton dimension;

Transpose : transposes the input ;
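
For Copy, a minimal sketch of the type-casting behaviour mentioned above (the concrete types are just an example):

require 'nn'

m = nn.Copy('torch.DoubleTensor', 'torch.FloatTensor')
x = torch.randn(3)        -- a DoubleTensor by default
y = m:forward(x)
print(torch.type(y))      -- torch.FloatTensor
print(y)                  -- same values as x (up to float precision)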

Modules that adapt mathematical Tensor methods :

AddConstant : adding a constant ;

MulConstant : multiplying a constant ;

Max : a max operation over a given dimension ;

Min : a min operation over a given dimension ;

Mean : a mean operation over a given dimension ;

Sum : a sum operation over a given dimension ;

Exp : an element-wise exp operation ;

Log : an element-wise log operation ;

Abs : an element-wise abs operation ;

Power : an element-wise pow operation ;

Square : an element-wise square operation ;

Sqrt : an element-wise sqrt operation ;

Clamp : an element-wise clamp operation ;

Normalize : normalizes the input to have unit L_p norm ;

MM : matrix-matrix multiplication (also supports batches of matrices); see the sketch below ;
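
A minimal MM sketch (sizes are arbitrary):

require 'nn'

mm = nn.MM()
A = torch.randn(4, 5)
B = torch.randn(5, 3)
C = mm:forward({A, B})    -- matrix product, a 4x3 tensor
print(C:size())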

Miscellaneous Modules :

BatchNormalization : mean/std normalization over the mini-batch inputs (with an optional affine transform) ;

PixelShuffle : Rearranges elements in a tensor of shape [C*r*r, H, W] to a tensor of shape [C, H*r, W*r] ;

Identity : forward input as-is to output (useful with ParallelTable) ;

Dropout : masks parts of the input using binary samples from a bernoulli distribution ;

SpatialDropout : same as Dropout but for spatial inputs where adjacent pixels are strongly correlated ;

VolumetricDropout : same as Dropout but for volumetric inputs where adjacent voxels are strongly correlated ;

Padding : adds padding to a dimension ;

L1Penalty : adds an L1 penalty to an input (for sparsity) ;

GradientReversal : reverses the gradient (to maximize an objective function) ;

GPU : decorates a module so that it can be executed on a specific GPU device.

TemporalDynamicKMaxPooling : selects the k highest values in a sequence. k can be calculated based on sequence length ;
