Hyperparameter tuning: random initialization

Big picture: why stochastic algorithms need randomness

  • Randomness during initialization: the structure of the search space is unknown, so a random starting point is as good as any.
  • Randomness during the progression of the search: it helps the search escape local optima.
    – This is what stochastic gradient descent and mini-batch gradient descent do, by visiting the training examples in a random order (see the sketch after this list).
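
A minimal sketch of that second point in NumPy (the toy data, learning rate, and batch size are illustrative assumptions, not from the article): mini-batch gradient descent gets its randomness both from the random initial weight and from reshuffling the examples every epoch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y = 3x + noise (hypothetical example).
X = rng.normal(size=(200, 1))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=200)

w = rng.uniform(0.0, 0.1)                # random, close-to-zero initial weight
lr, batch_size = 0.1, 16

for epoch in range(20):
    order = rng.permutation(len(X))      # random reshuffling each epoch
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        # Gradient of the mean squared error over this mini-batch.
        grad = 2 * np.mean((w * X[idx, 0] - y[idx]) * X[idx, 0])
        w -= lr * grad

print(w)   # should end up close to 3.0
```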

Note: examples of stochastic algorithms include stochastic gradient descent, genetic algorithms, and simulated annealing.

Principle and methods for random initialization

Principle: initialize the weights of a neural network to random but close-to-zero values, for example drawn from [0, 0.1].
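
A minimal sketch of this principle in NumPy (the layer sizes are hypothetical): weights are drawn uniformly from [0, 0.1], while biases are commonly left at zero.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical layer sizes for a single dense layer.
n_in, n_out = 4, 8
W = rng.uniform(low=0.0, high=0.1, size=(n_in, n_out))  # random, close to zero
b = np.zeros(n_out)                                      # biases at zero
print(W.round(3))
```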

Methods: see https://keras.io/initializers/.
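
A short usage sketch, assuming TensorFlow/Keras is installed (layer sizes are illustrative): the kernel_initializer argument selects one of the initializers listed on the page above.

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(
        16, activation="relu",
        # Close-to-zero uniform initialization, as described above.
        kernel_initializer=tf.keras.initializers.RandomUniform(minval=0.0, maxval=0.1),
        bias_initializer="zeros"),
    tf.keras.layers.Dense(
        1,
        # Keras's default Glorot (Xavier) uniform scheme, another option from the same page.
        kernel_initializer="glorot_uniform"),
])
model.summary()
```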

Why random initialization is effective

If two hidden units with the same activation function are connected to the same inputs and have the same initial parameters, then a deterministic learning algorithm applied to a deterministic cost and model will constantly update both of these units in the same way. These units must have different initial parameters to “break symmetry”.
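
A small numeric illustration of this point (a hypothetical 2-2-1 network with sigmoid hidden units, not taken from the quoted text): when the two hidden units start with identical weights, a gradient step leaves them identical, so they can never learn different features.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.0])          # one training input
t = 1.0                            # its target

W1 = np.full((2, 2), 0.05)         # both hidden units get the SAME incoming weights
w2 = np.array([0.1, 0.1])          # and the same outgoing weights

h = sigmoid(W1 @ x)                # hidden activations (identical for both units)
y = w2 @ h                         # network output
err = y - t

# Backpropagated gradient of the squared-error loss w.r.t. the hidden weights.
grad_W1 = np.outer(err * w2 * h * (1 - h), x)
W1 -= 0.1 * grad_W1

print(W1)   # both rows are still identical: symmetry was not broken
```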

Evaluation of neural networks with random initialization

The most effective way to evaluate the performance of a neural network configuration is to repeat the search process multiple times and report the average performance of the model over those repeats. This gives the configuration the best chance to search the space from multiple different sets of initial conditions. Sometimes this is called a multiple restart or multiple-restart search.
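
A minimal sketch of such a multiple-restart evaluation, assuming TensorFlow/Keras (the synthetic data, model size, and number of restarts are illustrative assumptions): the same configuration is trained several times with different seeds, and the mean and spread of the test scores are reported.

```python
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10)).astype("float32")
y = (X[:, 0] + X[:, 1] > 0).astype("float32")
X_train, y_train = X[:400], y[:400]
X_test, y_test = X[400:], y[400:]

def run_once(seed):
    tf.keras.utils.set_random_seed(seed)   # controls weight initialization and shuffling
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(10,)),
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    model.fit(X_train, y_train, epochs=10, batch_size=32, verbose=0)
    return model.evaluate(X_test, y_test, verbose=0)[1]   # test accuracy

scores = [run_once(seed) for seed in range(5)]   # 5 restarts of the same configuration
print(f"mean accuracy: {np.mean(scores):.3f} +/- {np.std(scores):.3f}")
```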


References

  1. Why Initialize a Neural Network with Random Weights?
    https://machinelearningmastery.com/why-initialize-a-neural-network-with-random-weights/