Big picture on why we need randomness in stochastic algorithms
- Randomness during initialization: the structure of the search space is unknown, so the starting point must be chosen at random.
- Randomness during the progression of the search: helps the search avoid getting stuck in local optima; this is what SGD and mini-batch gradient descent do.
Note: examples of stochastic algorithms include stochastic gradient descent, genetic algorithms, and simulated annealing.
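The two uses of randomness above can be sketched with a minimal SGD loop on a toy least-squares problem. The data, learning rate, and epoch count are illustrative assumptions, not from the article; the point is that the starting weight and the sample order are both drawn at random.

```python
import random

# Minimal sketch of stochastic gradient descent fitting y = w * x.
# Randomness appears twice: a random starting point (initialization)
# and a randomly shuffled sample order in every epoch (progression).

def sgd_fit(data, lr=0.1, epochs=50, seed=0):
    rng = random.Random(seed)
    w = rng.uniform(-1.0, 1.0)           # random initialization
    for _ in range(epochs):
        rng.shuffle(data)                # random sample order each epoch
        for x, y in data:
            grad = 2 * (w * x - y) * x   # d/dw of (w*x - y)^2
            w -= lr * grad
    return w

data = [(x, 3.0 * x) for x in [0.5, 1.0, 1.5, 2.0]]
w = sgd_fit(data)
# w converges close to the true slope 3.0
```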
Principle and methods for random initialization
Principle: initialize the weights of a neural network to small random values close to zero, e.g. in [0, 0.1].
Methods: see https://keras.io/initializers/.
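As a concrete sketch of the principle, the snippet below draws each weight of a layer uniformly from [0, 0.1]. The layer sizes and helper name are illustrative assumptions; in practice a library initializer from the Keras page above would be used instead.

```python
import random

# Sketch: initialize a dense layer's weights to small random values
# in [0, 0.1], so units start close to zero but all distinct.

def init_layer(n_in, n_out, low=0.0, high=0.1, seed=42):
    rng = random.Random(seed)
    # one weight per (input, output) connection
    return [[rng.uniform(low, high) for _ in range(n_out)]
            for _ in range(n_in)]

weights = init_layer(n_in=4, n_out=3)
flat = [w for row in weights for w in row]
```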
Reason for effectiveness of random initialization
If two hidden units with the same activation function are connected to the same inputs and have the same initial parameters, then a deterministic learning algorithm applied to a deterministic cost and model will constantly update both of these units in the same way. These units must have different initial parameters to “break symmetry”.
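The symmetry argument can be demonstrated numerically. The tiny two-unit model below is an illustration of my own, not the article's code: both units see the same input and loss, so they receive identical gradients and remain clones unless their initial weights differ.

```python
import random

# Two "hidden units" w1, w2 feeding the same input into a squared loss.
# With identical initial parameters, their gradients are identical at
# every step, so learning can never differentiate them.

def step(w1, w2, x=1.0, y=1.0, lr=0.1):
    pred = w1 * x + w2 * x
    grad = 2 * (pred - y) * x        # same gradient for both units
    return w1 - lr * grad, w2 - lr * grad

# identical start: the units remain equal after every update
a, b = 0.5, 0.5
for _ in range(10):
    a, b = step(a, b)
same_after = (a == b)

# random start: the units stay distinct and can take on different roles
rng = random.Random(0)
c, d = rng.uniform(0, 0.1), rng.uniform(0, 0.1)
for _ in range(10):
    c, d = step(c, d)
diff_after = (c != d)
```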
Evaluation of neural networks with random initialization
The most effective way to evaluate the performance of a neural network configuration is to repeat the search process multiple times and report the model's average performance over those repeats. This gives the configuration the best chance to search the space from multiple different sets of initial conditions, and is sometimes called a multiple-restart (or random-restart) search.
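The evaluation protocol above can be sketched as follows. Here `train_and_score` is a hypothetical stand-in for a full training run that simply simulates seed-dependent variation; a real evaluation would train the network from scratch under each seed.

```python
import random
import statistics

# Repeat the same configuration with different random seeds and report
# the mean (and spread) of the final scores, rather than a single run.

def train_and_score(seed):
    # stand-in for training a network from a seed-dependent random
    # initialization; returns a simulated test accuracy
    rng = random.Random(seed)
    return 0.90 + rng.uniform(-0.02, 0.02)

def evaluate(n_repeats=30):
    scores = [train_and_score(seed) for seed in range(n_repeats)]
    return statistics.mean(scores), statistics.stdev(scores)

mean, std = evaluate()
# mean summarizes the configuration; std shows run-to-run variation
```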
References
- Why Initialize a Neural Network with Random Weights?
https://machinelearningmastery.com/why-initialize-a-neural-network-with-random-weights/