How to Improve Deep Learning Performance

20 Tips, Tricks and Techniques
to fight overfitting and get better generalization

The goal is to list all of the ideas that might give a lift in performance.

They fall into 4 sub-topics:

  • Improve performance with data
  • Improve performance with algorithms
  • Improve performance with algorithm tuning
  • Improve performance with ensembles

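Start with the data

  • get more data,
  • invent more data,
  • rescale the data,
  • transform the data,
  • feature selection.

The quality of the data is critical. More data is generally better, and if you do not have enough of it, you need to generate some (data augmentation or data generation):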

  • if your data are vectors of numbers, create randomly modified
    versions of existing vectors
  • if your data are images, create randomly modified versions of existing
    images
  • if your data are text, ……

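For example, with image data you can flip existing images to obtain new samples. You can also add noise to the data; a further benefit of adding noise is that it helps suppress overfitting.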
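The two augmentations just mentioned (flipping and adding noise) can be sketched with the standard library alone. This is a minimal illustration, not a production pipeline: images are assumed to be plain lists of rows, and the helper names `flip_horizontal`, `add_gaussian_noise` and `augment`, as well as the default `sigma`, are hypothetical choices made for this example.

```python
import random


def flip_horizontal(image):
    """Return a left-right mirrored copy of a 2-D image (a list of rows)."""
    return [list(reversed(row)) for row in image]


def add_gaussian_noise(vector, sigma=0.05, rng=None):
    """Return a copy of `vector` with zero-mean Gaussian noise added to each value."""
    if rng is None:
        rng = random.Random()
    return [x + rng.gauss(0.0, sigma) for x in vector]


def augment(images, sigma=0.05, seed=0):
    """Return each original image followed by a flipped copy and a noisy copy."""
    rng = random.Random(seed)  # seeded so the augmentation is repeatable
    augmented = []
    for image in images:
        augmented.append(image)
        augmented.append(flip_horizontal(image))
        augmented.append([add_gaussian_noise(row, sigma, rng) for row in image])
    return augmented


image = [[0.0, 0.5, 1.0],
         [0.2, 0.4, 0.6]]
dataset = augment([image])
print(len(dataset))  # 3: original, flipped, noisy
print(dataset[1])    # [[1.0, 0.5, 0.0], [0.6, 0.4, 0.2]]
```

Whether a flip is a valid augmentation depends on the problem: it only makes sense when a mirrored image keeps the same label.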

Rescale the data to a range that matches the activation function: if you use sigmoid activations, rescale the data to [0, 1]; if you use tanh, rescale it to [-1, 1].

  • normalized to 0 to 1
  • rescaled to -1 to 1
  • standardized mean=0, var=1
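The three rescalings listed above can be written out as follows. This is a sketch for a single numeric column held in a Python list; the function names `normalize` and `standardize` are hypothetical, and the handling of constant columns is an assumption made here to avoid division by zero.

```python
import statistics


def normalize(values, low=0.0, high=1.0):
    """Linearly rescale `values` so the minimum maps to `low` and the maximum to `high`."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant column: nothing to scale, so map every value to `low`.
        return [low for _ in values]
    scale = (high - low) / (hi - lo)
    return [low + (v - lo) * scale for v in values]


def standardize(values):
    """Shift and scale `values` to mean 0 and (population) variance 1."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # Constant column: every value equals the mean.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


column = [2.0, 4.0, 6.0, 8.0]
print(normalize(column))             # values in [0, 1], e.g. for sigmoid
print(normalize(column, -1.0, 1.0))  # values in [-1, 1], e.g. for tanh
print(standardize(column))           # mean 0, variance 1
```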

To get to know your data, visualize it; this also lets you spot outliers.
Then estimate the distribution of each column:

  • does a column look like a skewed Gaussian? consider adjusting the
    skew with a Box-Cox transform
  • does a column look like an exponential distribution? consider a log
    transform
  • does a column look like it has some features, but they are being
    clobbered by something obvious? try squaring or square-rooting
  • can a feature be made discrete or binned in some way?
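The transforms named in this list are short enough to write out directly. The sketch below assumes one numeric column in a Python list; `log_transform`, `box_cox` and `bin_values` are hypothetical helper names. The Box-Cox parameter `lam` is passed in by hand here, whereas in practice it is usually estimated from the data.

```python
import math


def log_transform(values):
    """Apply log(1 + x), which is defined for x >= 0 and maps 0 to 0."""
    return [math.log1p(v) for v in values]


def box_cox(values, lam):
    """Box-Cox transform with a fixed lambda; requires strictly positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


def bin_values(values, edges):
    """Replace each value by its bin index: the number of edges it is >= to."""
    return [sum(v >= edge for edge in edges) for v in values]


skewed = [1.0, 2.0, 4.0, 100.0]
print(log_transform(skewed))               # the long right tail is compressed
print(box_cox(skewed, 0.5))                # a square-root-like transform
print(bin_values(skewed, [3.0, 50.0]))     # [0, 0, 1, 2]
```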

Improve performance with algorithms:

  • spot-check algorithms
  • steal from literature
  • resampling methods

No single algorithm can perform better than any other when performance is averaged across all possible problems; in that sense, all algorithms are equal.
So collect evidence on your own problem:

  • spot-check a suite of methods and see which fare well and which do
    not.
  • evaluate some linear methods like logistic regression and linear
    discriminant analysis.
  • evaluate some tree methods like CART, random forest and gradient
    boosting.
  • evaluate some instance methods like SVM and kNN.
  • evaluate some other neural network methods like LVQ, MLP, CNN, LSTM
    and hybrids.
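A spot-check is essentially a loop that fits every candidate on the same training data and scores it on the same held-out data. The sketch below shows only that harness: the three tiny classifiers (`fit_majority`, `fit_nearest_neighbor`, `fit_nearest_centroid`) are hypothetical stand-ins written for this example, not the methods listed above, and the toy data is made up. In a real project each candidate would be a library implementation of one of those methods.

```python
import math
from collections import Counter


def fit_majority(X, y):
    """Baseline: always predict the most frequent training label."""
    label = Counter(y).most_common(1)[0][0]
    return lambda x: label


def fit_nearest_neighbor(X, y):
    """Instance method: predict the label of the closest training point."""
    def predict(x):
        nearest = min(range(len(X)), key=lambda i: math.dist(X[i], x))
        return y[nearest]
    return predict


def fit_nearest_centroid(X, y):
    """Simple linear-style method: predict the class with the closest mean."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return lambda x: min(centroids, key=lambda c: math.dist(centroids[c], x))


def spot_check(candidates, X_train, y_train, X_test, y_test):
    """Fit every candidate on the same split and return accuracies, best first."""
    scores = {}
    for name, fit in candidates.items():
        predict = fit(X_train, y_train)
        correct = sum(predict(x) == t for x, t in zip(X_test, y_test))
        scores[name] = correct / len(y_test)
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))


# Toy two-cluster data, invented for the example.
X_train = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5], [6, 6]]
y_train = [0, 0, 0, 1, 1, 1, 1]
X_test = [[0.5, 0.5], [5.5, 5.5], [1, 1], [0, 0.5]]
y_test = [0, 1, 0, 0]

candidates = {
    "majority": fit_majority,
    "1-nn": fit_nearest_neighbor,
    "centroid": fit_nearest_centroid,
}
scores = spot_check(candidates, X_train, y_train, X_test, y_test)
for name, accuracy in scores.items():
    print(f"{name}: {accuracy:.2f}")
```

Including a trivial baseline such as the majority predictor makes it obvious when a more complex method is not actually learning anything.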

A great shortcut to picking a good method is to steal ideas from the literature.
Write down all the ideas and work your way through them.

Resampling methods

  • deep learning models are slow to train, so we often cannot use
    gold-standard methods such as k-fold cross-validation to estimate
    the performance of the model.
  • if you fall back on a single train/test split, you need to ensure
    that the split is representative of the problem.
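One common way to keep a single split representative, at least with respect to the class balance, is to stratify it: split each class separately so that the test set has roughly the same label proportions as the full data. The sketch below assumes a classification problem with labels in a list; `stratified_split` and its parameters are hypothetical names for this example.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.25, seed=0):
    """Return (train_indices, test_indices) with class proportions preserved."""
    rng = random.Random(seed)  # seeded so the split is repeatable
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # At least one test sample per class; note that a class with a
        # single sample therefore ends up only in the test set.
        n_test = max(1, round(len(indices) * test_fraction))
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


labels = [0] * 8 + [1] * 4  # imbalanced toy labels: 8 of class 0, 4 of class 1
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))   # 9 3
print([labels[i] for i in test_idx])   # [0, 0, 1]
```

Stratifying by label is only one notion of "representative"; time-ordered or grouped data would need a split that respects that structure instead.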

Improve performance with algorithm tuning
Here are some ideas on tuning your neural network algorithms in order to get more out of them:

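  • diagnostics: is your model overfitting or underfitting? If the
    training score is better than the validation score, the model may be
    overfitting, and regularization can help. If both the training and
    validation scores are low, the model may be underfitting, and you can
    make the network more powerful. If the training curve shows an
    inflection point, you can use early stopping.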
  • weight initialization: try all the different initialization methods
    offered, try pre-learning with an unsupervised method like an
    autoencoder, try taking an existing model and retraining a new input
    and output layer (transfer learning).
  • learning rate
  • activation functions: you should probably be using rectifier
    activation functions.
  • network topology: changes to your network structure. (How many
    layers and how many neurons do you need? No one knows. No one.)
    Try one hidden layer with a lot of neurons, try a deep network with
    few neurons per layer, try combinations of the above, try
    architectures from recent papers on problems similar to yours, try
    topology patterns and rules of thumb from books and papers.
  • batches and epochs: small batch sizes with large epoch sizes and a
    large number of training epochs are common. Try a batch size equal
    to the training data size, memory permitting (batch learning); try a
    batch size of one (online learning); try a grid search of different
    mini-batch sizes; try training for a few epochs and for a heck of a
    lot of epochs.
  • regularization: a great approach to curb overfitting the training
    data. The hot new regularization technique is dropout, which is
    simple and effective. Grid search different dropout percentages and
    experiment with dropout in the input, hidden and output layers.
  • optimization and loss
  • early stopping
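Early stopping, which appears both under diagnostics and as the last item above, can be sketched as a patience rule on the validation loss: stop once the loss has failed to improve for a given number of epochs, and keep the model from the best epoch. The function name `early_stopping_epoch`, the `patience` and `min_delta` parameters, and the loss values are all assumptions made for this illustration. A real training loop would compute each loss after training one epoch rather than read it from a list.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses.

    Training stops at the first epoch that is `patience` epochs past the
    best one without an improvement of more than `min_delta`.
    """
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            # New best: in a real loop, save a checkpoint of the model here.
            best_loss = loss
            best_epoch = epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch
    # Never triggered: training ran for all available epochs.
    return len(val_losses) - 1, best_epoch


# Made-up validation curve: improves until epoch 3, then gets worse.
losses = [1.0, 0.8, 0.7, 0.65, 0.66, 0.67, 0.70, 0.75, 0.80]
stop_epoch, best_epoch = early_stopping_epoch(losses, patience=3)
print(stop_epoch, best_epoch)  # 6 3
```

A larger `patience` tolerates noisy validation curves at the cost of extra training epochs.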