How to Improve Deep Learning Performance

20 Tips, Tricks and Techniques
to fight overfitting and get better generalization

The goal is to list every idea that might give a lift in performance.

The ideas fall into 4 sub-topics:

  • Improve performance with data
  • Improve performance with algorithms
  • Improve performance with algorithm tuning
  • Improve performance with Ensembles

Start with the data

  • Get more data,
  • Invent more data,
  • Rescale the data,
  • Transform the data,
  • Select features.

The quality of the data is critical. More data is better, of course; if you do not have enough, you need to generate some (data augmentation or data generation):

  • if your data are vectors of numbers, create randomly modified
    versions of existing vectors
  • if your data are images, create randomly modified versions of
    existing images
  • if your data are text, ……

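For example, with image data you can flip existing images to obtain new samples. You can also add noise to the data; a further benefit of added noise is that it can curb overfitting.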
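The two augmentations just mentioned (flipping and adding noise) can be sketched in a few lines. This is a minimal illustration in plain Python, assuming an image is a list of pixel rows and a sample is a list of floats. The function names and the default `sigma` are made up for this example, not taken from any library.

```python
import random

def flip_horizontal(image):
    """Return a left-right mirrored copy of an image given as rows of pixels."""
    return [list(reversed(row)) for row in image]

def add_gaussian_noise(vector, sigma=0.1, rng=None):
    """Return a copy of `vector` with zero-mean Gaussian noise added to each value."""
    if rng is None:
        rng = random.Random()
    return [x + rng.gauss(0.0, sigma) for x in vector]

def augment_images(images, rng=None):
    """Keep every original image and add a mirrored copy of roughly half of them."""
    if rng is None:
        rng = random.Random()
    augmented = list(images)
    for image in images:
        if rng.random() < 0.5:
            augmented.append(flip_horizontal(image))
    return augmented

image = [[0, 1, 2],
         [3, 4, 5]]
flipped = flip_horizontal(image)  # the original image is left unchanged
noisy = add_gaussian_noise([1.0, 2.0, 3.0], sigma=0.05, rng=random.Random(0))
```

Flipping only helps when the label survives the flip: a mirrored cat is still a cat, but a mirrored digit or letter may not mean the same thing.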

Rescale the data to a fixed range. If you use sigmoid activation functions, rescale the data to [0, 1]; if you use tanh, rescale it to [-1, 1].

  • normalized to 0 to 1
  • rescaled to -1 to 1
  • standardized mean=0, var=1
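The three rescaling options above can be written down directly. A minimal sketch for a single column of numbers, using only the standard library; the function names are chosen for this example, and mapping a constant column to 0 is an assumption made here to avoid dividing by zero.

```python
import statistics

def normalize(column):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column carries no information; map it to 0 (assumption).
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]

def rescale_symmetric(column):
    """Rescale values linearly to the range [-1, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [2.0 * (x - lo) / (hi - lo) - 1.0 for x in column]

def standardize(column):
    """Shift and scale values to mean 0 and variance 1."""
    mean = statistics.mean(column)
    std = statistics.pstdev(column)  # population standard deviation
    if std == 0:
        return [0.0 for _ in column]
    return [(x - mean) / std for x in column]

column = [2.0, 4.0, 6.0]
unit = normalize(column)          # values between 0 and 1
signed = rescale_symmetric(column)  # values between -1 and 1
z = standardize(column)           # mean 0, variance 1
```

In a real pipeline the minimum, maximum, mean and standard deviation would be computed on the training data only and then reused for the validation and test data.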

To understand the data, visualize it; this also helps you spot outliers.
Estimate the distribution of each column:

  • does a column look like a skewed Gaussian? consider adjusting the
    skew with a Box-Cox transform
  • does a column look like an exponential distribution? consider a log
    transform
  • does a column look like it has some features, but they are being
    clobbered by something obvious? try squaring or square-rooting, or
    make the feature discrete or binned in some way
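The column transforms in the list above are all one-liners. A rough sketch in plain Python, with illustrative function names. The Box-Cox version here takes `lam` as a given parameter, which is a simplification: libraries usually estimate lambda from the data (for instance `scipy.stats.boxcox` does so when no lambda is passed).

```python
import bisect
import math

def log_transform(column):
    """log(1 + x): compresses the long right tail of exponential-looking data."""
    return [math.log1p(x) for x in column]

def box_cox(column, lam):
    """Box-Cox transform with a fixed lambda; values must be strictly positive."""
    if any(x <= 0 for x in column):
        raise ValueError("Box-Cox needs strictly positive values")
    if lam == 0:
        return [math.log(x) for x in column]
    return [(x ** lam - 1.0) / lam for x in column]

def bin_values(column, edges):
    """Replace each value with the index of its bin; `edges` must be sorted ascending."""
    return [bisect.bisect_right(edges, x) for x in column]

skewed = [1.0, 4.0, 9.0, 100.0]
less_skewed = box_cox(skewed, lam=0.5)        # lam=0.5 behaves like a shifted, scaled square root
logged = log_transform(skewed)
binned = bin_values([1, 5, 12], edges=[3, 10])  # bins: below 3, 3 up to 10, 10 and above
```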

Improve performance with algorithms:

  • spot-check algorithms
  • steal from literature
  • resampling methods

No single algorithm performs better than every other when averaged over all possible problems; in that sense all algorithms are equal.
So collect evidence about what works on your problem:

  • spot-check a suite of methods and see which fare well and which do
    not.
  • evaluate some linear methods like logistic regression and linear
    discriminant analysis
  • evaluate some tree methods like CART, random forest and gradient
    boosting
  • evaluate some instance methods like SVM and kNN
  • evaluate some other neural network methods like LVQ, MLP, CNN, LSTM,
    hybrids.
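Spot-checking amounts to running every candidate through the same split and ranking the scores. The sketch below shows the shape of such a harness in plain Python. The two toy models (`MajorityClass`, `NearestNeighbor`) are stand-ins written for this example; in practice the dictionary would hold library implementations of the methods listed above.

```python
from collections import Counter

class MajorityClass:
    """Baseline: always predict the most common training label."""
    def fit(self, X, y):
        self.label = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label for _ in X]

class NearestNeighbor:
    """1-nearest-neighbour with squared Euclidean distance (an instance method)."""
    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        return self

    def predict(self, X):
        def closest(row):
            dists = [sum((a - b) ** 2 for a, b in zip(row, other)) for other in self.X]
            return self.y[dists.index(min(dists))]
        return [closest(row) for row in X]

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def spot_check(models, X_train, y_train, X_val, y_val):
    """Fit each candidate on the same split; return (name, accuracy) pairs, best first."""
    results = []
    for name, make_model in models.items():
        model = make_model().fit(X_train, y_train)
        results.append((name, accuracy(y_val, model.predict(X_val))))
    return sorted(results, key=lambda r: r[1], reverse=True)

X_train = [[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.9, 1.1], [1.1, 1.0]]
y_train = [0, 0, 1, 1, 1]
X_val = [[0.1, 0.1], [1.0, 1.0], [0.0, 0.2]]
y_val = [0, 1, 0]

candidates = {"majority": MajorityClass, "1-nn": NearestNeighbor}
ranking = spot_check(candidates, X_train, y_train, X_val, y_val)
```

Keeping a trivial baseline such as the majority class in the suite makes it obvious when a more elaborate method is not actually learning anything.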

A great shortcut to picking a good method is to steal ideas from the literature.
Write down all the ideas and work your way through them.

Resampling methods

  • deep networks are slow to train, so we often cannot use gold
    standard methods to estimate the performance of the model, such as
    k-fold cross validation.
  • whatever split you use instead, you need to ensure that it is
    representative of the problem.
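For classification, one common way to keep a split representative is to stratify it by class, so that each class appears in the same proportion in both parts. A minimal sketch, assuming labels are given as a list; `stratified_split`, `val_fraction` and `seed` are names chosen for this example.

```python
import random
from collections import defaultdict

def stratified_split(labels, val_fraction=0.2, seed=0):
    """Return (train_idx, val_idx) with every class split in about the same proportion."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)

    train_idx, val_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # At least one validation sample per class, so rare classes are still evaluated.
        n_val = max(1, round(len(indices) * val_fraction))
        val_idx.extend(indices[:n_val])
        train_idx.extend(indices[n_val:])
    return sorted(train_idx), sorted(val_idx)

labels = [0] * 80 + [1] * 20          # an imbalanced toy problem
train_idx, val_idx = stratified_split(labels, val_fraction=0.2)
val_positives = sum(labels[i] for i in val_idx)
```

A purely random split of the same data could by chance leave very few examples of the rare class in the validation part, which makes the estimated performance noisy.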

Improve performance with algorithm tuning
Here are some ideas on tuning your neural network algorithms in order to get more out of them:

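  • diagnostics : is your model overfitting or underfitting?
    if the training score is better than the validation score, the
    model may be overfitting; in that case use regularization. if both
    the training and validation scores are low, the model may be
    underfitting; in that case strengthen the network (for example, by
    increasing its capacity). if the learning curves show an inflection
    point where validation performance stops improving while training
    performance keeps improving, you can use early stopping.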
  • weight initialization : try all the different initialization methods
    offered, try pre-learning with an unsupervised method like an
    autoencoder, try taking an existing model and retraining new input
    and output layers (transfer learning).
  • learning rate
  • activation functions : you should probably be using rectifier (ReLU)
    activation functions.
  • network topology : changes to your network structure. (how many
    layers and how many neurons do you need? no one knows.)
    try one hidden layer with a lot of neurons, try a deep network with
    few neurons per layer, try combinations of the above, try
    architectures from recent papers on problems similar to yours, try
    topology patterns and rules of thumb from books and papers.
  • batches and epochs : small batch sizes combined with a large number
    of training epochs are common. try a batch size equal to the
    training data size, memory permitting (batch learning), try a batch
    size of one (online learning), try a grid search of different
    mini-batch sizes, try training for a few epochs and for a heck of a
    lot of epochs.
  • regularization : a great approach to curb overfitting the training
    data. the hot new regularization technique is dropout, which is
    simple and effective. grid search different dropout percentages.
    experiment with dropout in the input, hidden and output layers.
  • optimization and loss
  • early stopping
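
The diagnostics and early stopping items in the list above can be sketched as two small functions. This is only an illustration: the thresholds in `diagnose` (`good_enough`, `gap`) and the `patience` parameter are hypothetical values picked for the example, and a real training loop would check the validation loss after each epoch instead of reading it from a finished list.

```python
def diagnose(train_score, val_score, good_enough=0.8, gap=0.1):
    """Rough rule of thumb for two accuracy-like scores in [0, 1] (thresholds are assumptions)."""
    if train_score - val_score > gap:
        return "overfitting"      # training much better than validation
    if train_score < good_enough and val_score < good_enough:
        return "underfitting"     # both scores are low
    return "ok"

def early_stopping_epoch(val_losses, patience=3):
    """Return the best epoch seen before validation loss failed to improve `patience` times in a row."""
    best_epoch, best_loss = 0, float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss
        elif epoch - best_epoch >= patience:
            break                 # stop training; later epochs are never run
    return best_epoch

# Validation loss turns upward after epoch 3 (the inflection point).
val_losses = [0.90, 0.70, 0.60, 0.58, 0.61, 0.63, 0.66, 0.50]
stop_at = early_stopping_epoch(val_losses, patience=3)
verdict = diagnose(train_score=0.99, val_score=0.80)
```

With `patience=3` the loop gives up before reaching the last epoch, even though its loss is lower. A larger patience trades extra training time for a smaller chance of stopping too early.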