Andrew Ng's Machine Learning: Evaluation and Diagnosis

1. When training a model, the data should be divided into three parts:

training data, cross validation data, and test data.

Each part gives its own error: the training error, the cross validation (cv) error, and the test error.
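For reference, the three errors are usually written as below in the course. This is a reconstruction under assumptions, not the original figure: $h_\theta$ is the hypothesis, and $m$, $m_{cv}$, $m_{test}$ are taken to be the number of examples in each part.

$$
J_{train}(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2
$$

$$
J_{cv}(\theta) = \frac{1}{2m_{cv}} \sum_{i=1}^{m_{cv}} \left( h_\theta(x_{cv}^{(i)}) - y_{cv}^{(i)} \right)^2
$$

$$
J_{test}(\theta) = \frac{1}{2m_{test}} \sum_{i=1}^{m_{test}} \left( h_\theta(x_{test}^{(i)}) - y_{test}^{(i)} \right)^2
$$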

These errors are vital when deciding what degree of polynomial to fit to a data set. The degree is chosen by comparing the cross validation error across candidate degrees, and the test error is then used only to estimate how well the chosen model generalizes.
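Choosing the degree this way can be sketched in a few lines. This is a minimal sketch under assumptions, not the course's own code: it uses NumPy, a 60/20/20 split, plain least squares, and the hypothetical helper names `poly_features`, `half_mse` and `select_degree`. The demo data is synthetic.

```python
import numpy as np

def poly_features(x, degree):
    """Build the columns [1, x, x^2, ..., x^degree]."""
    return np.vander(x, degree + 1, increasing=True)

def half_mse(theta, X, y):
    """Squared-error cost 1/(2m) * sum((X @ theta - y)^2), no regularization."""
    residual = X @ theta - y
    return float(residual @ residual) / (2 * len(y))

def select_degree(x, y, max_degree=8, seed=0):
    """Fit degrees 1..max_degree on the training part, pick the degree
    with the lowest cv error, then report the test error of that model."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    n_train = int(0.6 * len(x))  # assumed 60/20/20 split
    n_cv = int(0.2 * len(x))
    tr = idx[:n_train]
    cv = idx[n_train:n_train + n_cv]
    te = idx[n_train + n_cv:]

    table = []
    for d in range(1, max_degree + 1):
        theta, *_ = np.linalg.lstsq(poly_features(x[tr], d), y[tr], rcond=None)
        train_err = half_mse(theta, poly_features(x[tr], d), y[tr])
        cv_err = half_mse(theta, poly_features(x[cv], d), y[cv])
        table.append((d, train_err, cv_err, theta))

    best_d, _, _, best_theta = min(table, key=lambda row: row[2])
    # The test part is touched only once, after the degree is fixed.
    test_err = half_mse(best_theta, poly_features(x[te], best_d), y[te])
    return best_d, table, test_err

# Synthetic demo data (made up for illustration): a noisy quadratic.
rng = np.random.default_rng(1)
x_demo = np.linspace(-1.0, 1.0, 100)
y_demo = 1.0 + 2.0 * x_demo - 3.0 * x_demo**2 + rng.normal(0.0, 0.1, size=100)
best_degree, table, test_error = select_degree(x_demo, y_demo)
print(best_degree, test_error)
```

Keeping the test part out of the selection loop is the point of the three-way split: the cv error has already been used to pick the degree, so it is no longer an unbiased estimate.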

 

2. If a learning algorithm doesn't do as well as people are hoping, almost all the time it will be because it has either a high bias problem or a high variance problem.

In other words, it is either an underfitting problem or an overfitting problem.

Here is a way to judge whether the learning algorithm is underfitting or overfitting:

increase the degree of polynomial and watch how both the training error and the cross validation error change.

If the training error and the cv error are both high and decrease together, it is the underfitting problem.

If the cv error is far larger than the training error, it is the overfitting problem.

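The training error is the most direct measure of how closely the model fits the data it was trained on. As the degree of polynomial increases, the function becomes more complex, and after many attempts the learning algorithm gradually fits the training set completely, until it seems to make no mistakes at all. In fact, it has already overfit.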

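The cross validation set is much blunter: this data does not fit closely to a learning algorithm that has been tuned again and again on the training set. So when the problem is underfitting, the cv error is high, and when the problem is overfitting, the cv error is also high. The cv error curve alone does not tell the two apart, but comparing it with the training error makes the result obvious.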
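The comparison between the two errors can be written as a small rule of thumb. This is only a sketch: the function name `diagnose`, the `target_error` level and the `gap_ratio` threshold are hypothetical choices made for illustration, not values from the course.

```python
def diagnose(train_error, cv_error, target_error, gap_ratio=2.0):
    """Label a model from its training error and cv error.

    target_error: hypothetical error level considered acceptable.
    gap_ratio: hypothetical factor for "cv error far larger than train error".
    """
    # High bias: the model cannot even fit the data it was trained on.
    high_bias = train_error > target_error
    # High variance: cv error is unacceptable and far above the training error.
    high_variance = cv_error > target_error and cv_error > gap_ratio * train_error

    if high_bias and high_variance:
        return "high bias and high variance"
    if high_bias:
        return "high bias (underfitting)"
    if high_variance:
        return "high variance (overfitting)"
    return "no clear problem"

print(diagnose(5.0, 5.5, target_error=1.0))  # both errors high and close together
print(diagnose(0.1, 4.0, target_error=1.0))  # low train error, much higher cv error
```

The thresholds would need tuning for any real data set; the sketch only captures the pattern that both errors high means underfitting, while a large gap means overfitting.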

Choosing a suitable regularization parameter, covered next, uses a similar approach.

 

3. Choosing the regularization parameter

Normally, the definition of the cost function includes regularization. Regularization can prevent overfitting.

The regularization parameter should be chosen to suit the problem: too large a value leads to underfitting, and too small a value leaves overfitting unchecked.

The definitions of all the functions:
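The definitions usually given at this point in the course are reconstructed below; this is an assumption about what the missing figure showed. Here $\lambda$ is the regularization parameter, $n$ is the number of features, and $\theta_0$ is not penalized. The parameters are fitted by minimizing the regularized cost:

$$
J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2
$$

The training, cv and test errors are then evaluated without the regularization term, for example:

$$
J_{cv}(\theta) = \frac{1}{2m_{cv}} \sum_{i=1}^{m_{cv}} \left( h_\theta(x_{cv}^{(i)}) - y_{cv}^{(i)} \right)^2
$$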

The result:
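What the result looks like in practice can be illustrated with a minimal sketch. It makes several assumptions: NumPy, linear regression solved in closed form, an unpenalized intercept in column 0, and the hypothetical helper names `fit_regularized`, `cost` and `select_lambda`. The candidate values of lambda and the demo data are illustrative only.

```python
import numpy as np

def fit_regularized(X, y, lam):
    """Minimize 1/(2m) * (sum of squared errors + lam * sum(theta_j^2, j >= 1))
    by solving (X^T X + lam * L) theta = X^T y, where L is the identity
    with L[0, 0] = 0 so the intercept is not penalized."""
    L = np.eye(X.shape[1])
    L[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)

def cost(theta, X, y):
    """Squared-error cost without the regularization term."""
    residual = X @ theta - y
    return float(residual @ residual) / (2 * len(y))

def select_lambda(X_train, y_train, X_cv, y_cv, lambdas):
    """Fit once per lambda and return the lambda with the lowest cv error,
    together with a table of (lambda, train error, cv error)."""
    rows = []
    for lam in lambdas:
        theta = fit_regularized(X_train, y_train, lam)
        rows.append((lam, cost(theta, X_train, y_train), cost(theta, X_cv, y_cv)))
    best_lam = min(rows, key=lambda row: row[2])[0]
    return best_lam, rows

# Synthetic demo data (made up for illustration), degree-5 polynomial features.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 80)
y = np.sin(3.0 * x) + rng.normal(0.0, 0.2, size=80)
X = np.vander(x, 6, increasing=True)
X_train, y_train = X[0::2], y[0::2]  # even rows for training
X_cv, y_cv = X[1::2], y[1::2]        # odd rows for cross validation

lambdas = [0.0, 0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 0.64, 1.28, 2.56, 5.12, 10.24]
best_lam, rows = select_lambda(X_train, y_train, X_cv, y_cv, lambdas)
for lam, train_err, cv_err in rows:
    print(f"lambda={lam:<6} train={train_err:.4f} cv={cv_err:.4f}")
print("chosen lambda:", best_lam)
```

The training error can only grow as lambda grows, because the fit is pulled further away from the pure least-squares solution. The cv error is the curve that shows where regularization starts to hurt.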

 
