Andrew Ng's Machine Learning: Evaluation and Diagnosis

1. When training a learning algorithm, the data should be divided into three parts:

training data, cross validation data, and test data.
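The three-way split can be sketched as below. This is a minimal illustration, not a fixed recipe: the helper name `split_three_ways`, the 60/20/20 proportions, and the toy data are assumptions chosen for the example.

```python
import numpy as np

def split_three_ways(X, y, train_frac=0.6, cv_frac=0.2, seed=0):
    """Shuffle the examples, then split them into training, cv, and test sets.

    The 60/20/20 default is an assumed, commonly used proportion.
    """
    rng = np.random.default_rng(seed)
    m = len(X)
    order = rng.permutation(m)            # random order of example indices
    n_train = int(round(train_frac * m))
    n_cv = int(round(cv_frac * m))
    train_idx = order[:n_train]
    cv_idx = order[n_train:n_train + n_cv]
    test_idx = order[n_train + n_cv:]     # whatever remains is the test set
    return ((X[train_idx], y[train_idx]),
            (X[cv_idx], y[cv_idx]),
            (X[test_idx], y[test_idx]))

# Toy data, purely for illustration.
X = np.arange(100, dtype=float)
y = 2.0 * X + 1.0
train, cv, test = split_three_ways(X, y)
print(len(train[0]), len(cv[0]), len(test[0]))  # 60 20 20
```

Shuffling before splitting matters when the examples are stored in some meaningful order; otherwise the three sets may not follow the same distribution.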

There are three corresponding types of error: training error, cross validation (cv) error, and test error.

These three errors are vital when deciding what degree of polynomial to fit to a data set. We should use both the cross validation data and the test data: the cross validation data to choose the degree, and the test data to estimate how well the chosen model generalizes.
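A minimal sketch of this selection procedure follows, assuming a one-dimensional input, an unregularized least-squares polynomial fit via `np.polyfit`, and synthetic quadratic data. The function names `half_mse` and `select_degree` are made up for this example.

```python
import numpy as np

def half_mse(theta, x, y):
    """Squared-error cost 1/(2m) * sum((h(x) - y)^2) for polynomial coefficients theta."""
    residual = np.polyval(theta, x) - y
    return float(np.mean(residual ** 2) / 2.0)

def select_degree(x_train, y_train, x_cv, y_cv, degrees):
    """Fit one polynomial per degree on the training set; keep the lowest cv error."""
    best = None
    for d in degrees:
        theta = np.polyfit(x_train, y_train, d)   # fit on training data only
        err = half_mse(theta, x_cv, y_cv)         # compare models on cv data
        if best is None or err < best[1]:
            best = (d, err, theta)
    return best

# Synthetic data (an assumption for the example): a noisy quadratic.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 150)
y = 1.0 - 2.0 * x + 3.0 * x ** 2 + rng.normal(0.0, 0.1, 150)
x_train, y_train = x[:90], y[:90]
x_cv, y_cv = x[90:120], y[90:120]
x_test, y_test = x[120:], y[120:]

degree, cv_err, theta = select_degree(x_train, y_train, x_cv, y_cv, range(1, 9))
# The test set is touched only once, after the degree has been chosen.
test_err = half_mse(theta, x_test, y_test)
print(degree, cv_err, test_err)
```

Because the degree is picked to minimize the cv error, the cv error of the winning model is an optimistic estimate; the test error is the one to report.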

 

2. If a learning algorithm doesn't do as well as people are hoping, almost all the time it will be because it has either a high bias problem or a high variance problem.

In other words, it is either an underfitting problem or an overfitting problem.

Here is a way to judge whether the learning algorithm is underfitting or overfitting:

increase the degree of the polynomial and watch how both the training error and the cross validation error change.

So, if the training error and the cv error are both high, and both decrease together as the degree increases, it is an underfitting (high bias) problem.

Or, if the cv error is far larger than the training error, it is an overfitting (high variance) problem.

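The training data is what the model is fit to, so the training error most directly reflects how closely the model matches that data. As the polynomial degree increases, the function becomes more complex, and over repeated attempts the learning algorithm tends toward fitting the training set completely, until it appears to make no mistakes at all. In reality, it has already overfit.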

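The cross validation set is much less forgiving: the learning algorithm was never tuned on this data, so it does not fit it as closely. As a result, the cv error is high when the model underfits, and it is also high when the model overfits. Looking at the cv error curve alone is therefore inconclusive, but comparing it with the training error makes the diagnosis clear.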
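The comparison of the two error curves can be sketched as follows. This is only an illustration: the `diagnose` rule of thumb and its thresholds `target_err` and `gap_ratio` are hypothetical, and the data is synthetic.

```python
import numpy as np

def half_mse(theta, x, y):
    """Squared-error cost 1/(2m) * sum((h(x) - y)^2)."""
    return float(np.mean((np.polyval(theta, x) - y) ** 2) / 2.0)

def error_curves(x_train, y_train, x_cv, y_cv, degrees):
    """Return (degree, training error, cv error) for each polynomial degree."""
    rows = []
    for d in degrees:
        theta = np.polyfit(x_train, y_train, d)
        rows.append((d,
                     half_mse(theta, x_train, y_train),
                     half_mse(theta, x_cv, y_cv)))
    return rows

def diagnose(train_err, cv_err, target_err, gap_ratio=2.0):
    """Hypothetical rule of thumb; both thresholds are illustrative assumptions."""
    if train_err > target_err:
        # The model cannot even fit the data it was trained on.
        return "underfitting (high bias)"
    if cv_err > gap_ratio * train_err:
        # Fits the training data well but does much worse on unseen data.
        return "overfitting (high variance)"
    return "reasonable fit"

# Synthetic data (an assumption for the example): a noisy sine curve.
rng = np.random.default_rng(1)
x_train = np.linspace(-1.0, 1.0, 20)
y_train = np.sin(3.0 * x_train) + rng.normal(0.0, 0.1, 20)
x_cv = rng.uniform(-1.0, 1.0, 20)
y_cv = np.sin(3.0 * x_cv) + rng.normal(0.0, 0.1, 20)

rows = error_curves(x_train, y_train, x_cv, y_cv, range(1, 7))
for d, train_err, cv_err in rows:
    print(d, round(train_err, 4), round(cv_err, 4),
          diagnose(train_err, cv_err, target_err=0.02))
```

The training error can only go down as the degree grows, which is why it says nothing about overfitting by itself; the gap to the cv error is what carries the signal.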

Choosing a suitable regularization parameter, covered next, uses a similar approach.

 

3. Choosing the regularization parameter

Normally, the definition of the cost function includes a regularization term. Regularization can prevent overfitting.

The regularization parameter must be chosen appropriately: too large a value leads to underfitting, and too small a value leads to overfitting.
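One way to pick the parameter is to train with several candidate values and keep the one with the lowest cv error. The sketch below assumes regularized linear regression on polynomial features, solved in closed form with the normal equation; the helper names, the degree 8 features, and the doubling grid of candidate values are assumptions for the example.

```python
import numpy as np

def poly_features(x, degree):
    """Design matrix with columns [1, x, x^2, ..., x^degree]."""
    return np.vander(x, degree + 1, increasing=True)

def fit_regularized(X, y, lam):
    """Minimize 1/(2m)*||X@theta - y||^2 + lam/(2m)*sum(theta[1:]**2).

    The bias term theta[0] is not penalized.
    """
    penalty = lam * np.eye(X.shape[1])
    penalty[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + penalty, X.T @ y)

def half_mse(theta, X, y):
    """Unregularized squared-error cost, used for both training and cv error."""
    return float(np.mean((X @ theta - y) ** 2) / 2.0)

def select_lambda(X_train, y_train, X_cv, y_cv, lambdas):
    """Train once per candidate value and return the one with the lowest cv error."""
    results = []
    for lam in lambdas:
        theta = fit_regularized(X_train, y_train, lam)
        results.append((lam,
                        half_mse(theta, X_train, y_train),
                        half_mse(theta, X_cv, y_cv)))
    best = min(results, key=lambda r: r[2])
    return best[0], results

# Synthetic data (an assumption for the example): a noisy sine curve.
rng = np.random.default_rng(2)
x = rng.uniform(-1.0, 1.0, 60)
y = np.sin(3.0 * x) + rng.normal(0.0, 0.1, 60)
X_all = poly_features(x, 8)
X_train, y_train = X_all[:40], y[:40]
X_cv, y_cv = X_all[40:], y[40:]

lambdas = [0.0] + [0.01 * 2 ** k for k in range(11)]  # 0, 0.01, 0.02, ..., 10.24
best_lam, results = select_lambda(X_train, y_train, X_cv, y_cv, lambdas)
print(best_lam)
```

Note that the errors are measured without the regularization term: the penalty is only part of the training objective, not of the evaluation.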

The definitions of all the functions:
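The original figure is not available here. Assuming the usual regularized linear regression setup (hypothesis $h_\theta$, $n$ features, regularization parameter $\lambda$, and set sizes $m$, $m_{cv}$, $m_{test}$), the functions are commonly written as follows; the notation is an assumption, not a copy of the missing figure.

```latex
\begin{align}
J(\theta) &= \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2
             + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2 \\
J_{train}(\theta) &= \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 \\
J_{cv}(\theta) &= \frac{1}{2m_{cv}} \sum_{i=1}^{m_{cv}} \left( h_\theta(x_{cv}^{(i)}) - y_{cv}^{(i)} \right)^2 \\
J_{test}(\theta) &= \frac{1}{2m_{test}} \sum_{i=1}^{m_{test}} \left( h_\theta(x_{test}^{(i)}) - y_{test}^{(i)} \right)^2
\end{align}
```

Only $J(\theta)$, the objective minimized during training, carries the regularization term; the three error measures do not.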

The result:

 
