Chapter 1. Classification -- 03. Statistical Learning Theory for Supervised Learning (Translation)

The key principle in statistical learning theory is the principle of Ockham’s razor.

Now Ockham’s razor is the idea that the best models are simple models that fit the

data well, and it was named after the English friar and philosopher who said that, among

hypotheses that predict equally well, we should choose the one with the fewest assumptions.

I’m sure he didn’t sort of have statistical learning theory in mind when he said that,

but that’s where that expression comes from.

Ok, so let’s start with a basic one-dimensional regression model.

This model I have on the screen there is not good; it’s really over-fitted.

So it’s just not going to generalize well to new points; it just can’t predict well.
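As a minimal sketch of what that over-fitted model looks like (the data, polynomial degree, and seed below are made up purely for illustration):

```python
# A high-degree polynomial forced through a handful of noisy 1-D points:
# it matches the training data almost exactly, but wiggles wildly between
# them, so it predicts new points badly.
import numpy as np

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 10))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, 10)

# Degree-9 polynomial on 10 points: (nearly) zero training error...
coeffs = np.polyfit(x, y, deg=9)

# ...but its predictions at fresh inputs are erratic.
x_new = np.linspace(0, 1, 5)
print(np.polyval(coeffs, x_new))
```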

Now let’s say that I have some way to measure model complexity,

and the more complex the models, the more they tend to over-fit; the simpler the models,

the more they tend to under-fit. Now this plot is the key to understanding learning theory.

If I plot training error, which is this curve over here, then as the models grow more and more complex, the training error continues to decrease, because I can just over-fit more and more.

But at the same time, if I do that, the test error gets worse and worse.

If I, on the other hand, under-fit, then I won’t do well for either training or test.

Where I want to go is this sweet spot in the middle here.

And the idea of this plot holds true for classification, regression, whatever.
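To make that plot concrete, here is a minimal sketch that traces both curves, using polynomial degree as the complexity axis; the data and the list of degrees are made-up illustrative choices:

```python
# Training and test error of polynomial fits of increasing degree.
# Training error falls monotonically with complexity; test error falls,
# then rises again past the sweet spot.
import numpy as np

rng = np.random.default_rng(1)

def make_data(n):
    x = rng.uniform(0, 1, n)
    return x, np.sin(2 * np.pi * x) + rng.normal(0, 0.3, n)

x_train, y_train = make_data(30)
x_test, y_test = make_data(200)

for degree in [1, 3, 5, 9, 15]:
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree:2d}: train {train_err:.3f}, test {test_err:.3f}")
```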

So the idea is that the best models are simple models that fit the data well.

So what we need is a balance between accuracy and simplicity.

Now, Ockham probably didn’t know optimization,

but this would have suited him fine, I’m guessing, if he were alive now.

So the most common machine learning methods,

they choose their function f, to minimize training error and model complexity,

which aims to thwart the curse of dimensionality.

So the curse of dimensionality is that we tend to over-fit when we have a lot of features and not as much data.

Data needs to increase exponentially with the number of features in order not to have this issue of over-fitting.
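A rough back-of-the-envelope way to see that exponential growth (assuming, say, we want to keep 10 sample points per axis direction):

```python
# To maintain the same sampling density per axis, the number of points
# needed grows as 10**d with the number of features d.
for d in [1, 2, 5, 10, 20]:
    print(f"{d:2d} features -> ~{10 ** d:.0e} points for the same density")
```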

So we’re going to choose a model that’s both simple – low complexity – and has low training error;

and this exactly is the principle of Ockham’s razor.
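As one concrete instance of "minimize training error plus complexity" (a sketch, not the lecturer's specific method): ridge regression, which adds a squared-norm penalty on the weights. The data and the alpha value below are arbitrary illustrative choices:

```python
# Ridge regression minimizes  sum_i (y_i - w.x_i)^2 + alpha * ||w||^2,
# where alpha controls how heavily simplicity (small weights) is rewarded.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 5))
y = X @ np.array([1.0, 0.5, 0.0, 0.0, 0.0]) + rng.normal(0, 0.1, 50)

model = Ridge(alpha=1.0).fit(X, y)
print(model.coef_)  # shrunken weights: low complexity, low training error
```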

And simplicity is measured in several different ways,

and is usually called regularization in machine learning.

So this is the main foundation of machine learning;

it’s all about creating functions that minimize the loss, but also keep the model simple.

And this is the bottom line, folks;

we’re going to do this in many different ways throughout the course.

Different machine learning methods have different loss functions and they have different regularization terms.
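For a flavor of those pairings (these sklearn estimators are just familiar examples, not an exhaustive menu):

```python
# A few well-known combinations of loss function and regularization term.
from sklearn.linear_model import Ridge, Lasso, LogisticRegression
from sklearn.svm import LinearSVC

models = {
    "squared loss + L2 penalty (ridge)": Ridge(alpha=1.0),
    "squared loss + L1 penalty (lasso)": Lasso(alpha=0.1),
    "log loss + L2 penalty (logistic regression)": LogisticRegression(penalty="l2"),
    "hinge loss + L2 penalty (linear SVM)": LinearSVC(),
}
for name in models:
    print(name)
```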

And so, as we go on, you’ll see more and more inside these machine learning methods,

because I’ll tell you what all of these terms are.
