Chapter 1. Classification -- 08. ROC Curve Algorithm

Let’s talk about another way to produce ROC curves.

So ROC curves can be produced in two ways: the one that I just showed you,

which is for a single, real-valued classifier.

In that case, the ROC curve evaluates the classifier.

There’s another way to create an ROC curve,

where you can use a single algorithm and sweep the imbalance parameter across the full range and trace out an ROC curve to evaluate the algorithm.

So one is a property of a single classifier and the other is a property of a whole algorithm.

So let’s go over the first way again and I’ll repeat how to do it.

So let’s say that we have our classifier – our function, f, and it’s increasing along this direction.

And we could place a decision boundary anywhere we wanted along here.

So let’s start from the top and sweep this thing down from top to bottom,

and every time we move it, we record the true positive rate and the false positive rate.

And then you plot all of those TPRs and FPRs on a scatter plot and that’s the ROC curve,

so let’s do it.

So we sweep that decision boundary down from top to bottom and record the true positive rate and false positive rate as we go.

Okay, and that traces out this whole curve. So that’s the first way to create ROC curves,

and that’s for a single classification model.
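To make that sweep concrete, here is a minimal Python sketch of the procedure; the helper name roc_points and the toy scores are my own illustration, not from the lecture:

```python
import numpy as np

def roc_points(scores, labels):
    """Trace an ROC curve for one real-valued classifier f by
    sweeping the decision threshold from top to bottom."""
    pos = labels == 1
    neg = ~pos
    fpr, tpr = [0.0], [0.0]  # threshold above every score: nothing is positive
    for t in np.sort(np.unique(scores))[::-1]:  # highest threshold first
        predicted_pos = scores >= t
        tpr.append(predicted_pos[pos].mean())  # TP / (TP + FN)
        fpr.append(predicted_pos[neg].mean())  # FP / (FP + TN)
    return np.array(fpr), np.array(tpr)

# toy scores from some fixed classifier f, plus the true labels
scores = np.array([0.9, 0.8, 0.7, 0.55, 0.4, 0.3, 0.2])
labels = np.array([1, 1, 0, 1, 0, 1, 0])
fpr, tpr = roc_points(scores, labels)  # plot tpr against fpr for the curve
```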

And now let’s talk about how to evaluate an algorithm.

Do you remember the c parameter from the imbalanced learning section?

That’s the one that allows you to weight the positives differently than the negatives.
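As a reminder, one common way to write that cost-sensitive objective is the weighted loss below; the exact form is my assumption, since the lecture only describes c informally:

```latex
% c scales the loss on the positive examples relative to the negatives
\min_{f} \;\; c \sum_{i:\, y_i = +1} \ell\big(f(x_i), y_i\big)
        \;+\; \sum_{i:\, y_i = -1} \ell\big(f(x_i), y_i\big)
```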

So here, I’m going to sweep through values of the c parameter and fit a new machine learning model each time we adjust c.

And we’re going to start with c being tiny.

So the classifier doesn’t care about the positives at all –

it just cares about getting the negatives right. And guess what?

It got all of them right because it just classified everything as negative.

Then we adjust c and we get this one.

And then we adjust c again and get that decision boundary and this one,

and that one. And as you do this sweep, you get different models each time,

but you also get a true positive rate and a false positive rate each time.

So you could plot those values on an ROC curve.

But this ROC curve evaluates the whole algorithm and not one classifier.
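Here is a sketch of that algorithm-level sweep. I’m assuming scikit-learn’s LogisticRegression with its class_weight option as a stand-in for the c parameter; the lecture doesn’t commit to any particular learning algorithm:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# toy imbalanced binary data (illustrative only)
X, y = make_classification(n_samples=500, weights=[0.8, 0.2], random_state=0)

fpr, tpr = [], []
# sweep c from tiny (positives ignored) to huge (negatives ignored),
# refitting a fresh model each time; each fit yields one (FPR, TPR) point
for c in np.logspace(-3, 3, 25):
    model = LogisticRegression(class_weight={0: 1.0, 1: c}).fit(X, y)
    pred = model.predict(X)
    tpr.append(np.mean(pred[y == 1] == 1))  # true positive rate
    fpr.append(np.mean(pred[y == 0] == 1))  # false positive rate
# plotting (fpr, tpr) traces the ROC curve of the algorithm itself
```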

Now what’s the advantage of using one of these methods over the other?

It’s not really like that – you can’t directly compare the two, because they measure different things.

One of these two things is an evaluation measure for a single function and the other is a measure of quality for a whole algorithm.

Usually an algorithm that’s optimized for a specific decision point can actually do better than an algorithm that is optimized for something else.

So usually, you would expect the ROC curve for the whole algorithm,

which is optimized separately at each point on the curve,

to do better than the curve for a single classifier,

though every once in a while you do get a surprise and something weird happens.

But in any case, this is the idea.

If you use the algorithm and fiddle with the class-weight parameter,

you’re essentially optimizing for each point along that curve.

Whereas if you use a fixed classifier, you might be optimizing for one point on that curve,

or for something else entirely, so you wouldn’t expect the ROC curve to be as good.

Okay, so here ends the discussion of ways to produce ROC curves.

