How to Find Weaknesses in Your Machine Learning Models

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Any time you simplify data with a summary statistic, you lose information. Model accuracy is no exception. If you reduce a model's fit to a single summary statistic, you can no longer determine where and why performance is lowest\/highest."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/resource\/image\/e7\/1d\/e79f46be5af9c231615c091a2198a11d.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"Figure 1: Example of regions of the data where model performance is low."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To address this problem, researchers at IBM recently developed a method called "},{"type":"link","attrs":{"href":"https:\/\/arxiv.org\/pdf\/2108.05620.pdf","title":"","type":null},"content":[{"type":"text","text":"FreaAI"}]},{"type":"text","text":", which returns interpretable data slices where model accuracy is low. With the information these slices provide, engineers can take the steps needed to make sure the model behaves as intended."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Unfortunately, FreaAI is not open source, but many of its ideas can be implemented easily in your stack of choice. Let's dive in."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Technical summary"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"FreaAI finds slices of the test data where performance is statistically significantly low and returns them to the engineer for inspection. The method works as follows:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":1,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","at
trs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Use the highest posterior density (HPD) method to find univariate data slices with low accuracy. These univariate slices shrink the search space and show where the data is more likely to be problematic."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Use decision trees to find bivariate data slices with low accuracy. These bivariate slices shrink the search space for categorical predictors and second-order interactions, showing where the data is more likely to be problematic."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Remove all data slices that fail certain heuristics."},{"type":"text","text":" Only slices that have minimal support in the test set and a statistically significant increase in error rate are kept."}]}]}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"But what is actually going on?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"That's a lot of jargon, so let's slow down and look at what is actually happening..."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":1,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"The problem"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"When developing a model, we usually rely on an “accuracy” metric to assess fit. One example is mean squared error, which is used for linear regression and defined in Figure 2."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/resource\/image\/54\/6c\/5480a5def1459bc4ab77cfc709756a6c.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"
bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"Figure 2: Mean squared error formula."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"But this averaged error only tells us how we are doing "},{"type":"text","marks":[{"type":"strong"}],"text":"on average"},{"type":"text","text":". We don't know whether we perform very well on some parts of the data and very poorly on others."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This is a long-standing problem in predictive modeling that has recently attracted a lot of attention."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":2,"normalizeStart":2},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"The solution"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"One solution is FreaAI. Developed at IBM, the method aims to find the parts of the data where our model underperforms."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"It has two main steps. The first creates data slices, and the second determines whether the model underperforms in those slices. FreaAI's output is a set of “locations” in our data where model performance is low."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2.1 
Data slicing"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Combinatorial testing (CT) is a framework that looks at all groups of predictors in turn to find regions of poor performance. For example, if we have two categorical predictors, color and shape, we would look at every possible combination and see where accuracy drops."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"However, combinatorial testing is computationally infeasible on large datasets: as the number of columns grows, the number of combinations we need grows exponentially. So we need a method that helps us search the features for potential low-accuracy regions."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/resource\/image\/ca\/56\/ca5a5c58e3683efdb5e06d1b63fa2d56.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"Figure 3: Example of a 50% highest density region (HDR), shown in blue."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The first method FreaAI uses is called the "},{"type":"link","attrs":{"href":"https:\/\/stats.stackexchange.com\/questions\/148439\/what-is-a-highest-density-region-hdr","title":"","type":null},"content":[{"type":"text","text":"highest density region"}]},{"type":"text","text":" (HDR) (Figure 3). In short, HDR finds the smallest region of a numeric feature that contains a given proportion of the data, i.e. a high-density region. In Figure 3, that region is marked off by the horizontal blue dashed line: 50% of our data lies above that line."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"We then iteratively shrink this range by a value ε (0.05 by default) and look for increases in accuracy. "},{"type":"text","marks":[{"type":"strong"}],"text":"If accuracy does increase in a given iteration, we know the model performed poorly in the region between the previous iteration and the current one"},{"type":"text","text":"."}]},{"type":"paragraph","attrs
":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To find regions of poor fit for our numeric predictors, we run this HDR method iteratively over all the predictors in the test set."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Pretty cool, right?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The second method uses decision trees to handle all non-numeric predictors as well as combinations of two features. In short, we fit a decision tree and look for the splits of those features that minimize accuracy."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/resource\/image\/34\/e2\/34902dde19832yy79dfb462157ba1ae2.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"Figure 4: Example of a decision tree on the continuous univariate predictor “age”."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In Figure 4, each decision node (blue) is a split on our feature, and each terminal node (number) is the accuracy for that split. By fitting these trees, we can substantially reduce the search space and find regions of poor performance more quickly. "},{"type":"text","marks":[{"type":"strong"}],"text":"Moreover, because trees are robust to many kinds of data, we can run them on categorical predictors or on several predictors at once to capture interactions (interaction 
effects)"},{"type":"text","text":"."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This decision tree method is repeated for every combination of features and for every non-numeric single feature."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2.2 Heuristics for data slices"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"So far we have only used accuracy to build data slices, but other heuristics help us find "},{"type":"text","marks":[{"type":"strong"}],"text":"useful"},{"type":"text","text":" data slices:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":1,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Statistical significance"},{"type":"text","text":": to make sure we only look at slices with a significant drop in accuracy, we keep only slices whose performance is 4% below the lower bound of the error's confidence interval. That way, we can state with probability α that our data slice has a higher error rate."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Explainability"},{"type":"text","text":": we also want to be able to act on the problem regions we find, so we only look at two or three features when creating combinations. Limiting interactions to low orders makes it more likely that our engineers can develop a fix."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Minimal support"},{"type":"text","text":": finally, a data slice must contain enough errors to be worth investigating. We require at least 2 misclassifications, or coverage of 5% of the test errors, whichever is larger."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Note that you can tailor other heuristics to your business needs; the "},{"type":"link","attrs":{"href":"https:\/\/towardsdatascience.com\/accuracy-precision-recal
l-or-f1-331fb37c5cb9","title":"","type":null},"content":[{"type":"text","text":"precision\/recall tradeoff"}]},{"type":"text","text":" is one example."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":3,"normalizeStart":3},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","text":"Summary and key points"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"That covers how FreaAI works at a high level."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Again, FreaAI is not open source, though it will probably be released to the public in the future. In the meantime, you can apply the framework discussed here to your own predictive models to find where they underperform."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"3.1 Recap"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To recap, FreaAI uses HDR and decision trees to reduce the search space over our predictors. It then iterates over individual features and combinations to find where performance is low. A few heuristics are applied to these low-performance regions to make sure the findings are actionable."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"3.2 
Why it matters"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"First, this framework helps engineers identify where a model is weak and (hopefully) correct those weaknesses, improving the model's predictive power. That benefit is especially attractive for black-box models such as neural networks, which have no usable coefficients to inspect."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"By isolating the regions of data where the model underperforms, we get a window into the black box."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"FreaAI also has many interesting potential applications. One is identifying model drift, which occurs when a trained model becomes less effective over time. IBM has just released a hypothesis-testing "},{"type":"link","attrs":{"href":"https:\/\/arxiv.org\/pdf\/2108.05319.pdf","title":"","type":null},"content":[{"type":"text","text":"framework"}]},{"type":"text","text":" for determining model drift."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Another interesting application is identifying model bias. Here, bias means unfairness, such as refusing someone a loan because of their gender. By looking at the data splits where model performance is low, you can find regions where bias exists."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Original article:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https:\/\/towardsdatascience.com\/how-to-find-weaknesses-in-your-machine-learning-models-ae8bd18880a3","title":"","type":null},"content":[{"type":"text","text":"https:\/\/towardsdatascience.com\/how-to-find-weaknesses-in-your-machine-learning-models-ae8bd18880a3"}]}]}]}
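The range-shrinking step from section 2.1 can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not FreaAI's implementation (which is not open source): a sliding window over the sorted values stands in for a proper HDR estimate, the minimal-support check is reduced to a fixed error count, and the names `densest_window`, `accuracy` and `weak_regions` are hypothetical.

```python
def densest_window(values, fraction):
    """Narrowest interval (lo, hi) holding `fraction` of the values.

    Assumption: a fixed-size sliding window over the sorted values is
    used as a simple stand-in for a real HDR estimate.
    """
    xs = sorted(values)
    k = max(1, round(fraction * len(xs)))  # points the window must hold
    start = min(range(len(xs) - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]


def accuracy(rows):
    """Share of correct predictions among (value, is_correct) rows."""
    return sum(1 for _, ok in rows if ok) / len(rows)


def weak_regions(values, correct, eps=0.05, min_errors=2):
    """Shrink the covered fraction by `eps` per step and report the points
    dropped whenever accuracy inside the window goes up."""
    rows = list(zip(values, correct))
    fraction = 1.0
    inside = rows  # at fraction 1.0 the window covers everything
    findings = []
    while fraction - eps > 1e-9:
        fraction = round(fraction - eps, 10)
        lo, hi = densest_window(values, fraction)
        kept = [r for r in rows if lo <= r[0] <= hi]
        removed = [r for r in inside if not (lo <= r[0] <= hi)]
        errors = sum(1 for _, ok in removed if not ok)
        # Accuracy rose after dropping these points, so the model was weak
        # there; the error count is a simplified minimal-support check.
        if accuracy(kept) > accuracy(inside) and errors >= min_errors:
            findings.append({"removed": sorted(v for v, _ in removed),
                             "errors": errors})
        inside = kept
    return findings


# Toy data: the model is wrong only for the two largest feature values.
values = list(range(20))
correct = [v < 18 for v in values]
print(weak_regions(values, correct, eps=0.1))
```

On this toy data the first shrink step drops the values 18 and 19, accuracy inside the window rises from 0.9 to 1.0, and that band is reported as the weak region. A fuller version would add the confidence-interval test from section 2.2 before keeping a slice.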