How Can We Make Deep Learning Models Generalize Better?

*This article was originally published on Towards Data Science and is translated and shared here by InfoQ China with the original author's authorization.*

Invariant Risk Minimization (IRM) is an exciting new learning paradigm that helps predictive models generalize beyond the limits of their training data. Developed by researchers at Facebook and introduced in a 2020 [paper](https://arxiv.org/pdf/1907.02893.pdf), the method can be added to almost any modeling framework, but it is best suited to black-box models that leverage large amounts of data, such as neural networks and their many variants.

In this post, we take a closer look.

## Technical Overview

At a high level, IRM is a learning paradigm that tries to learn causal relationships rather than mere correlations. By constructing training environments and structuring how data are sampled, we maximize accuracy while requiring our predictors to remain invariant. Predictors that both fit our data and stay invariant across environments are what the final model uses.

![](https://static001.geekbang.org/resource/image/b4/48/b40bdbaccea73c22693c3fda0fbe8548.png)
*Figure 1: theoretical performance of 4-fold CV (top) versus Invariant Risk Minimization (IRM, bottom). Values are extrapolated from the simulations in the [paper](https://arxiv.org/pdf/1907.02893.pdf).*

**Step 1: develop your set of environments.** Instead of shuffling the data and assuming they are IID, we use knowledge about the data-selection process to construct multiple sampling environments. For example, for a model that parses text in images, the training environments could be grouped by the author who wrote the text.

**Step 2: minimize the loss across environments.** With the environments in place, we fit approximately invariant predictors and optimize our accuracy across environments. More on this below.

**Step 3: generalize better!** Invariant risk minimization achieves higher out-of-distribution (OOD) accuracy than traditional learning paradigms.

## What Is Actually Going On?

Let's pause for a moment to understand how invariant risk minimization really works.

### What Does a Predictive Model Do?

First, the purpose of a predictive model is to generalize, that is, to perform well on data it has never seen. We call such unseen data out-of-distribution (OOD).
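Step 1 amounts to grouping raw samples by a metadata field that describes how they were collected. Here is a minimal sketch, assuming each sample is a dict carrying such a field (the function and field names are illustrative, not from any IRM library):

```python
from collections import defaultdict

def build_environments(samples, env_key):
    """Group raw samples into training environments by a metadata field.

    For the text-parsing example, env_key could be "author", so each
    environment holds the samples written by a single author.
    """
    envs = defaultdict(list)
    for sample in samples:
        envs[sample[env_key]].append(sample)
    return dict(envs)
```

Downstream, the loss is then evaluated per environment rather than on one pooled, shuffled dataset.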
To simulate new data, various methods have been introduced, such as [cross-validation](https://towardsdatascience.com/cross-validation-430d9a5fee22). While these are better than fitting a single training set, we are still limited to the data we have observed. So, can you guarantee that the model will generalize?

Well, in general, you can't.

For some well-defined problems where you understand the data-generating mechanism well, we can be confident that our sample represents the population. But for most applications we cannot be so sure.

Take an example cited in the paper: we want to determine whether the animal in an image is a cow or a camel.

![](https://static001.geekbang.org/resource/image/91/98/91e363cdd27dd8fd1d0fca23bb5b3c98.png)

To do this, we train a binary classifier using cross-validation and observe high accuracy on our test data. Great!

However, after some more exploration, we discover that the classifier was simply using the background color to decide whether an image showed a cow or a camel: whenever a cow was placed on a sandy background, the model insisted it was a camel, and vice versa.
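This failure is easy to reproduce with a deterministic toy version of the dataset. The 95/5 split below mirrors the paper's example, and the "model" is just the background-color shortcut the classifier ends up learning:

```python
# In-distribution data: background color and label agree 95% of the time.
train = ([("grass", "cow")] * 95 + [("sand", "cow")] * 5
         + [("sand", "camel")] * 95 + [("grass", "camel")] * 5)
# OOD data: cows on sand, camels on grass.
ood = [("sand", "cow")] * 50 + [("grass", "camel")] * 50

def background_shortcut(background):
    # The spurious rule the classifier latched onto.
    return "camel" if background == "sand" else "cow"

train_acc = sum(background_shortcut(bg) == label for bg, label in train) / len(train)
ood_acc = sum(background_shortcut(bg) == label for bg, label in ood) / len(ood)
print(train_acc, ood_acc)  # 0.95 on the data we collected, 0.0 out of distribution
```

High test accuracy under cross-validation, total failure once the spurious correlation flips.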
Now, can we assume that people will only ever observe cows in pastures and camels in deserts?

Obviously not. This is a toy example, but it is easy to see how the same failure mode can affect larger and more consequential models.

### Why Aren't Current Methods Good Enough?

Before diving into the solution, let's look a bit deeper at why the popular train/test learning paradigm falls short.

The classic train/test paradigm is referred to in the paper as Empirical Risk Minimization (ERM). In ERM, we pool our data into train/test sets, train a model on all features, validate it on the test set, and return the fitted model with the best test (out-of-sample) accuracy. A 50/50 train/test split is one example.

Now, to understand why ERM does not generalize well, let's examine its three main assumptions one at a time:

1. Our data are independent and identically distributed (IID).
2. As we gather more data, the ratio between the number of significant features and the sample size n should decline.
3. Perfect test accuracy can occur only if there is a realizable (constructible) model with perfect training accuracy.

At first glance, all three assumptions seem to hold. In practice, the opposite is often true.

Looking at the first assumption: our data are almost never truly IID. In practice, collecting data nearly always introduces relationships between data points. For example, all the images of camels in deserts had to be taken in certain parts of the world.

There are plenty of cases where data are "pretty" IID, but the important thing is to think critically about whether and how your data collection introduces bias.

> Assumption #1: if our data are not IID, the first assumption fails and we cannot randomly shuffle our data. It is important to consider whether your data-generating mechanism introduces bias.

As for the second assumption, if we were modeling causal relationships, we would expect the number of significant features to remain essentially stable after a certain number of observations. In other words, as we collect more high-quality data, we would identify the true causal relationships and map them perfectly, so additional data would not improve our accuracy.

With ERM, this rarely happens. Because we cannot tell whether a relationship is causal, more data usually means fitting more spurious correlations. This phenomenon is known as the [bias-variance tradeoff](https://towardsdatascience.com/understanding-the-bias-variance-tradeoff-165e6942b229).

> Assumption #2: when fitting with ERM, the number of significant features can grow with the sample size, invalidating our second assumption.

Finally, the third assumption simply states that we are able to build a "perfect" model. If we lack data or powerful modeling techniques, this assumption fails. However, unless we know it cannot be done, we always assume it is feasible.

> Assumption #3: we assume a sufficiently large dataset makes an optimal model achievable, so assumption #3 holds.

The paper also discusses some non-ERM methods, but for various reasons they fall short as well.

## The Solution: Invariant Risk Minimization

The solution the paper proposes is called Invariant Risk Minimization (IRM), and it overcomes all the problems listed above. IRM is a learning paradigm that estimates causal predictors from multiple training environments. And because we learn from varied data environments, we are more likely to generalize to new OOD data.

How? We leverage the idea that causality is grounded in invariance.

Returning to our example: in 95% of the images we saw, cows stood on grass and camels on sand, so a model that fits the background color achieves 95% accuracy. On the surface, that looks like a very good predictor.

However, randomized controlled trials rest on a core concept called the counterfactual: if we observe a counterexample to a hypothesis, we can reject that hypothesis. So as soon as we see a single cow in the desert, we can conclude that a desert background does not necessarily indicate a camel.

> Strict counterfactuals are a bit harsh, but we can build the concept into our loss function by heavily penalizing the instances our model gets wrong in a given environment.

For example, consider a set of environments, one per country. Suppose that in 9 out of 10 of them cows live in pastures and camels in deserts, but in the 10th the pattern is reversed. When we train on that 10th environment and observe many counterexamples, the model learns that the background alone is not enough to label a cow or a camel, and it down-weights that predictor.

### The Method

Now that we have seen what IRM means, let's turn to the math and see how to implement it.

![](https://static001.geekbang.org/resource/image/a8/d2/a87275abc5a93c5a81357e487c6f29d2.png)

*Figure 2: the [minimization expression](https://arxiv.org/pdf/1907.02893.pdf).*

Figure 2 shows our optimization expression. As the summation indicates, we want to minimize this total across all training environments.
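Written out, the expression in Figure 2 is the paper's IRMv1 objective, with its pieces labeled A through D:

```latex
\min_{\Phi} \; \sum_{e \,\in\, \mathcal{E}_{tr}}
  \underbrace{R^{e}(\Phi)}_{A}
  \;+\;
  \underbrace{\lambda}_{B} \,
  \Big\| \underbrace{\nabla_{w \,\mid\, w=1.0}}_{C} \,
         \underbrace{R^{e}(w \cdot \Phi)}_{D} \Big\|^{2}
```

Here \(R^{e}\) is the risk in training environment \(e\), \(\Phi\) is the data transformation, and \(w\) is the dummy linear classifier fixed at 1.0.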
Breaking it down, term "A" represents our prediction accuracy in a given training environment, where phi (𝛷) denotes a data transformation, such as a log transform or a kernel transformation into a higher dimension. R denotes our model's risk function in a given environment e. Note that a risk function is simply the average of a loss function; a classic example is mean squared error (MSE).

Term "B" is just a positive number that scales our invariance term. Remember how we said strict counterfactuals might be too harsh? Here is where we tune that harshness. If lambda (λ) is 0, we do not care about invariance and simply optimize for accuracy. If λ is large, we care a great deal about invariance and penalize accordingly.

Finally, terms "C" and "D" represent our model's invariance across the training environments. We do not need to dig too deeply here, but in short, term "C" is the gradient with respect to a linear classifier w, whose default value is 1.0, and "D" is the risk of that linear classifier w multiplied by our data transformation (𝛷). The whole term is the squared norm of that gradient vector.

The [paper](https://arxiv.org/pdf/1907.02893.pdf) covers these terms in detail; see section 3 if you are curious.

In summary, "A" is our model's accuracy, "B" is a positive number that controls how much we care about invariance, and "C"/"D" express our model's invariance. If we minimize this expression, we should end up with a model that fits only the causal effects found across our training environments.

### Where IRM Goes Next

Unfortunately, the IRM paradigm presented in the paper only holds for the linear case. Transforming data into higher-dimensional spaces can produce effective linear models, but some relationships are fundamentally nonlinear. The authors leave the nonlinear case to future research.
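To make the terms concrete: in the linear case with squared loss, the gradient with respect to the scalar classifier w has a closed form, so the whole Figure 2 expression fits in a few lines of numpy. This is an illustrative sketch, not the authors' implementation; the names are mine:

```python
import numpy as np

def irm_objective(phi, environments, lam):
    """Linear-IRM sketch: sum over environments of the risk (term A) plus
    lam (term B) times the squared gradient of the risk with respect to a
    scalar classifier w, evaluated at w = 1.0 (terms C/D)."""
    total = 0.0
    for X, y in environments:
        pred = X @ phi                      # the data transform phi is linear here
        resid = pred - y
        risk = np.mean(resid ** 2)          # term A: MSE risk in environment e
        # d/dw of mean((w * pred - y)^2), evaluated at w = 1.0:
        grad_w = np.mean(2.0 * resid * pred)
        total += risk + lam * grad_w ** 2   # penalty = squared gradient norm
    return total
```

Minimizing this over phi trades in-environment accuracy against cross-environment invariance, with lam setting the exchange rate.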
If you want to follow this line of research, check out the work of the authors: [Martin Arjovsky](https://scholar.google.com/citations?user=A6qfFPkAAAAJ&hl=en), [Léon Bottou](https://leon.bottou.org/papers), [Ishaan Gulrajani](https://ishaan.io/), and [David Lopez-Paz](https://scholar.google.com/citations?hl=en&user=SiCHxTkAAAAJ&view_op=list_works&sortby=pubdate).

And that's the method. Not bad, right?

## Implementation Notes

- There is a PyTorch [implementation](https://github.com/facebookresearch/InvariantRiskMinimization).
- IRM is best suited to unknown causal relationships. If relationships are known, you should build them into the model structure instead; a famous example is the convolution in convolutional neural networks (CNNs).
- IRM holds great promise for unsupervised models and reinforcement learning. Model fairness is another interesting application.
- The optimization is quite complex because there are two minimization terms. The paper outlines a transformation that makes the optimization convex, but only for the linear case.
- IRM is robust to mild model misspecification because it is differentiable with respect to the covariance of the training environments. So while a "perfect" model is ideal, the minimization expression is resilient to small human errors.

**Original link**:

[https://towardsdatascience.com/how-to-make-deep-learning-models-to-generalize-better-3341a2c5400c](https://towardsdatascience.com/how-to-make-deep-learning-models-to-generalize-better-3341a2c5400c)