Several Survey Articles on Multi-Task Learning

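Source | Zhihu

Link | https://zhuanlan.zhihu.com/p/145706170

Author | 黃浴 (Huang Yu)

Editor | the "機器學習算法與自然語言處理" (Machine Learning Algorithms and Natural Language Processing) official account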

The sections below introduce several survey articles on multi-task learning (MTL) in turn.

Ruder S, "An Overview of Multi-Task Learning in Deep Neural Networks", arXiv 1706.05098, June 2017

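A summary of MTL in deep learning:

Based on how the hidden layers are shared, MTL falls into two basic types: hard sharing and soft sharing.

Hard sharing shares the hidden layers across tasks, which lowers the risk of overfitting. “The more tasks we are learning simultaneously, the more our model has to find a representation that captures all of the tasks and the less is our chance of overfitting on our original task.”

In soft sharing, each task has its own model and parameters, and regularization is the main means of encouraging the parameters of the different models to stay similar.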
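The contrast between the two types can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not code from the survey: the layer sizes, the weights, and the helper names (`dense`, `hard_sharing_forward`, `soft_sharing_penalty`) are hypothetical, and the squared L2 distance is only one of the penalties soft sharing can use.

```python
def dense(x, weights):
    """Bias-free linear layer followed by ReLU; weights is a list of rows."""
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in weights]

def hard_sharing_forward(x, shared, heads):
    """Hard sharing: one shared hidden layer, one linear output head per task."""
    hidden = dense(x, shared)  # computed once, reused by every task
    return {task: sum(w * h for w, h in zip(head, hidden))
            for task, head in heads.items()}

def soft_sharing_penalty(params_a, params_b):
    """Soft sharing: squared L2 distance between two tasks' parameters."""
    return sum((a - b) ** 2 for a, b in zip(params_a, params_b))

# Hypothetical shared layer (2 inputs -> 2 hidden units) and two task heads.
shared = [[1.0, -1.0], [0.5, 0.5]]
heads = {"task_a": [1.0, 0.0], "task_b": [0.0, 2.0]}

outputs = hard_sharing_forward([2.0, 1.0], shared, heads)
penalty = soft_sharing_penalty([1.0, 2.0], [1.5, 1.0])
```

In the hard-sharing case the hidden activations are computed once for all tasks; in the soft-sharing case each task would keep its own parameter vector and `penalty` would be added to the sum of the task losses.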

MTL works through several mechanisms:

Implicit data augmentation
Attention focusing
Eavesdropping
Representation bias
Regularization

MTL in non-neural models comes in two main forms:

Block-sparse regularization: enforcing sparsity across tasks through norm regularization
Learning task relationships: modelling the relationships between tasks
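As a rough illustration of block-sparse regularization, the mixed ℓ1/ℓ2 norm below takes the L2 norm of each feature row across tasks and sums those row norms, so a feature is encouraged to be switched off for all tasks at once. The matrix layout (rows are features, columns are tasks), the values, and the function name are assumptions made for this sketch.

```python
import math

def l1_l2_norm(A):
    """Sum over feature rows of the L2 norm taken across tasks (columns)."""
    return sum(math.sqrt(sum(a * a for a in row)) for row in A)

# Hypothetical parameter matrix: 3 features (rows) x 2 tasks (columns).
A = [[3.0, 4.0],   # feature used by both tasks
     [0.0, 0.0],   # feature switched off for every task
     [0.0, 2.0]]   # feature used by one task only

penalty = l1_l2_norm(A)
```

A row that is entirely zero contributes nothing to the penalty, which is what pushes the solution toward sharing the same small set of features across tasks.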

MTL in deep learning models:

Deep Relationship Networks

Fully-Adaptive Feature Sharing

Cross-stitch Networks

Low supervision

deep bi-directional RNNs [Søgaard and Goldberg, 2016]

A Joint Many-Task Model

Weighting losses with uncertainty

Tensor factorization for MTL (note: STL denotes single-task learning)

[Yang and Hospedales, 2017a]

Sluice Networks
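Of the approaches listed above, weighting losses with uncertainty is compact enough to sketch. The snippet assumes the commonly cited regression-style form, in which each task loss L_i is scaled by 1/(2σ_i²) and a log σ_i term keeps σ_i from growing without bound. The parameterization s_i = log σ_i² and the function name are choices made here; in practice s_i would be trained by gradient descent alongside the network, whereas this sketch only evaluates the combined loss.

```python
import math

def uncertainty_weighted_loss(task_losses, log_vars):
    """Return sum_i 0.5 * exp(-s_i) * L_i + 0.5 * s_i,
    where s_i = log(sigma_i^2) is one learnable scalar per task."""
    return sum(0.5 * math.exp(-s) * loss + 0.5 * s
               for loss, s in zip(task_losses, log_vars))

losses = [2.0, 0.5]  # hypothetical per-task losses

# sigma_i^2 = 1 for both tasks: every loss is simply halved.
total_equal = uncertainty_weighted_loss(losses, [0.0, 0.0])

# sigma_1^2 = 4: the first task is down-weighted, at the cost of a log term.
total_down = uncertainty_weighted_loss(losses, [math.log(4.0), 0.0])
```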

Ways to find auxiliary tasks:

Related task
Adversarial
Hints
Focusing attention
Quantization smoothing
Predicting inputs
Using the future to predict the present
Representation learning

Zhang Y, Yang Q, "An overview of multi-task learning", arXiv 1707.08114, July 2018

MTL methods fall into several categories:

feature learning approach
low-rank approach
task clustering approach
task relation learning approach
decomposition approach

Combination with other machine learning paradigms:

semi-supervised learning
active learning
unsupervised learning
reinforcement learning
multi-view learning
graphical models

"What to share"

feature
instance (rarely used)
parameter

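Comparison of MTL methods:

· The feature learning approach learns common features that transfer to all existing tasks and even to new tasks. When outlier tasks unrelated to the others are present, the learned features are badly affected and performance drops, so the approach is not robust.
· By assuming that the parameter matrix is low-rank, the low-rank approach can learn a subspace of the parameter matrix explicitly, or achieve it implicitly through convex or non-convex regularizers. The approach is powerful, but it appears to apply only to linear models, and nonlinear extensions are not easy to design.
· The task clustering approach clusters at the level of model parameters and can identify clusters that each consist of similar tasks. Its main limitation is that it captures positive correlations among tasks in the same cluster while ignoring negative correlations among tasks in different clusters. Moreover, although some methods in this category can determine the number of clusters automatically, most still need a model selection procedure such as cross-validation to determine it, which adds computational cost.
· The task relation learning approach learns the model parameters and the pairwise task relations at the same time. The learned task relations give insight into how the tasks are related and improve interpretability.
· With multi-level parameters, the decomposition approach can be viewed as an extension of the other parameter-based approaches, so it can model more complex task structures (i.e., tree structures). The number of components matters for the performance of the decomposition approach.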
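For the low-rank approach in particular, the implicit route through a convex regularizer is commonly written with the trace (nuclear) norm. The formulation below is a generic sketch for linear models; the symbols (m tasks, n_i samples for task i, loss ℓ, weight vectors w_i stacked as the columns of W, trade-off λ) are notation chosen here rather than taken from the survey.

```latex
\min_{W = [w_1, \dots, w_m]} \;
\sum_{i=1}^{m} \frac{1}{n_i} \sum_{j=1}^{n_i}
\ell\!\left(y^{i}_{j},\; w_i^{\top} x^{i}_{j}\right)
+ \lambda \, \lVert W \rVert_{*},
\qquad
\lVert W \rVert_{*} = \sum_{k} \sigma_k(W)
```

Here σ_k(W) are the singular values of W. Penalizing their sum is a convex surrogate for the rank of W, which pulls the task weight vectors toward a shared low-dimensional subspace.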

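Regularization is the main approach to MTL. Regularized MTL algorithms fall into two categories: feature covariance learning and task relation learning. Feature covariance learning can be seen as a representative formulation of feature-based MTL, while task relation learning belongs to parameter-based MTL.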

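Extensions of MTL methods (task clustering and task relation learning approaches):

· Convert the multi-class classification problem of each task into binary classification problems.
· Make use of the learned features.
· Directly learn the correspondence between the labels of different tasks.
· Stack the model parameters of all tasks into a tensor, in which the parameters of each task form one slice, and then apply regularization or decomposition.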

Thung K, Wee C, "A Brief Review on Multi-Task Learning", Multimedia Tools and Applications, August 2018.

Rich Caruana's definition of MTL: “MTL is an approach to inductive transfer that improves generalization by using the domain information contained in the training signals of related tasks as an inductive bias. It does this by learning tasks in parallel while using a shared low dimensional representation; what is learned for each task can help other tasks be learned better”.

Based on inputs and outputs, MTL falls into three types:

· multi-input single-output (MISO)
· single-input multi-output (SIMO)
· multi-input multi-output (MIMO)

Categories of MTL by regularization method:

· LASSO
· group sparsity
· low rank
· task exclusiveness (unrelated tasks)
· graph Laplacian regularization
· decomposition
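To make one entry of this list concrete, graph Laplacian regularization penalizes the distance between the parameter vectors of tasks that are connected in a task graph. The sketch below assumes an undirected graph with edge weights a_ij and checks numerically that the pairwise form, the sum of a_ij‖w_i − w_j‖² over edges, matches the Laplacian form, the sum over all i, j of L_ij⟨w_i, w_j⟩ with L = D − A. The graph, the weights, and all function names are hypothetical.

```python
def pairwise_penalty(W, edges):
    """Sum of a_ij * ||w_i - w_j||^2 over weighted task-graph edges."""
    return sum(a * sum((u - v) ** 2 for u, v in zip(W[i], W[j]))
               for i, j, a in edges)

def laplacian(num_tasks, edges):
    """Graph Laplacian L = D - A of an undirected weighted task graph."""
    L = [[0.0] * num_tasks for _ in range(num_tasks)]
    for i, j, a in edges:
        L[i][i] += a
        L[j][j] += a
        L[i][j] -= a
        L[j][i] -= a
    return L

def laplacian_penalty(W, L):
    """The same penalty written as sum_ij L_ij * <w_i, w_j>."""
    n = len(W)
    return sum(L[i][j] * sum(u * v for u, v in zip(W[i], W[j]))
               for i in range(n) for j in range(n))

W = [[1.0, 0.0], [1.0, 1.0], [3.0, 1.0]]   # one parameter vector per task
edges = [(0, 1, 1.0), (1, 2, 0.5)]         # (task i, task j, weight a_ij)

p_pairs = pairwise_penalty(W, edges)
p_lap = laplacian_penalty(W, laplacian(3, edges))
```

The two values agree, which is why the regularizer can be expressed compactly through the Laplacian of the task graph.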

Ways to handle MTL with incomplete data:

· use only samples with complete data for the MTL study, at the cost of reduced statistical power due to the smaller dataset;
· impute the missing data before performing the MTL study, where the imputation is very much prone to error for data missing in blocks;
· design an MTL method that is applicable to incomplete data.
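The first two options can be sketched as simple preprocessing steps. The example assumes missing entries are encoded as `None` and uses column-mean imputation as a stand-in for whatever imputation method a study would actually choose; the data and the function names are made up for illustration.

```python
def complete_cases(rows):
    """Option 1: keep only samples with no missing (None) values."""
    return [row for row in rows if None not in row]

def mean_impute(rows):
    """Option 2: replace each None with the mean of the observed values in
    the same column (assumes every column has at least one observed value)."""
    means = []
    for col in zip(*rows):
        seen = [v for v in col if v is not None]
        means.append(sum(seen) / len(seen))
    return [[m if v is None else v for v, m in zip(row, means)]
            for row in rows]

data = [[1.0, 2.0], [None, 4.0], [3.0, None], [5.0, 6.0]]

kept = complete_cases(data)   # half of the samples are dropped
filled = mean_impute(data)    # all samples kept, gaps filled with column means
```

Dropping incomplete rows halves this toy dataset, which is the loss of statistical power mentioned above. Mean imputation keeps every row but fills the gaps with values that ignore each sample's own features.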

MTL methods in deep learning:

Vandenhende S et al., "Revisiting Multi-Task Learning in the Deep Learning Era", arXiv 2004.13379, 2020

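A very recent survey that I have only just come across.

MTL in deep learning is mainly about designing network models that can learn shared representations from multi-task supervisory signals. The main advantages of MTL are: 1) a smaller memory footprint thanks to layer sharing; 2) faster inference, because the features of the shared layers are not recomputed; 3) better model performance when related tasks share complementary information or act as regularizers for one another. Examples in computer vision include detection and classification, detection and segmentation, and segmentation and depth estimation.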
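The memory argument can be checked with simple arithmetic. The sketch below counts the parameters of fully connected layers for three tasks, once with a separate network per task and once with a shared encoder plus small task heads. All layer sizes and the number of tasks are hypothetical.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out

def mlp_params(sizes):
    """Total parameters of an MLP with the given layer sizes."""
    return sum(dense_params(a, b) for a, b in zip(sizes, sizes[1:]))

encoder = [128, 64, 32]   # hypothetical encoder: 128 -> 64 -> 32
head = [32, 10]           # hypothetical per-task head: 32 -> 10
num_tasks = 3

# One full network per task versus one shared encoder with three heads.
separate = num_tasks * (mlp_params(encoder) + mlp_params(head))
shared = mlp_params(encoder) + num_tasks * mlp_params(head)
```

With these sizes the shared model needs 11,326 parameters against 31,998 for three separate networks, and the encoder features are computed once per input instead of three times.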

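However, if the task dictionary contains unrelated tasks, joint learning in MTL leads to negative transfer. Many methods therefore try to find a balance among the tasks, for example Uncertainty Weighting, Gradient Normalization, Dynamic Weight Averaging (DWA), Dynamic Task Prioritization, the multiple gradient descent algorithm (MGDA), and adversarial training. Some other recent work uses MTL to obtain initial predictions and then uses them to refine the features and produce better outputs, for example PAD-Net, PAP-Net, JTRL, and MTI-Net.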
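As one example of a balancing scheme, Dynamic Weight Averaging can be sketched as a softmax over the ratios of each task's two most recent losses, rescaled so that the weights sum to the number of tasks. A task whose loss is falling more slowly gets a larger weight. The snippet assumes this commonly cited form; the function name and the default temperature of 2.0 are choices made for this sketch.

```python
import math

def dwa_weights(prev_losses, prev_prev_losses, temperature=2.0):
    """Softmax over loss ratios L_k(t-1) / L_k(t-2), scaled so that the
    weights sum to the number of tasks."""
    ratios = [a / b for a, b in zip(prev_losses, prev_prev_losses)]
    exps = [math.exp(r / temperature) for r in ratios]
    total = sum(exps)
    return [len(ratios) * e / total for e in exps]

# Task 0 improved from 1.0 to 0.9 (slow), task 1 from 1.0 to 0.5 (fast).
weights = dwa_weights([0.9, 0.5], [1.0, 1.0])

# Equal rates of improvement give equal weights.
uniform = dwa_weights([1.0, 1.0], [2.0, 2.0])
```

A larger temperature flattens the weights toward uniform, while a smaller one makes the scheme react more sharply to differences in the loss ratios.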

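The figure shows the paper's taxonomy of deep learning MTL. Architectures are divided into encoder-focused and decoder-focused, and optimization strategies into task balancing and others.

The figure shows PAD-Net, a decoder-focused MTL architecture.

Then there is PAP-Net (Pattern-Affinitive Propagation Networks).

This is Joint Task-Recursive Learning (JTRL).

And Multi-Scale Task Interaction Networks (MTI-Net). All of these are decoder-focused.

The table compares the task balancing methods by whether they balance magnitudes, whether they balance the pace of learning, whether they require gradients, whether they produce non-competing gradients, whether they need no extra tuning, and by their motivation.

Unusually for a survey, this one also runs experiments for comparison:

These three tables cover the encoder-focused architectures.

These four tables cover the decoder-focused architectures.

The conclusion is that decoder-focused MTL methods come out ahead. That said, the encoder's contribution to the representation still cannot be ignored.

This is the comparison between decoder-focused and encoder-focused architectures.

This is the comparison of loss balancing methods on three datasets.

The survey also compares current classification methods on the CelebA dataset, where ResNet18 with uniform weights performs well.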
