【Active Learning - 04】Generative Adversarial Active Learning

【2017.11.15】Generative Adversarial Active Learning

Related documents:

【2017】Generative Adversarial Active Learning.pdf
【2017】Generative Adversarial Active Learning_PPT.pdf

Study notes: GAN + active learning

Main steps:

  1. Use GANs to synthesize informative training instances that are adapted to the current learner.
  2. Have human oracles label these instances.
  3. Add the labeled data back to the training set to update the learner.
  4. Execute this protocol iteratively until the label budget is reached.
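The four-step protocol can be sketched end to end. Everything below is a toy stand-in: a 1-D "generator", a perfect oracle function, and a midpoint-threshold learner replace the paper's DCGAN, human annotators, and SVM.

```python
import random

def generator(z):
    """Toy stand-in for the trained DCGAN: latent z in [0, 1] -> instance in [-1, 1]."""
    return 2.0 * z - 1.0

def oracle(x):
    """Toy stand-in for the human oracle: the true label is the sign of x."""
    return 1 if x >= 0 else -1

def fit_threshold(labeled):
    """Toy learner: decision boundary halfway between the two classes."""
    pos = [x for x, y in labeled if y == 1]
    neg = [x for x, y in labeled if y == -1]
    return (min(pos) + max(neg)) / 2.0

def gaal(budget, seed=0):
    rng = random.Random(seed)
    # Initialization: a few randomly chosen, oracle-labeled instances.
    labeled = [(x, oracle(x)) for x in (-0.9, 0.2)]
    theta = fit_threshold(labeled)
    for _ in range(budget):
        # Step 1: synthesize an informative instance adapted to the current
        # learner, i.e. search latent codes for a sample near the boundary.
        z = min((rng.random() for _ in range(50)),
                key=lambda c: abs(generator(c) - theta))
        x = generator(z)
        # Step 2: the oracle labels it; Step 3: add it to the training set.
        labeled.append((x, oracle(x)))
        # Step 4: update the learner; loop until the budget is exhausted.
        theta = fit_threshold(labeled)
    return theta

# The learned boundary drifts toward the oracle's true boundary at 0.
print(gaal(budget=20))
```

The key difference from pool-based methods shows up in Step 1: the query is manufactured to suit the current learner rather than picked from a fixed pool.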

Main contributions:

  1. This is the first active learning framework using deep generative models.

  2. With enough capacity in the trained generator, the method gives control over the
    generated instances that previous active learners may not have had.

  3. The results are promising compared with self-taught learning.

  4. This is the first work to report numerical results in active learning synthesis for image
    classification.

  5. We show that our approach can perform
    competitively when compared against pool-based methods.

Active learning scenarios:

  • Stream-based (I don't fully understand this one): stream-based active learning

  • Based on an unlabeled sample pool: pool-based active learning

  • Sample synthesis (e.g., generative models such as GANs): learning by query synthesis active learning

Core components:

  • GAN: DCGAN is used (TensorFlow implementation: https://github.com/carpedm20/DCGAN-tensorflow)

  • Generative models: used to directly synthesize training data. Note that the generative model is trained on the unlabeled sample set, so the generated samples also have unknown labels (if it were trained on labeled data, the generated samples would carry labels as well; cf. hy's GAN). Notably, when training the generative model, constraints can be added to the loss function to control the generated samples (analogous to active-selection strategies, e.g., increasing the diversity among generated samples or the entropy of each sample).

  • We adaptively generate new queries by solving an optimization problem.

  • The authors repeatedly emphasize that this is the first work combining GANs with active learning, and that it provides numerical results. Moreover, the main goal of the paper is to propose a framework based on GANs and active learning, not to maximize model performance (e.g., classification accuracy).

  • Training-data initialization: 50 randomly selected samples; each batch corresponds to 10 queries.
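The "optimization problem" for adaptive query generation can be made concrete: find the latent code z whose generated sample G(z) lies closest to the current decision boundary, e.g. by minimizing a squared distance like (w·G(z) + b)². The sketch below uses toy stand-ins: a linear "generator" and a linear decision function replace DCGAN and the SVM, and finite-difference gradient descent replaces backpropagation through the generator.

```python
def G(z):
    """Toy linear 'generator': 2-D latent code -> 2-D instance."""
    return [z[0] + z[1], z[0] - z[1]]

def f(x):
    """Toy decision function w.x + b of the current learner."""
    w, b = [1.0, 2.0], -0.5
    return w[0] * x[0] + w[1] * x[1] + b

def loss(z):
    """Squared distance of the generated sample to the decision boundary."""
    v = f(G(z))
    return v * v

def synthesize_query(z, lr=0.05, steps=200, eps=1e-4):
    """Minimize loss(z) over the latent code by finite-difference descent."""
    z = list(z)
    for _ in range(steps):
        for i in range(len(z)):
            z_eps = list(z)
            z_eps[i] += eps
            grad_i = (loss(z_eps) - loss(z)) / eps
            z[i] -= lr * grad_i
    return z

z_star = synthesize_query([1.0, 1.0])
# The synthesized query G(z_star) sits (numerically) on the boundary.
print(abs(f(G(z_star))))
```

Samples near the boundary are exactly the ones the current learner is least certain about, which is what makes the synthesized queries "adapted to the current learner".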

Future directions the authors mention:

  • Alternatively, we can incorporate diversity into our active learning principle, i.e., add some active-selection strategies to the objective function.
  • Improving the GAN: We also plan to investigate the possibility of using
    Wasserstein GAN in our framework.

Papers worth following up: the related work section

Early work on "generative models + active learning"

  • 《Kevin J. Lang and Eric B. Baum. Query Learning Can Work Poorly when a Human Oracle is
    Used, 1992.》In 1992, the authors trained a neural network for handwritten-character classification using synthesized learning queries and human oracles. The results were not promising, because some of the generated images could not be recognized even by the human oracles (the paper shows examples of such images).
  • 《Nicolas Papernot, Martín Abadi, Úlfar Erlingsson, Ian Goodfellow, and Kunal Talwar.
    Semi-supervised knowledge transfer for deep learning from private training data.》

Generative models + semi-supervised learning

  • 《Kamal Nigam, Andrew Kachites McCallum, Sebastian Thrun, and Tom Mitchell. Text Classification
    from Labeled and Unlabeled Documents using EM. Mach. Learn., 39:103–134, 2000.》In 2000, Nigam et al. proposed a semi-supervised method based on generative models for text classification.

  • 《Timothy M. Hospedales, Shaogang Gong, and Tao Xiang. Finding rare classes: Active learning
    with generative and discriminative models. IEEE Trans. Knowl. Data Eng., 25(2):374–386,
    2013.》In 2013, Hospedales et al. introduced Gaussian mixture models into active learning, with the generative model serving as a classifier.

  • 《Jost Tobias Springenberg. Unsupervised and Semi-supervised Learning with Categorical
    Generative Adversarial Networks. arXiv, 2015.》A 2015 combination of GAN + semi-supervised learning. The GAAL authors emphasize the difference between active learning and semi-supervised learning.

Others:

  • self-taught learning algorithm: 《Rajat Raina, Alexis Battle, Honglak Lee, Benjamin Packer, and Andrew Y. Ng.
    Self-taught Learning: Transfer Learning from Unlabeled Data. Proc. 24th Int. Conf. Mach. Learn., pages
    759–766, 2007.》

  • **diversity strategy:**《Multi-Class Active Learning by Uncertainty Sampling with Diversity Maximization.》
    《Semi-Supervised SVM Batch Mode Active Learning with Applications to Image Retrieval.
    ACM Trans. Inf. Syst., 2009.》

Experiments:

  • Method in this paper: DCGAN + active learning + linear SVM classifier (γ = 0.001);

  • Task type: binary image classification; the authors note it could later be extended to multi-task problems and language problems;

Main comparison experiments: against other active learning methods

  1. Generating training data with an ordinary GAN (regular GAN, simple GAN);

  2. SVM active learning;

  3. Random selection from the unlabeled pool;

  4. self-taught learning;
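Baselines 2 and 3 differ only in the query-selection rule: SVM active learning picks the pool point closest to the current decision boundary (uncertainty sampling), while random selection draws uniformly from the pool. A toy 1-D sketch of both, where a midpoint threshold stands in for the SVM:

```python
import random

TRUE_BOUNDARY = 0.3

def oracle(x):
    """Stand-in human oracle with a fixed true boundary."""
    return 1 if x >= TRUE_BOUNDARY else -1

def fit(labeled):
    """Toy 'SVM': threshold halfway between the two classes."""
    pos = [x for x, y in labeled if y == 1]
    neg = [x for x, y in labeled if y == -1]
    return (min(pos) + max(neg)) / 2.0

def active_learn(select, budget, seed=0):
    rng = random.Random(seed)
    pool = [rng.uniform(-1, 1) for _ in range(500)]   # unlabeled pool
    labeled = [(-0.8, oracle(-0.8)), (0.9, oracle(0.9))]
    theta = fit(labeled)
    for _ in range(budget):
        x = select(pool, theta, rng)                  # query-selection rule
        pool.remove(x)
        labeled.append((x, oracle(x)))                # label, add back, refit
        theta = fit(labeled)
    return abs(theta - TRUE_BOUNDARY)                 # final boundary error

def uncertainty(pool, theta, rng):                    # baseline 2
    return min(pool, key=lambda x: abs(x - theta))

def random_pick(pool, theta, rng):                    # baseline 3
    return rng.choice(pool)

print(active_learn(uncertainty, budget=15), active_learn(random_pick, budget=15))
```

In 1-D, uncertainty sampling behaves like a binary search for the true boundary, which is why pool-based methods are such a strong baseline when the pool covers the test distribution well (the MNIST-vs-USPS point below).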

Experimental datasets:

  • MNIST+: only digits 5 and 7 (since MNIST's test and training splits come from the same distribution, pool-based active learning has a natural advantage on such data; the similar USPS dataset is therefore introduced and used as the test set);

  • CIFAR-10: only automobile and horse (cat and dog would have been too ambiguous, so automobile and horse were chosen);

In the CIFAR-10 experiments, the dimensionality is higher than MNIST's, so the GAN is harder to train and some generated samples are of poor quality. The authors keep only the better-quality samples and discard the poor ones. They also mention some ways to improve the GAN, but leave them as future work.

This paper's method vs. self-taught learning

Balancing exploitation and exploration

The authors also try combining GAAL with random sampling, with experiments on handwritten digits (5 vs. 7, MNIST for training, USPS for testing). The results show that the mixed method outperforms either method used alone.

A mixed scheme is able to achieve better performance than either using GAAL or random sampling alone. Therefore, it implies that GAAL, as an exploitation scheme, performs even better in combination with an exploration scheme. A detailed analysis of such mixed schemes will be an interesting future topic.
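One way to read the mixed scheme: each 10-query batch combines exploitation (queries near the current boundary, as GAAL synthesizes them) with exploration (random draws from the pool). In the sketch below, the mixing ratio `p` is an illustrative assumption, not a value from the paper, and pool points near the boundary stand in for synthesized samples.

```python
import random

def mixed_batch(pool, boundary, p=0.5, batch=10, seed=0):
    """Compose one query batch: p*batch exploitation queries (closest to
    the boundary) plus (1-p)*batch exploration queries (random draws)."""
    rng = random.Random(seed)
    k = int(p * batch)
    exploit = sorted(pool, key=lambda x: abs(x - boundary))[:k]
    rest = [x for x in pool if x not in exploit]
    explore = rng.sample(rest, batch - k)
    return exploit + explore

pool = [i / 50 for i in range(-50, 51)]   # toy 1-D pool on a grid in [-1, 1]
batch = mixed_batch(pool, boundary=0.2)
print(batch)
```

The first `k` queries sharpen the boundary where the learner is uncertain; the random remainder guards against the generator (or the boundary estimate) collapsing onto a region far from the true decision surface.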

Reflections:

GAN:

  • Dataset used to train the GAN:
  1. Labeled dataset: the labels of GAN-generated images are also determined.
  2. Unlabeled dataset: the labels of GAN-generated images are unknown.
  • How to generate samples: generate all N needed samples in one pass after training the GAN, or keep generating new samples continuously, online-learning style?

Follow-up papers

  • 《ADVERSARIAL SAMPLING FOR ACTIVE LEARNING, 2019》A close reading and detailed write-up of this paper will follow.