[Paper Notes] The neuro-symbolic concept learner: interpreting scenes, words, and sentences

Original post

2020-06-17 07:22

Paper notes on "The neuro-symbolic concept learner: interpreting scenes, words, and sentences from natural supervision"

a neural-based perception module:
Scene understanding here hinges on object-based understanding. The core pieces are a proposal generator (Mask R-CNN) and an object interpreter (ResNet-34). Mask R-CNN finds the object proposals; each proposal is then forwarded through ResNet-34 together with the original image, which produces one feature per object. An image with N objects yields N object features. This is the object-based representation.
Different operators are then applied to these features to extract attributes, i.e. ObjConcept and RelationConcept, which are used by the program executor.
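The attribute operators described above can be illustrated with a small sketch. It assumes that an operator such as ShapeOf maps an object feature into an attribute space, and that membership in a concept is scored by a shifted and scaled cosine similarity against a concept embedding. The vectors, the `gamma` and `tau` values, and the function names are illustrative assumptions, not taken from the authors' code.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def concept_probability(attr_vec, concept_vec, gamma=0.2, tau=0.1):
    """Sigmoid of the shifted, scaled cosine similarity.

    gamma (shift) and tau (scale) are assumed constants for this sketch.
    """
    score = (cosine(attr_vec, concept_vec) - gamma) / tau
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical outputs of a ShapeOf operator for two detected objects.
shape_of = {"obj0": [0.9, 0.1], "obj1": [0.1, 0.95]}

# Hypothetical concept embeddings living in the same attribute space.
concepts = {"cube": [1.0, 0.0], "sphere": [0.0, 1.0]}

probs = {
    obj: {name: concept_probability(vec, emb) for name, emb in concepts.items()}
    for obj, vec in shape_of.items()
}

for obj, scores in probs.items():
    print(obj, {name: round(p, 3) for name, p in scores.items()})
```

Each object ends up with one probability per concept, which is the kind of soft, per-object signal a program executor can consume.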
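a visually-grounded semantic parser
The semantic parser first uses POS tagging to find all the concept words in the natural-language question; these are the basis of the program. An encoder then turns the text into a fixed-length feature, and a set of operationDecoders decodes predefined operations such as Filter, Query, and Relate from it. These are composed recursively into an SQL-like language, which is the final program.
a symbolic program executor
It executes the hierarchical program extracted from the text against the concepts extracted from the image, deriving the answer step by step.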
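To make the three modules concrete, here is a minimal sketch of an executor running a nested Filter/Relate/Query program over per-object concept probabilities. The scene values, the tuple-based program encoding, and the use of min/max to combine soft masks are assumptions made for this illustration; the notes do not specify the executor's exact arithmetic.

```python
# Toy scene: per-object concept probabilities (hypothetical numbers).
scene = {
    "objects": [
        {"red": 0.95, "blue": 0.05, "cube": 0.90, "sphere": 0.05, "cylinder": 0.05},
        {"red": 0.05, "blue": 0.90, "cube": 0.05, "sphere": 0.90, "cylinder": 0.05},
        {"red": 0.90, "blue": 0.10, "cube": 0.05, "sphere": 0.05, "cylinder": 0.90},
    ],
    # relations[name][i][j]: probability that object j stands in the relation to object i.
    "relations": {
        "left": [
            [0.00, 0.05, 0.05],
            [0.95, 0.00, 0.05],
            [0.95, 0.95, 0.00],
        ],
    },
    "attributes": {"shape": ["cube", "sphere", "cylinder"], "color": ["red", "blue"]},
}

def execute(program, scene):
    """Recursively execute a nested-tuple program; masks are lists of soft scores."""
    op = program[0]
    if op == "scene":
        return [1.0] * len(scene["objects"])
    if op == "filter":
        _, concept, sub = program
        mask = execute(sub, scene)
        return [min(m, obj[concept]) for m, obj in zip(mask, scene["objects"])]
    if op == "relate":
        _, relation, sub = program
        mask = execute(sub, scene)
        rel = scene["relations"][relation]
        n = len(mask)
        return [max(min(mask[i], rel[i][j]) for i in range(n)) for j in range(n)]
    if op == "query":
        _, attribute, sub = program
        mask = execute(sub, scene)
        target = max(range(len(mask)), key=mask.__getitem__)  # most attended object
        return max(scene["attributes"][attribute], key=lambda c: scene["objects"][target][c])
    raise ValueError(f"unknown operation: {op}")

# "What is the shape of the red object left of the sphere?"
program = ("query", "shape",
           ("filter", "red",
            ("relate", "left",
             ("filter", "sphere", ("scene",)))))

answer = execute(program, scene)
print(answer)
```

Because every intermediate result is a mask over objects, each step of the program can be inspected on its own, which is where the interpretability noted later comes from.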

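Some details:

Relation concepts are extracted with an operator whose input should be two object representations. If the scene representations of the two objects whose spatial relation is being considered are simply concatenated, though, nothing more elaborate is needed, because the positional information is already contained in the image.
The authors mention curriculum learning (from easy to hard). The questions in the original CLEVR dataset mostly fall at the Lesson3 and Deploy levels, with 10 questions per image, so I think they must have generated some additional simple questions: 20 extra questions per image.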
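The easy-to-hard schedule can be sketched as a filter over the question pool. This is only an illustration: the stage names Lesson1 and Lesson2, the thresholds, and the `steps` difficulty score are hypothetical, since the notes only mention the Lesson3 and Deploy levels.

```python
from dataclasses import dataclass

@dataclass
class Question:
    text: str
    num_objects: int  # objects in the scene the question refers to
    steps: int        # hypothetical difficulty score: reasoning steps needed

# Hypothetical (max objects, max steps) limits per stage; None means no limit.
STAGES = [
    ("Lesson1", 3, 1),
    ("Lesson2", 3, 3),
    ("Lesson3", 5, 5),
    ("Deploy", None, None),
]

def stage_pool(questions, stage_name):
    """Return the questions allowed at the given curriculum stage."""
    for name, max_objects, max_steps in STAGES:
        if name == stage_name:
            if max_objects is None:
                return list(questions)
            return [q for q in questions
                    if q.num_objects <= max_objects and q.steps <= max_steps]
    raise KeyError(stage_name)

questions = [
    Question("What color is the cube?", 3, 1),
    Question("What shape is the object left of the sphere?", 3, 3),
    Question("How many red objects are behind the small cylinder?", 5, 4),
    Question("Are there more cubes than spheres right of the large metal ball?", 10, 8),
]

# Each later stage admits everything the earlier stages did, plus harder questions.
pool_sizes = {name: len(stage_pool(questions, name)) for name, _, _ in STAGES}
print(pool_sizes)
```

If most of the original questions only qualify for the last two stages, the early stages would be nearly empty, which is consistent with the guess that simpler questions had to be generated.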

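The trainable parameters are concentrated in the neural-based perception module (the conceptOperator (ShapeOf), the relationOperator, and the concept embeddings) and in the semantic parser (the Encoder and the various conceptDecoders and operationDecoders).
Training is joint and uses reinforcement learning.
However, after reading the code for a while I found that there are also losses such as SceneParsingLoss and ParserV1Loss, which train on extra annotations when additional supervision is enabled. Otherwise, a different loss is simply used for each answer type.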
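The reinforcement-learning signal can be illustrated with a toy REINFORCE loop. This sketch assumes the parser is a categorical policy over a few candidate programs and that the reward is 1 when the executed answer matches the ground truth and 0 otherwise. The candidates, learning rate, and step count are made up for the example, and the notes do not spell out the actual update rule.

```python
import math
import random

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce(candidate_answers, gold_answer, steps=500, lr=0.5, seed=0):
    """Train logits over candidate programs using only answer correctness.

    candidate_answers[k] is the answer obtained by executing candidate program k
    (hypothetical; a real system would run the executor here).
    """
    rng = random.Random(seed)
    logits = [0.0] * len(candidate_answers)
    for _ in range(steps):
        probs = softmax(logits)
        k = rng.choices(range(len(probs)), weights=probs)[0]  # sample a program
        reward = 1.0 if candidate_answers[k] == gold_answer else 0.0
        # Gradient of log p_k with respect to logit j is 1[j == k] - p_j.
        for j in range(len(logits)):
            grad = (1.0 if j == k else 0.0) - probs[j]
            logits[j] += lr * reward * grad
    return softmax(logits)

# Three candidate programs whose executions give different answers.
final_probs = reinforce(["sphere", "cube", "cylinder"], gold_answer="cube")
print([round(p, 3) for p in final_probs])
```

No program annotation appears anywhere in the loop: the policy only ever sees whether the final answer was right, which matches the point about natural supervision made below.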

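Pros:
1. Strong generalization: the paper has good experiments on new combinations of attributes, new attributes, new concepts, new and more complex images, and task transfer.
2. Strong interpretability, shown by the hierarchical parsing in the case study in Figure 4 B.
3. A small dataset is enough for very good results: trained on only 10% of the data, it does far better than comparable models.
4. No extra labels are needed; the natural supervision in the dataset is enough.
5. Attributes are handled by operators and programs by the executor. Making this information granular greatly improves interpretability and composability.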

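Cons:

My impression is that for a dataset from a different task, the number of operators and concepts in the perception module and the semantic parser has to be redefined. The extra work is mainly in curriculum learning, but the questions in the original dataset are generated automatically by a program, so I expect the extra questions are program-generated too, which means this can't really be called a drawback...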

Difference	NS-VQA	NS-CL
Labels	full annotations & extra labels & predefined programs / programs	no annotations, no labels, no program annotations
Training samples	1, 2, 3 sampling from question family	curriculum learning (self-generated basic questions)
Training procedure	separate training & fine-tuning	joint training (answers and executor results trained through reinforcement learning)

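Future Work:

1. Continue in the NS-CL direction on more general VQA datasets.
2. Convert natural language into a formal language (natural language -> query, filter, and so on).
3. Select concept words automatically.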
