CS 594 Automated image captioning and image-text alignment: course notes

Original post

2020-06-23 22:25

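These are my notes on the slides for the image captioning unit of UIC CS 594, taught by Professor Natalie Parde.

Overall, the slides stay fairly general and probably carry less information than a review paper, but they suit newcomers. CS 594 was a Spring 2019 course, so part of the material covers work from 2018 and earlier. Since Google and Microsoft proposed cross-modal pre-trained models in 2019, some of the problems raised in the slides have improved considerably, and many people have attempted solutions to the rest.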

The two tasks involved in image captioning

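image understanding
- object localization: what is in the image, and where
- attribute identification
- scene classification: where in the real world
- entity relation
natural language generation
- content selection: which aspects of the image need to be discussed
- content organization: how to discuss the selected elements most effectively
- surface realization: which words describe these elements? Do pronouns need handling? Which tense should be used? How should information be aggregated?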
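
As a rough illustration of the three generation stages above, here is a minimal template-based sketch. The detection format, the function names (`select_content`, `organize_content`, `realize`, `caption`), and the 0.5 confidence threshold are all hypothetical and made up for this example; they are not from the slides, and real captioning models usually learn these steps jointly rather than hand-coding them.

```python
from typing import Any, Dict, List

# Hypothetical output of the image-understanding stage: each detection has
# a label, one attribute, and a confidence score.
Detection = Dict[str, Any]


def select_content(detections: List[Detection], min_score: float = 0.5) -> List[Detection]:
    """Content selection: keep only the detections worth mentioning."""
    return [d for d in detections if d["score"] >= min_score]


def organize_content(selected: List[Detection]) -> List[Detection]:
    """Content organization: mention the most confident object first."""
    return sorted(selected, key=lambda d: d["score"], reverse=True)


def realize(ordered: List[Detection], scene: str) -> str:
    """Surface realization: choose words, use present tense, aggregate into one sentence."""
    if not ordered:
        return f"A {scene}."
    phrases = [f"a {d['attribute']} {d['label']}" for d in ordered]
    if len(phrases) == 1:
        joined = phrases[0]
    else:
        joined = ", ".join(phrases[:-1]) + " and " + phrases[-1]
    verb = "is" if len(phrases) == 1 else "are"
    return f"{joined[0].upper()}{joined[1:]} {verb} in a {scene}."


def caption(detections: List[Detection], scene: str) -> str:
    """Run the three stages in order."""
    return realize(organize_content(select_content(detections)), scene)
```

For example, given a confident "brown dog", a confident "red ball", and a low-confidence "blurry cat" in a "park" scene, `caption` drops the cat and aggregates the remaining two objects into a single sentence.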

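Handling additional layers

To produce more complex captions, a model needs additional layers, which also bring additional complexity. Examples of what these extra layers address:

- Different audiences want captions in different styles
- For domain-specific images, the model also needs the relevant context or commonsense knowledge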

Model categories

Broadly, image captioning models fall into two categories:

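direct generation

First define the required content (key components, attributes, and so on), then compose the image description from those components.
retrieval-based

Find similar samples in a library of already-captioned images, and use the captions of the retrieved samples to help generate a caption for the new image.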
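
The retrieval-based idea can be sketched as a nearest-neighbour lookup. This is only the simplest variant, which reuses the retrieved captions directly; it assumes each image has already been turned into a feature vector by some pretrained encoder, and the function name `retrieve_captions` and its parameters are hypothetical.

```python
import numpy as np


def retrieve_captions(query_feat, gallery_feats, gallery_captions, k=1):
    """Return the captions of the k gallery images most similar to the query.

    query_feat: 1-D feature vector of the new image (assumed precomputed).
    gallery_feats: 2-D array, one feature vector per captioned image.
    gallery_captions: list of captions aligned with gallery_feats.
    """
    q = query_feat / np.linalg.norm(query_feat)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    sims = g @ q                  # cosine similarity to every gallery image
    top = np.argsort(-sims)[:k]   # indices of the k most similar images
    return [gallery_captions[i] for i in top]
```

A fuller system would typically rerank or rewrite the retrieved captions for the new image rather than copy them as-is.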

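In addition, work on bottom-up attention distinguishes two kinds of image captioning models by how they are trained:

self-critical: keeps the training loss consistent with the evaluation metric (ROUGE, for example). Such a loss is non-differentiable, which forces training with reinforcement learning. [I recommend this blog post, which summarizes it quite clearly]

entropy: during word-by-word generation, (cross-)entropy is typically used as the loss, which is differentiable.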
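
To make the contrast between the two training styles concrete, here is a minimal NumPy sketch of both loss values. It only computes numbers; in an actual training framework the gradient would flow through the log-probabilities while the rewards are treated as constants. The function names are hypothetical, and using the greedy-decoded caption's score as the baseline is my assumption about what "self-critical" refers to here.

```python
import numpy as np


def cross_entropy_loss(log_probs, target_ids):
    """Word-by-word cross-entropy: mean negative log-probability of the reference words.

    log_probs: array of shape (T, V) holding log-softmax outputs per time step.
    target_ids: length-T sequence of reference word indices.
    """
    steps = np.arange(len(target_ids))
    return -log_probs[steps, target_ids].mean()


def self_critical_loss(sample_log_probs, sample_reward, greedy_reward):
    """Policy-gradient surrogate loss with a greedy-decoding baseline.

    sample_log_probs: log-probabilities of the words in a sampled caption.
    sample_reward: metric score of the sampled caption (non-differentiable).
    greedy_reward: metric score of the greedy caption, used as the baseline.
    """
    advantage = sample_reward - greedy_reward
    # Minimizing this raises the probability of samples that beat the baseline
    # and lowers the probability of samples that fall below it.
    return -advantage * np.sum(sample_log_probs)
```

The metric only enters through the scalar `advantage`, which is why it does not need to be differentiable.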

Ways to evaluate image captioning

human evaluation metrics
- Grammaticality
- Relevance
- Creativity
- Humanness
automatic evaluation metrics
- BLEU
- ROUGE
- Translation Error Rate
- METEOR
- CIDEr
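The first four are common basic metrics in NLP. CIDEr deserves particular attention because it is used specifically for image captioning. One reason is that captions are diverse (which is also why they are hard to evaluate accurately), so CIDEr weights n-grams by TF-IDF.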
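
The TF-IDF weighting can be sketched as follows. This is a simplified, single-n version of the idea, not the official CIDEr implementation: it skips the averaging over n = 1..4, the scaling, and the length penalty of CIDEr-D, and the function names are hypothetical. Giving unseen n-grams a document frequency of 1 is also an assumption made here to avoid division by zero.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def document_frequency(all_references, n):
    """Count, for each n-gram, how many images have it in at least one reference."""
    df = Counter()
    for refs in all_references:
        seen = set()
        for ref in refs:
            seen.update(ngrams(ref, n))
        df.update(seen)
    return df


def tfidf_vector(tokens, df, num_images, n):
    counts = Counter(ngrams(tokens, n))
    total = sum(counts.values())
    # Assumption: n-grams never seen in any reference are treated as df = 1.
    return {g: (c / total) * math.log(num_images / max(df[g], 1))
            for g, c in counts.items()}


def cosine(u, v):
    dot = sum(w * v.get(g, 0.0) for g, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cider_n(candidate, references, df, num_images, n=1):
    """Average TF-IDF cosine similarity between a candidate and its references."""
    cand_vec = tfidf_vector(candidate, df, num_images, n)
    sims = [cosine(cand_vec, tfidf_vector(ref, df, num_images, n))
            for ref in references]
    return sum(sims) / len(sims)
```

An n-gram that appears in the references of every image (such as "a") gets an IDF of log(1) = 0, so matching it earns nothing; the score comes from n-grams that are specific to the image.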
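These metrics still cannot guarantee caption quality in every case, so some open questions remain:
- How can caption diversity and creativity be measured and encouraged? (Captions are limited by the vocabulary: suppressing OOV words also limits creativity.)
- How can contextual descriptions of images that appear in a temporal sequence be evaluated?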
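
On the diversity question above, one simple proxy that is sometimes used for generated text is distinct-n, the ratio of unique n-grams to total n-grams over a set of outputs. It is not from the slides and it says nothing about creativity or correctness; the sketch below is only meant to show how such a measurement could look.

```python
def distinct_n(captions, n=1):
    """Ratio of unique n-grams to total n-grams across a set of tokenized captions."""
    total = 0
    unique = set()
    for tokens in captions:
        grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        total += len(grams)
        unique.update(grams)
    return len(unique) / total if total else 0.0
```

A model that emits the same caption for every image scores low, while one that varies its wording scores closer to 1.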

Resources

datasets

COCO: http://cocodataset.org
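Google Conceptual Captions (2018): https://ai.googleblog.com/2018/09/conceptual-captions-new-dataset-and.html

People who work on captioning will also know earlier medium-to-large datasets such as Flickr30k and SBU, as well as authoritative image understanding datasets such as Visual Genome.

lectures (the ones listed in the slides are fairly early material)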

Automated Image Captioning with ConvNets and Recurrent Nets, by Andrej Karpathy: https://youtu.be/xKt21ucdBY0
How we teach computers to understand pictures, by Fei Fei Li: https://youtu.be/40riCqvRoMs
