一、Abstract
Unlike most existing representation models that either use no structure or rely on pre-specified structures, we propose a reinforcement learning (RL) method to learn sentence representation by discovering optimized structures automatically.
二、Introduction
- Mainstream models:
1、Bag-of-words representation models
2、Sequence representation models
3、Structured representation models (tree-structured LSTM)
4、Attention-based methods
- The paper proposes two structured representation models:
1、information distilled LSTM (ID-LSTM)
2、hierarchical structured LSTM (HS-LSTM)
三、Methodology
1、Overview
- Policy Network:
- Samples an action at each state
- Two models: Information Distilled LSTM, Hierarchically Structured LSTM
- Structured Representation Model: transforms the sampled action sequence into a sentence representation
- Classification Network: provides reward signals
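The interplay of the three components amounts to one REINFORCE training step: the policy network samples one action per word, the classifier's score on the resulting representation arrives as a single delayed scalar reward, and the policy parameters are updated with the log-derivative trick. A minimal NumPy sketch of that loop is below; the dimensions, the Bernoulli policy parameterization, and the stand-in reward value are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

D = 8          # state dimension (illustrative)
T = 6          # sentence length = number of actions
W = rng.normal(scale=0.1, size=D)   # policy parameters (toy)
b = 0.0

def policy(state):
    """Probability of taking action 1 (e.g. Retain / End) in this state."""
    return 1.0 / (1.0 + np.exp(-(state @ W + b)))

def rollout(states):
    """Sample one binary action per word; no reward until the sequence ends."""
    actions, probs = [], []
    for s in states:
        p = policy(s)
        actions.append(int(rng.random() < p))
        probs.append(p)
    return np.array(actions), np.array(probs)

def reinforce_grad(states, actions, probs, reward):
    """REINFORCE: sum of grad log pi(a_t|s_t) scaled by the delayed reward."""
    gW = np.zeros_like(W)
    gb = 0.0
    for s, a, p in zip(states, actions, probs):
        # For a Bernoulli policy with logit z = W.s + b: d/dz log pi = a - p
        gW += reward * (a - p) * s
        gb += reward * (a - p)
    return gW, gb

states = rng.normal(size=(T, D))         # stand-in for LSTM-derived states
actions, probs = rollout(states)
reward = -0.3   # stand-in for the classifier's log P(class | representation)
gW, gb = reinforce_grad(states, actions, probs, reward)
W += 0.1 * gW   # gradient ascent on expected reward
b += 0.1 * gb
```

In the paper the classification network is trained jointly, so the reward signal sharpens as the classifier improves; here it is frozen into a constant only to keep the sketch self-contained.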
2、ID-LSTM
Target: Distill the most important words and remove irrelevant words
Action: {Retain, Delete}
Policy:
States:
Rewards: (delayed reward)
NB: L’ denotes the number of deleted words
Objective Function: REINFORCE algorithm and policy gradient methods
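The Retain/Delete mechanics can be sketched directly: a Delete action skips the recurrent update so the state is carried over unchanged, while Retain feeds the word into the cell as usual. In this sketch a plain tanh RNN stands in for the LSTM cell (an assumption to keep the code short); the weights and inputs are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(1)
D = 4  # hidden / embedding size (illustrative)
Wx = rng.normal(scale=0.3, size=(D, D))
Wh = rng.normal(scale=0.3, size=(D, D))

def id_rnn(words, actions):
    """ID-LSTM-style transition: Delete (0) leaves the hidden state
    untouched; Retain (1) applies the normal recurrent update.
    A tanh RNN stands in for the LSTM cell."""
    h = np.zeros(D)
    for x, a in zip(words, actions):
        if a == 1:                       # Retain: update with this word
            h = np.tanh(Wx @ x + Wh @ h)
        # a == 0 (Delete): state carried over, word contributes nothing
    return h

words = rng.normal(size=(5, D))
keep_all = id_rnn(words, [1, 1, 1, 1, 1])  # behaves like a vanilla RNN
drop_mid = id_rnn(words, [1, 1, 0, 1, 1])  # word 3 distilled away
```

Because deleted words leave no trace in the state, the delayed reward (which, per the note above, involves the deletion count L’) is the only pressure deciding which words are worth keeping.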
3、HS-LSTM
Target: Build a structured representation by discovering hierarchical structures in a sentence
Action: {Inside, End}
Indicates whether a word is inside a phrase or at the end of one (a phrase here means a substructure or segment)
Policy:
States:
Reward: (delayed reward)
NB: A unimodal function of the number of phrases (a good phrase structure should contain neither too many nor too few phrases)
Objective Function: REINFORCE algorithm and policy gradient methods
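The two-level composition implied by the Inside/End actions can be sketched as follows: a word-level recurrence runs within each phrase and is reset after every End action, and its final state for the phrase is passed up to a phrase-level recurrence, whose last state is the sentence representation. As in the previous sketch, plain tanh RNNs stand in for the two LSTMs, and all weights are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(2)
D = 4  # hidden size (illustrative)
Ww_x = rng.normal(scale=0.3, size=(D, D))  # word-level weights
Ww_h = rng.normal(scale=0.3, size=(D, D))
Wp_x = rng.normal(scale=0.3, size=(D, D))  # phrase-level weights
Wp_h = rng.normal(scale=0.3, size=(D, D))

def hs_rnn(words, actions):
    """HS-LSTM-style composition: the word-level RNN runs inside a phrase
    and is reset after an End (1) action; its last state becomes the
    phrase's input to the phrase-level RNN."""
    h_word = np.zeros(D)
    h_phrase = np.zeros(D)
    n_phrases = 0
    for x, a in zip(words, actions):
        h_word = np.tanh(Ww_x @ x + Ww_h @ h_word)
        if a == 1:  # End: close the phrase and pass it up
            h_phrase = np.tanh(Wp_x @ h_word + Wp_h @ h_phrase)
            h_word = np.zeros(D)
            n_phrases += 1
    return h_phrase, n_phrases

words = rng.normal(size=(6, D))
rep, k = hs_rnn(words, [0, 0, 1, 0, 0, 1])  # segmentation: two 3-word phrases
```

The unimodal reward term noted above acts on `n_phrases`: segmentations that collapse to one phrase or split every word into its own phrase are both penalized, steering the policy toward phrase counts in between.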