The definitions below are taken from Section 2.1 of Sentiment Analysis and Opinion Mining.
Definition (Opinion): An opinion is a quadruple,
(g, s, h, t),
where g is the opinion (or sentiment) target, which can be any entity or an aspect of an entity; s is the sentiment about the target; h is the opinion holder (or opinion source); and t is the time when the opinion was expressed.
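A minimal sketch of the quadruple as a Python record. It assumes the sentiment s is stored as a simple string label, since the definition does not fix its representation; the field names and example values are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Opinion:
    """The opinion quadruple (g, s, h, t)."""
    target: str       # g: an entity or an aspect of an entity
    sentiment: str    # s: sentiment about the target (string label is an assumption)
    holder: str       # h: opinion holder / opinion source
    time: datetime    # t: when the opinion was expressed


# Hypothetical values for illustration only.
op = Opinion(
    target="battery life",
    sentiment="positive",
    holder="reviewer_1",
    time=datetime(2015, 6, 1),
)
g, s, h, t = op.target, op.sentiment, op.holder, op.time
```

Making the record frozen keeps an extracted opinion immutable once it has been built.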
Definition (Entity): An entity e is a product, service, topic, issue, person, organization, or event. It is described with a pair, e: (T, W), where T is a hierarchy of parts, sub-parts, and so on, and W is a set of attributes of e. Each part or sub-part also has its own set of attributes.
We simplify the hierarchy to two levels and use the term aspects to denote both parts and attributes. In the simplified tree, the root node is still the entity itself, but the second-level (also leaf-level) nodes are the different aspects of the entity.
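The two-level simplification can be sketched as follows. This assumes the part hierarchy T is stored as nested Component nodes (a hypothetical class, each holding its own attribute set) and that aspect names are kept exactly as given; every part, sub-part and attribute below the root becomes a leaf aspect of the root entity.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """A node in the part hierarchy T; `attributes` is that node's attribute set."""
    name: str
    attributes: set = field(default_factory=set)
    parts: list = field(default_factory=list)  # child Component nodes


def collect_aspects(node: Component) -> list:
    """Pre-order walk: each part, then its attributes, then its sub-parts."""
    out = []
    for part in node.parts:
        out.append(part.name)
        out.extend(sorted(part.attributes))
        out.extend(collect_aspects(part))
    return out


def to_two_level(entity: Component) -> dict:
    """Collapse the full hierarchy into {entity: [aspect, ...]}."""
    return {entity.name: sorted(entity.attributes) + collect_aspects(entity)}


# Hypothetical camera entity for illustration.
camera = Component(
    "camera",
    {"price", "weight"},
    [Component("lens", {"zoom"}), Component("battery", {"battery life"})],
)
two_level = to_two_level(camera)
```

The result maps the root entity to a flat list of aspects; the parent-child relations among parts are deliberately discarded, which is exactly the information the simplification gives up.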
Example (from http://alt.qcri.org/semeval2015/task12/):
(1) It fires up in the morning in less than 30 seconds and I have never had any issues with it freezing. → {LAPTOP#OPERATION_PERFORMANCE}
(2) Sometimes you will be moving your finger and the pointer will not even move. → {MOUSE#OPERATION_PERFORMANCE}
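In both examples the label has the form ENTITY#ATTRIBUTE. A minimal sketch, assuming every label contains exactly one #, that splits the labels and groups attributes under their entity, mirroring the two-level entity/aspect tree described above (parse_category and group_by_entity are hypothetical helper names):

```python
from collections import defaultdict


def parse_category(label: str) -> tuple:
    """Split an ENTITY#ATTRIBUTE label into (entity, attribute)."""
    entity, sep, attribute = label.partition("#")
    if not sep or not entity or not attribute or "#" in attribute:
        raise ValueError(f"expected ENTITY#ATTRIBUTE, got {label!r}")
    return entity, attribute


def group_by_entity(labels: list) -> dict:
    """Collect the attributes seen for each entity (root -> aspects)."""
    tree = defaultdict(set)
    for label in labels:
        entity, attribute = parse_category(label)
        tree[entity].add(attribute)
    return dict(tree)


examples = {
    "It fires up in the morning in less than 30 seconds and I have never "
    "had any issues with it freezing.": ["LAPTOP#OPERATION_PERFORMANCE"],
    "Sometimes you will be moving your finger and the pointer will not "
    "even move.": ["MOUSE#OPERATION_PERFORMANCE"],
}
all_labels = [label for labels in examples.values() for label in labels]
tree = group_by_entity(all_labels)
```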
This covers problems such as entity extraction, clustering, and ranking.
Possible approaches to extraction:
1. Rule-based extraction, which uses the relations between sentiment words and entities.
2. Sequence models.
3. Topic models.
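Approach 1 can be illustrated with two hand-written rules over dependency arcs. This is an illustrative sketch, not a specific published rule set: it assumes arcs in (relation, head, dependent) form with Universal Dependencies-style labels (amod, nsubj), lowercased words, and a hypothetical toy lexicon. In practice the arcs would come from a dependency parser rather than being written by hand.

```python
# A dependency arc written as (relation, head, dependent).
Dep = tuple[str, str, str]

# Hypothetical toy lexicon; a real system would use a full sentiment lexicon.
LEXICON = {"great": "positive", "slow": "negative"}


def extract_aspects(deps: list, lexicon: dict) -> list:
    """Return (aspect candidate, sentiment word) pairs found by two rules."""
    found = []
    for rel, head, dep in deps:
        # Rule 1: a sentiment word modifies a noun, e.g. amod(screen, great).
        if rel == "amod" and dep in lexicon:
            found.append((head, dep))
        # Rule 2: a sentiment adjective takes a nominal subject,
        # e.g. nsubj(slow, keyboard) in "the keyboard is slow".
        elif rel == "nsubj" and head in lexicon:
            found.append((dep, head))
    return found


# Hand-written arcs for "great screen" and "the keyboard is slow".
arcs = [
    ("amod", "screen", "great"),
    ("det", "keyboard", "the"),
    ("nsubj", "slow", "keyboard"),
    ("cop", "slow", "is"),
]
pairs = extract_aspects(arcs, LEXICON)
```

Arcs whose relation matches neither rule (det, cop) are simply ignored, so adding a rule means adding one more branch.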
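Use the Stanford Parser to analyze dependency relations, then design grammatical rules to extract the expressions that modify an aspect (the aspects are already annotated in the training set), and train an SVM on them. The features used are: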
1. POS: the part-of-speech tag of the word.
2. The dependency relations mentioned above.
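3. Polarity of sentiment words. A sentiment lexicon was built from SentiWordNet, MPQA, and eBLR. Because the polarity of some sentiment words is domain-dependent, a corpus-based method is also used: if a word occurs only with positive polarity in the training set and its frequency exceeds a threshold, it is added to the positive list; the negative list is built the same way. If a word occurs with both positive and negative polarity, it is treated as positive when its positive frequency is higher than its negative frequency.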
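The corpus-based lexicon rule and the three features can be sketched as below. Several details are assumptions not stated above: the input is a list of (word, polarity) observations from the training set, the threshold value 3 is arbitrary, a mixed word with more negative than positive occurrences goes to the negative list, and ties are dropped. General lexicons such as SentiWordNet or MPQA could be merged in by set union before calling token_features (a hypothetical helper). The resulting feature dicts could then be vectorized, for example with scikit-learn's DictVectorizer, and passed to an SVM such as LinearSVC; that step is not shown.

```python
from collections import Counter
from typing import Iterable


def build_domain_lexicon(observations: Iterable, threshold: int = 3) -> tuple:
    """Corpus-based positive/negative word lists.

    observations: (word, polarity) pairs, polarity "positive" or "negative".
    """
    pos_counts, neg_counts = Counter(), Counter()
    for word, polarity in observations:
        if polarity == "positive":
            pos_counts[word] += 1
        elif polarity == "negative":
            neg_counts[word] += 1

    positive, negative = set(), set()
    for word in pos_counts.keys() | neg_counts.keys():
        p, n = pos_counts[word], neg_counts[word]
        if n == 0:
            # Only ever positive: keep it if its frequency exceeds the threshold.
            if p > threshold:
                positive.add(word)
        elif p == 0:
            # Only ever negative: same rule for the negative list.
            if n > threshold:
                negative.add(word)
        elif p > n:
            # Mixed polarity: positive when positive occurrences dominate.
            positive.add(word)
        elif n > p:
            # Assumption: the mirror case goes to the negative list.
            negative.add(word)
        # Assumption: exact ties are left out of both lists.
    return positive, negative


def token_features(word: str, pos_tag: str, dep_relation: str,
                   positive: set, negative: set) -> dict:
    """POS tag, dependency relation and lexicon polarity for one word."""
    if word in positive:
        polarity = "positive"
    elif word in negative:
        polarity = "negative"
    else:
        polarity = "none"
    return {"pos": pos_tag, "dep": dep_relation, "polarity": polarity}


# Hypothetical training observations.
data = ([("fast", "positive")] * 4 + [("slow", "negative")] * 4
        + [("cheap", "positive")] * 2 + [("cheap", "negative")]
        + [("rare", "positive")] * 3)
positive, negative = build_domain_lexicon(data, threshold=3)
features = token_features("fast", "JJ", "amod", positive, negative)
```

Here "rare" is dropped because 3 occurrences do not exceed the threshold, while "cheap" is kept as positive by the majority rule even though it is infrequent.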