Tweet Sentiment Extraction with Two BERT Encoder Layers

Original

2020-04-25 06:24

Tweet Sentiment Extraction is a Kaggle competition. The code in this post is an attempt to use a BERT model to extract words from tweets.
Competition link: https://www.kaggle.com/c/tweet-sentiment-extraction/

In my previous post, "BERT in tweet_sentiment_extraction", I built a first BERT-based model. Its results were not very good, so I set out to improve the model.

The full code for this post is at: https://github.com/llq20133100095/tweet_sentiment_extraction/tree/two_layer_classification

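Competition background:
Behind the tweets that spread every day, the sentiment they carry can influence the decisions of companies and individuals. Capturing sentiment in language lets people grasp the emotion of a message right away, which can guide decisions effectively. Which words actually drive the sentiment, however, is something our model needs to uncover.

For example, take the sentence "My ridiculous dog is amazing." [sentiment: positive]. Its sentiment is positive, and the competition asks us to extract the words that best convey it. Here, the word "amazing" expresses the positive sentiment.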

1. Introduction

1.1 Requirements

bert-tensorflow
tensorflow (>1.12, <1.15)
tensorflow-hub

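1.2 Analyzing the provided data

The competition provides two datasets: train.csv and test.csv. The model is built from train.csv and then used to predict on test.csv.

The columns of train.csv are:

textID: the ID of the text
text: the original text
selected_text: the extracted text that carries the sentiment
sentiment: the sentiment of the sentence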
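
As a quick illustration of this column layout, the sketch below parses one row with Python's standard csv module. The row is made up from the example sentence above (the textID value is hypothetical), and an in-memory string stands in for train.csv.

```python
import csv
import io

# Hypothetical one-row stand-in for train.csv (not real competition data).
SAMPLE_CSV = (
    "textID,text,selected_text,sentiment\n"
    "example01,My ridiculous dog is amazing.,amazing,positive\n"
)


def read_rows(handle):
    """Return each CSV row as a dict keyed by the column names."""
    return list(csv.DictReader(handle))


rows = read_rows(io.StringIO(SAMPLE_CSV))
for row in rows:
    # In this sample, selected_text is a substring of text.
    assert row["selected_text"] in row["text"]
    print(row["sentiment"], "->", row["selected_text"])
```

To read the real file, the same function could be given an open file handle instead of the in-memory string.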

2. Model construction

2.1 Data cleaning

Model input: "text" and "sentiment" are concatenated into "[CLS] text [SEP] sentiment [SEP]".
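
The input layout can be sketched roughly as follows. This is a simplified illustration, not the code from the repository: it assumes whitespace tokens instead of BERT's WordPiece tokens, assumes the sentiment is a single token, and the helper name build_input is hypothetical.

```python
def build_input(text_tokens, sentiment, max_len):
    """Build tokens, segment ids and input mask for "[CLS] text [SEP] sentiment [SEP]"."""
    # Reserve four positions: [CLS], two [SEP] tokens and the sentiment token.
    text_tokens = text_tokens[: max_len - 4]
    tokens = ["[CLS]"] + text_tokens + ["[SEP]", sentiment, "[SEP]"]
    # Segment 0 covers [CLS], the text and the first [SEP];
    # segment 1 covers the sentiment and the final [SEP].
    segment_ids = [0] * (len(text_tokens) + 2) + [1] * 2
    input_mask = [1] * len(tokens)
    # Pad all three lists to the fixed length max_len.
    pad = max_len - len(tokens)
    tokens = tokens + ["[PAD]"] * pad
    segment_ids = segment_ids + [0] * pad
    input_mask = input_mask + [0] * pad
    return tokens, segment_ids, input_mask


tokens, segment_ids, input_mask = build_input(
    "my ridiculous dog is amazing .".split(), "positive", max_len=12
)
print(tokens)
print(segment_ids)
print(input_mask)
```

In the real pipeline the tokens would additionally be mapped to vocabulary ids (input_ids) before being fed to the model.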

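In the dataset, selected_text turns out not to have been cleaned: many of its words are truncated, usually at the beginning or end of the span. For example:

text: happy birthday
selected_text: y birthday

Here the word "happy" at the start has been cut down to "y", so it needs to be restored.

There are also cases where two words are not separated by a space, for example:

text: birthday,say say
selected_text: say say

The cleaning code is in process_data.py.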
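
The two cleaning steps could look roughly like the sketch below. It is not the content of process_data.py, which is not shown here. Both helper names are hypothetical, and treating the second case as "insert a space after a comma that is glued to the next word" is an assumption about what that example is meant to show.

```python
import re


def expand_to_word_boundaries(text, selected_text):
    """Grow selected_text so that it starts and ends on whole words of text."""
    start = text.find(selected_text)
    if start == -1:
        return selected_text  # span not found: leave it untouched
    end = start + len(selected_text)
    # Move the left edge back and the right edge forward until
    # whitespace or the end of the string is reached.
    while start > 0 and not text[start - 1].isspace():
        start -= 1
    while end < len(text) and not text[end].isspace():
        end += 1
    return text[start:end]


def split_glued_words(text):
    """Insert a space after a comma that is directly followed by a letter."""
    return re.sub(r",(?=[A-Za-z])", ", ", text)


print(expand_to_word_boundaries("happy birthday", "y birthday"))
print(split_glued_words("birthday,say say"))
```

Running the comma fix on text before expanding the span keeps "say say" from being merged with "birthday," in the second example.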

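2.2 Model structure

The previous post predicted directly whether each word should be extracted, which required building many classifiers at once. A look at the raw dataset shows that the extracted text is a contiguous span, so we can instead mark its start position (start_label) and end position (end_label) and use them as the prediction labels.
This reduces the original N classifiers to 2.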
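
A minimal sketch of how the two labels could be derived from a contiguous span is given below. It assumes whitespace tokens rather than WordPiece tokens, and the helper name span_labels is hypothetical.

```python
def span_labels(text_tokens, selected_tokens):
    """Return (start_label, end_label) of the first match, or None if there is none."""
    n, m = len(text_tokens), len(selected_tokens)
    if m == 0:
        return None
    for start in range(n - m + 1):
        if text_tokens[start:start + m] == selected_tokens:
            return start, start + m - 1  # the end index is inclusive
    return None


text_tokens = "my ridiculous dog is amazing .".split()
labels = span_labels(text_tokens, ["amazing"])
print(labels)
```

The indices here are relative to the text tokens only. With a leading [CLS] token in the model input, each index would be shifted by one.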

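The pre-trained BERT encoder used here has 12 layers. The experiment uses the last layer to predict start_label and the second-to-last layer to predict end_label, which gives two classifiers for the prediction. Note that in the code below the outputs of these two layers are concatenated along the hidden dimension, and a single linear projection produces both the start and the end logits.

Model architecture (figure): a is the text and b is the sentiment.

The implementation is in train.py: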

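import tensorflow as tf    # TensorFlow 1.x API (see section 1.1)
from bert import modeling  # assumed import path, provided by the bert-tensorflow package


def create_model(bert_config, is_training, is_predicting, input_ids, input_mask, segment_ids,
                 target_start_idx, target_end_idx, num_labels, use_one_hot_embeddings):
    """Creates a span-extraction model that predicts start and end positions."""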
    model = modeling.BertModel(
        config=bert_config,
        is_training=is_training,
        input_ids=input_ids,
        input_mask=input_mask,
        token_type_ids=segment_ids,
        use_one_hot_embeddings=use_one_hot_embeddings)

    # "pooled_output" is meant for classification of an entire sentence and
    # "sequence_output" for token-level output. Here we take the output of
    # every encoder layer via "get_all_encoder_layers".
    all_layer = model.get_all_encoder_layers()  # 12 layers, each [N, max_len, 768]

    hidden_size = all_layer[-1].shape[-1].value
    max_len = all_layer[-1].shape[1].value

    # Projection applied to the two concatenated layers. The resulting logits have
    # shape [N, max_len, num_labels]; num_labels must be 2 (start and end).
    with tf.variable_scope("first_softmax_llq", reuse=tf.AUTO_REUSE):
        output_weights = tf.get_variable("output_weights", [num_labels, 2 * hidden_size],
                                         initializer=tf.truncated_normal_initializer(stddev=0.02))

        output_bias = tf.get_variable("output_bias", [num_labels], initializer=tf.zeros_initializer())

    with tf.variable_scope("loss"):
        output_layer = tf.concat([all_layer[-1], all_layer[-2]], axis=-1)

        # Dropout helps prevent overfitting
        output_layer = tf.layers.dropout(output_layer, rate=0.1, training=is_training)

        # Linear projection to per-token logits (the softmax is applied inside the loss)
        logits = tf.einsum("nlh,hm->nlm", output_layer, tf.transpose(output_weights))
        logits = tf.nn.bias_add(logits, output_bias)
        # logits_probs = tf.nn.log_softmax(logits, axis=-1)
        start_logits_probs, end_logits_probs = tf.split(logits, 2, axis=-1)
        start_logits_probs = tf.squeeze(start_logits_probs, axis=-1)
        end_logits_probs = tf.squeeze(end_logits_probs, axis=-1)

        # One-hot versions of the target indices (not used by the loss below)
        one_hot_start_idx = tf.one_hot(target_start_idx, depth=max_len, dtype=tf.float32)
        one_hot_end_idx = tf.one_hot(target_end_idx, depth=max_len, dtype=tf.float32)

        one_hot_start_labels = tf.one_hot(tf.argmax(start_logits_probs, axis=-1), depth=max_len, dtype=tf.int32, axis=-1)
        one_hot_end_labels = tf.one_hot(tf.argmax(end_logits_probs, axis=-1), depth=max_len, dtype=tf.int32, axis=-1)
        predicted_labels = one_hot_start_labels + one_hot_end_labels

        # If we're predicting, we want the predicted labels and the logits.
        if is_predicting:
            return (predicted_labels, logits)

        # For train/eval, sum the cross-entropy losses of the start and end positions,
        # each computed over the max_len sequence positions
        loss = tf.keras.backend.sparse_categorical_crossentropy(target_start_idx, start_logits_probs, from_logits=True)
        loss += tf.keras.backend.sparse_categorical_crossentropy(target_end_idx, end_logits_probs, from_logits=True)
        loss = tf.reduce_mean(loss)
        return (loss, predicted_labels, logits)
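
To make the loss and the prediction step above easier to follow, here is a small pure-Python sketch of the same idea for a single example: a cross-entropy over sequence positions for the start index, another for the end index, and an argmax to decode the span. The function names and the logit values are made up for illustration and do not appear in train.py.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_norm for x in logits]


def span_loss(start_logits, end_logits, start_idx, end_idx):
    """Sum of the start and end cross-entropy losses for one example."""
    return -(log_softmax(start_logits)[start_idx] + log_softmax(end_logits)[end_idx])


def decode_span(start_logits, end_logits):
    """Pick the highest-scoring start and end positions independently."""
    start = max(range(len(start_logits)), key=start_logits.__getitem__)
    end = max(range(len(end_logits)), key=end_logits.__getitem__)
    return start, end


# Made-up logits for a sequence of four positions.
start_logits = [0.1, 0.2, 3.0, 0.3]
end_logits = [0.0, 0.1, 0.5, 2.5]
print(decode_span(start_logits, end_logits))
print(span_loss(start_logits, end_logits, start_idx=2, end_idx=3))
```

Because the two argmax operations are independent, the decoded end can come before the decoded start. How the original code handles that case is not shown in this post.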
