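The BERT Model with the Transformers Library: A Worked Example of Text Sentiment Classification

Introduction

This article walks through an example of applying BERT, using a pretrained BERT model for the demonstration. The BERT implementation comes from Transformers, a library used here with PyTorch that bundles many state-of-the-art NLP models such as BERT, GPT-2, and Transformer-XL. It also lets us choose among pretrained weights and continue training or tuning from them.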

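Part 1: Feature-based modeling with BERT

1. Task and dataset

The task is sentiment classification, a kind of text classification: given a sentence, decide whether the sentiment it expresses is positive (labeled 1) or negative (labeled 0). The dataset is SST (Stanford Sentiment Treebank), a sentiment analysis dataset released by Stanford University and built from movie reviews. SST-2 is its binary-classification version.

Before starting, we need to install transformers, which can be done directly with pip: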

pip install transformers

Then we import the libraries we need:

#part 1 - bert feature base
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import cross_val_score
import torch
import transformers as tfs
import warnings

warnings.filterwarnings('ignore')

2. Loading the dataset

train_df = pd.read_csv('https://github.com/clairett/pytorch-sentiment-classification/raw/master/data/SST2/train.tsv', delimiter='\t', header=None)

train_set = train_df[:3000]   #use the first 3000 rows as our dataset
print("Train set shape:", train_set.shape)
train_set[1].value_counts()   #check the label distribution

This produces the following output:

Train set shape: (3000, 2)
1    1565
0    1435
Name: 1, dtype: int64

The positive and negative labels are split roughly evenly.
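As a point of comparison for the accuracies reported later, the label counts above imply a majority-class baseline: a classifier that always predicts the more frequent label. The short sketch below only redoes that arithmetic from the printed counts; the variable names are illustrative.

```python
# Label counts copied from the value_counts() output above.
label_counts = {1: 1565, 0: 1435}

total = sum(label_counts.values())
majority_label = max(label_counts, key=label_counts.get)

# Accuracy of always predicting the most frequent label.
baseline_accuracy = label_counts[majority_label] / total

print("majority label:", majority_label)
print("baseline accuracy: %.4f" % baseline_accuracy)
```

Any model worth keeping should clearly beat this figure, since it requires no learning at all.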

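3. Extracting features with BERT

Here we use BERT to extract features from the dataset: we pass the input through the BERT model and take its output as the features of that input. These features carry information about the whole sentence and are contextual. This is similar to ELMo-style feature extraction. Note that no fine-tuning of BERT happens here: BERT takes no part in the later training and is used only to extract features.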

model_class, tokenizer_class, pretrained_weights = (tfs.BertModel, tfs.BertTokenizer, 'bert-base-uncased')
tokenizer = tokenizer_class.from_pretrained(pretrained_weights)
model = model_class.from_pretrained(pretrained_weights)

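We use the pretrained "bert-base-uncased" weights, with BertModel as the model and BertTokenizer as the tokenizer. Because the input consists of English sentences, we first tokenize them, then map each token to its index in the vocabulary, and feed those indices to the model. BERT does not actually tokenize into conventional words; it works on WordPiece units, which are finer-grained than words. We run the following code:

#add_special_tokens=True adds [CLS] at the start and [SEP] at the end of each sentence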
train_tokenized = train_set[0].apply((lambda x: tokenizer.encode(x, add_special_tokens=True)))
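To make the WordPiece idea concrete, here is a minimal sketch of greedy longest-match-first splitting, the general strategy WordPiece tokenizers use for a single word. The tiny vocabulary and the function name `wordpiece_tokenize` are made up for illustration; the real BertTokenizer uses the full bert-base-uncased vocabulary and also handles lowercasing and punctuation.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]"):
    """Split one word into word pieces by greedy longest-match-first lookup."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        current = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # marks a piece that continues a word
            if candidate in vocab:
                current = candidate
                break
            end -= 1
        if current is None:
            return [unk_token]  # no piece matched: the whole word is unknown
        pieces.append(current)
        start = end
    return pieces


# Hypothetical toy vocabulary, not the real BERT vocabulary.
toy_vocab = {"un", "##aff", "##able", "movie", "##s"}

print(wordpiece_tokenize("unaffable", toy_vocab))
print(wordpiece_tokenize("movies", toy_vocab))
print(wordpiece_tokenize("xyz", toy_vocab))
```

A word that is not in the vocabulary as a whole can still be represented by smaller pieces, which is why a sentence often yields more token ids than it has words.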

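Next, so that the sentences can be stacked into a single tensor and processed together efficiently, we bring them all to the same length. This is the usual padding step: we append [PAD] tokens to the end of the shorter sentences: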

train_max_len = 0
for i in train_tokenized.values:
    if len(i) > train_max_len:
        train_max_len = len(i)

train_padded = np.array([i + [0] * (train_max_len-len(i)) for i in train_tokenized.values])
print("train set shape:",train_padded.shape)

#output: train set shape: (3000, 66)

Finally, we need to tell the model which positions it should ignore, namely the [PAD] tokens we just added:

train_attention_mask = np.where(train_padded != 0, 1, 0)
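The padding and masking steps can be seen on a toy input. The sketch below uses plain Python lists; 101 and 102 are the [CLS] and [SEP] ids in bert-base-uncased and 0 is [PAD], while the ids in between are arbitrary placeholders and `pad_and_mask` is an illustrative helper, not part of any library.

```python
PAD_ID = 0  # id of [PAD] in the bert-base-uncased vocabulary


def pad_and_mask(token_id_lists, pad_id=PAD_ID):
    """Right-pad every sequence to the longest one and build the attention mask."""
    max_len = max(len(ids) for ids in token_id_lists)
    padded = [ids + [pad_id] * (max_len - len(ids)) for ids in token_id_lists]
    # 1 marks a real token, 0 marks padding that attention should ignore.
    mask = [[1] * len(ids) + [0] * (max_len - len(ids)) for ids in token_id_lists]
    return padded, mask


toy_ids = [
    [101, 2023, 102],              # short sentence
    [101, 2023, 3185, 2003, 102],  # longest sentence sets the target length
]
padded, mask = pad_and_mask(toy_ids)
print(padded)
print(mask)
```

Here the mask is built from the original lengths. Comparing the padded array against 0, as the article does, gives the same result because no real token has id 0.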

After these steps, the input is in a form that the BERT model can accept, so we can compute the features directly:

train_input_ids = torch.tensor(train_padded).long()
train_attention_mask = torch.tensor(train_attention_mask).long()
with torch.no_grad():
    train_last_hidden_states = model(train_input_ids, attention_mask=train_attention_mask)

Let's look at what the BERT model returns:

train_last_hidden_states[0].size()

output: torch.Size([3000, 66, 768])

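The first dimension is the number of samples, the second is the sequence length, and the third is the number of features (the hidden size). In other words, BERT outputs one feature vector for every input position.

4. Splitting the data into training and test sets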

train_features = train_last_hidden_states[0][:,0,:].numpy()
train_labels = train_set[1]

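Note: we use [:,0,:] to take the output vector at the first position of each sequence. That position is [CLS], and its vector is generally treated as a representation of the whole sentence, more so than the vectors at other positions. Next, we use sklearn to split the data into a training set and a test set.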

train_features, test_features, train_labels, test_labels = train_test_split(train_features, train_labels)
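The effect of the [:,0,:] indexing used above is easiest to see on a small array. The sketch below uses a NumPy array with made-up dimensions (4 samples, 6 positions, 8 features) as a stand-in for the real (3000, 66, 768) output; the same indexing works on a PyTorch tensor.

```python
import numpy as np

# Stand-in for BERT's last hidden states: (samples, sequence length, hidden size).
batch, seq_len, hidden = 4, 6, 8
hidden_states = np.arange(batch * seq_len * hidden, dtype=np.float32).reshape(
    batch, seq_len, hidden
)

# Keep every sample, only position 0 (the [CLS] slot), and every feature.
cls_vectors = hidden_states[:, 0, :]

print(hidden_states.shape)
print(cls_vectors.shape)
```

The sequence dimension disappears, leaving one fixed-size vector per sentence, which is the shape a classifier such as logistic regression expects.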

5. Training with logistic regression

In this part, we fit sklearn's logistic regression to the training set and then evaluate it on the test set:

lr_clf = LogisticRegression()
lr_clf.fit(train_features, train_labels)
lr_clf.score(test_features, test_labels)

Output:

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='warn',
          n_jobs=None, penalty='l2', random_state=None, solver='warn',
          tol=0.0001, verbose=0, warm_start=False)

accuracy: 0.8306666666666667

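After fitting, the logistic regression model reaches an accuracy of about 83.07% on the test set (the 0.8307 printed above), which is a decent result. Can we improve on it?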

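Part 2: Fine-tuning-based modeling with BERT

In the previous part we used BERT as a feature extractor: we took BERT's output features and fed them to a linear classifier for prediction, but BERT itself took no part in training. Now we take the other approach, fine-tuning: BERT and the linear layer are trained together, and backpropagation updates the parameters of both, so the BERT model becomes better suited to this classification task. Let's get started.

1. Building the model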

#part 2 - bert fine-tuned
import torch
from torch import nn
from torch import optim
import transformers as tfs
import math

class BertClassificationModel(nn.Module):
    def __init__(self):
        super(BertClassificationModel, self).__init__()
        model_class, tokenizer_class, pretrained_weights = (tfs.BertModel, tfs.BertTokenizer, 'bert-base-uncased')
        self.tokenizer = tokenizer_class.from_pretrained(pretrained_weights)
        self.bert = model_class.from_pretrained(pretrained_weights)
        self.dense = nn.Linear(768, 2)  #BERT-base has a hidden size of 768; 2 output units for binary classification

    def forward(self, batch_sentences):
        #tokenize, add special tokens, then pad or truncate every sentence to 66 tokens
        batch_tokenized = self.tokenizer.batch_encode_plus(list(batch_sentences), add_special_tokens=True,
                                max_length=66, padding='max_length', truncation=True)
        input_ids = torch.tensor(batch_tokenized['input_ids'])
        attention_mask = torch.tensor(batch_tokenized['attention_mask'])
        bert_output = self.bert(input_ids, attention_mask=attention_mask)
        bert_cls_hidden_state = bert_output[0][:,0,:]       #hidden state at the [CLS] position
        linear_output = self.dense(bert_cls_hidden_state)   #unnormalized scores (logits) for the 2 classes
        return linear_output

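The model is simple, and the key lines are commented above. It attaches a linear layer to the [CLS] output position of the BERT model to predict the class of the sentence.

2. Batching the data

Next we reshape the earlier dataset into batches of size 64 so the model can be trained with mini-batch gradient descent.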

sentences = train_set[0].values
targets = train_set[1].values
train_inputs, test_inputs, train_targets, test_targets = train_test_split(sentences, targets)

batch_size = 64
batch_count = int(len(train_inputs) / batch_size)
batch_train_inputs, batch_train_targets = [], []
for i in range(batch_count):
    batch_train_inputs.append(train_inputs[i*batch_size : (i+1)*batch_size])
    batch_train_targets.append(train_targets[i*batch_size : (i+1)*batch_size])
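Note that `int(len(train_inputs) / batch_size)` counts only full batches, so any leftover samples are never used in training. With the default 75/25 split from `train_test_split`, 3000 rows give 2250 training samples, which means 35 full batches and 10 unused samples. A minimal sketch of a batching helper that keeps the final partial batch follows; `make_batches` is an illustrative name, and integers stand in for the sentences.

```python
def make_batches(items, batch_size):
    """Split a sequence into consecutive batches, keeping the last partial one."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


# 2250 stand-in samples: 75% of the 3000 rows used in this article.
toy_inputs = list(range(2250))
batches = make_batches(toy_inputs, 64)

print("number of batches:", len(batches))
print("size of last batch:", len(batches[-1]))
```

Because the helper relies only on slicing, it also works on NumPy arrays such as `train_inputs` and `train_targets`.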

3. Training the model

#train the model
epochs = 3
lr = 0.01
print_every_batch = 5
bert_classifier_model = BertClassificationModel()
optimizer = optim.SGD(bert_classifier_model.parameters(), lr=lr, momentum=0.9)
criterion = nn.CrossEntropyLoss()

for epoch in range(epochs):
    print_avg_loss = 0
    for i in range(batch_count):
        inputs = batch_train_inputs[i]
        labels = torch.tensor(batch_train_targets[i])
        optimizer.zero_grad()
        outputs = bert_classifier_model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        
        print_avg_loss += loss.item()
        if i % print_every_batch == (print_every_batch-1):
            print("Batch: %d, Loss: %.4f" % ((i+1), print_avg_loss/print_every_batch))
            print_avg_loss = 0

This produces the following output:

Batch: 5, Loss: 0.6938
Batch: 10, Loss: 0.6647
Batch: 15, Loss: 0.6175
Batch: 20, Loss: 0.5445
Batch: 25, Loss: 0.7380
Batch: 30, Loss: 0.4852
Batch: 35, Loss: 0.4842
Batch: 5, Loss: 0.4027
Batch: 10, Loss: 0.2978
Batch: 15, Loss: 0.3876
Batch: 20, Loss: 0.5566
Batch: 25, Loss: 0.3102
Batch: 30, Loss: 0.2467
Batch: 35, Loss: 0.2219
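The first loss printed above, 0.6938, is close to ln 2 ≈ 0.6931. That is the cross-entropy of a two-class model whose logits are equal, that is, one that has not yet learned to separate the classes. The sketch below computes cross-entropy for a single sample in plain Python to show this; `cross_entropy` is an illustrative helper, whereas nn.CrossEntropyLoss does the same computation on tensors and averages over the batch by default.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy of one sample: -log(softmax(logits)[target])."""
    max_logit = max(logits)  # subtract the max for numerical stability
    log_sum_exp = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    return log_sum_exp - logits[target]


undecided = cross_entropy([0.0, 0.0], target=1)   # equal logits: no preference
confident = cross_entropy([-2.0, 2.0], target=1)  # correct class strongly favored

print("undecided loss: %.4f" % undecided)
print("confident loss: %.4f" % confident)
```

As training progresses, the logit for the correct class pulls away from the other one and the loss falls below ln 2, which matches the downward trend in the log.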

4. Evaluating the model

# eval the trained model
bert_classifier_model.eval()   #switch off dropout for evaluation
total = len(test_inputs)
hit = 0
with torch.no_grad():
    for i in range(total):
        outputs = bert_classifier_model([test_inputs[i]])
        _, predicted = torch.max(outputs, 1)
        if predicted == test_targets[i]:
            hit += 1

print("Accuracy: %.2f%%" % (hit / total * 100))

Here we evaluate the trained model on the test set and print its accuracy. The output is:

Accuracy: 90.53%
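In the evaluation loop, `torch.max(outputs, 1)` returns the largest logit in each row together with its index, and that index is the predicted class. The same argmax-then-compare logic can be sketched in plain Python; the logits and targets below are made-up values, and `accuracy_from_logits` is an illustrative helper.

```python
def accuracy_from_logits(batch_logits, targets):
    """Predict the class with the largest logit and compare with the targets."""
    hits = 0
    for logits, target in zip(batch_logits, targets):
        predicted = max(range(len(logits)), key=lambda k: logits[k])  # argmax
        hits += int(predicted == target)
    return hits / len(targets)


# Made-up logits for four sentences; columns are (negative, positive).
toy_logits = [[1.2, -0.3], [-0.5, 0.9], [0.1, 0.4], [2.0, -1.0]]
toy_targets = [0, 1, 0, 0]

accuracy = accuracy_from_logits(toy_logits, toy_targets)
print("Accuracy: %.2f%%" % (accuracy * 100))
```

No softmax is needed for prediction, because softmax preserves the ordering of the logits.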

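With fine-tuning, the model reaches 90.53% accuracy after 3 epochs of training, a clear improvement over the feature-based approach. The code for this article is linked below for anyone who needs it. Thanks for reading!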

Project link:
BERT-based text classification example

References
Using BERT for the first time
Transformers official documentation
