BART: A Brief Introduction to the Principles, with Hands-On Code

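Preface

Hugging Face's transformers library recently added the BART model. BART is one of the first Seq2Seq models in the library, and it achieves SOTA results on text generation tasks such as summarization.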
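Three sets of pretrained weights were released:

  • bart-large: the base pretrained model;
  • bart-large-cnn: the base model fine-tuned on the CNN/Daily Mail abstractive summarization task;
  • bart-large-mnli: the base model fine-tuned on the MNLI classification task.

Let's take a look at BART.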

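Background: Seq2Seq Pretraining

In October 2019, teams from Google and Facebook each published a new Transformer-related paper: T5 and BART. Both papers report better downstream performance on generation tasks such as abstractive summarization and dialogue, and both make two main changes:

  • adding a causal decoder to BERT's bidirectional encoder architecture;
  • replacing BERT's fill-in-the-blank cloze task with more complex pretraining tasks.

Now let's dig deeper into the idea of Seq2Seq pretraining!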

BERT vs. GPT2

As the BART authors write in the paper:

(BART) can be seen as generalizing Bert (due to the bidirectional encoder) and GPT2 (with the left to right decoder).

BERT

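BERT's most important pretraining task is predicting masked tokens, using the entire input to gather fuller information for more accurate predictions. This works well for tasks where the prediction at position i is allowed to use information from after position i, but it is of little use for tasks such as text generation, where the prediction for position i can depend only on previously generated words.

In the BERT source code, the information available when predicting position i is controlled by a parameter called attention_mask. A value of 1 in the attention mask means that the model may use information from the column's word when predicting the row's word.

BERT's "fully-visible" attention matrix is shown in the figure below.
[Figure: BERT's fully-visible attention matrix]
A more detailed explanation of BERT can be found in earlier articles.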

GPT

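GPT's pretraining task is autoregressive: it uses the information decoded so far to predict the next position. This mode is more effective for generation tasks, but it does worse on downstream tasks where the whole input can be used to produce the output.

Likewise, here is GPT's attention matrix.
[Figure: GPT's autoregressive attention matrix]
Here, when we predict eating, the only information available is <BOS> I love.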
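As a minimal sketch of the two attention patterns described above, the plain-Python snippet below builds a fully-visible mask and a causal mask. It follows the article's convention that rows are the words being predicted and columns are the input words. The helper names (`fully_visible_mask`, `causal_mask`, `visible_inputs`) are illustrative assumptions, not functions from BERT, GPT, or the transformers library.

```python
def fully_visible_mask(n):
    """BERT-style mask: every row may use every column."""
    return [[1] * n for _ in range(n)]


def causal_mask(n):
    """GPT-style mask: row r may use columns 0..r only."""
    return [[1 if col <= row else 0 for col in range(n)] for row in range(n)]


def visible_inputs(inputs, mask_row):
    """Return the input words whose mask value is 1 in the given row."""
    return [word for word, keep in zip(inputs, mask_row) if keep]


# Columns are the input words, rows are the words to predict.
inputs = ["<BOS>", "I", "love"]
targets = ["I", "love", "eating"]

causal = causal_mask(len(inputs))
full = fully_visible_mask(len(inputs))

# Predicting "eating" under the causal mask sees only the words before it.
print(visible_inputs(inputs, causal[targets.index("eating")]))  # ['<BOS>', 'I', 'love']
# Predicting "I" under the causal mask sees only <BOS>.
print(visible_inputs(inputs, causal[targets.index("I")]))  # ['<BOS>']
# Under the fully-visible mask every row sees the whole input.
print(visible_inputs(inputs, full[targets.index("I")]))  # ['<BOS>', 'I', 'love']
```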

Encoder-Decoder

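Our new friends, such as BART, get the best of both worlds.

The encoder's attention matrix is fully visible,
[Figure: the encoder's fully-visible attention matrix]
while the decoder's attention matrix is autoregressive.
[Figure: the decoder's autoregressive attention matrix]
The encoder and decoder are connected by cross-attention: each decoder layer attends over the final hidden states output by the encoder, which pushes the model to generate output closely tied to the original input.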
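To make the cross-attention step concrete, here is a rough single-head sketch in plain Python. It is a simplification under stated assumptions: there are no learned query/key/value projections, no multiple heads, and the function name `cross_attention` is hypothetical. The point it illustrates is that each decoder position attends over all encoder positions, with no causal restriction on the encoder side.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cross_attention(decoder_states, encoder_states):
    """Each decoder state queries every encoder state (scaled dot-product)."""
    dim = len(encoder_states[0])
    scale = math.sqrt(dim)
    outputs, weights = [], []
    for query in decoder_states:
        # One weight per encoder position; the weights in a row sum to 1.
        row = softmax([dot(query, key) / scale for key in encoder_states])
        weights.append(row)
        # Output is the weighted average of the encoder states.
        outputs.append([
            sum(w * state[d] for w, state in zip(row, encoder_states))
            for d in range(dim)
        ])
    return outputs, weights


# Toy example: 3 encoder positions, 2 decoder positions, hidden size 2.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_states = [[2.0, 0.0], [0.0, 2.0]]
outputs, weights = cross_attention(decoder_states, encoder_states)
```

In a real model the decoder additionally applies its causal self-attention before this step, and the queries, keys, and values pass through learned linear layers.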

Pretraining Scheme

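During pretraining, both BART and T5 replace spans of text with mask tokens and then have the model learn to reconstruct the original document. (Note: this is a simplification. Both papers experiment with many different pretraining tasks and find that this one performs well. T5 uses a "replace corrupted spans" task: instead of one shared mask token, each corrupted span is replaced with its own sentinel token.)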

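A figure from the BART paper illustrates this well.
[Figure: text corruption and reconstruction example from the BART paper]
In this example, the original document is A B C D E. Before encoding, the span [C, D] is masked and an extra mask is inserted before B, so the corrupted document A _ B _ E becomes the encoder input. The decoder must reconstruct the original document using the encoder's output and the previous uncorrupted tokens.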
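The corruption in this example can be sketched in a few lines of plain Python. This is only an illustration of the A B C D E case, not the sampling procedure from the paper: the spans are given by hand rather than drawn at random, and the names `corrupt`, `decoder_steps`, and `MASK` are hypothetical.

```python
MASK = "_"


def corrupt(tokens, spans):
    """Replace each (start, end) span with a single MASK token.

    A zero-length span (start == end) inserts a MASK without removing
    any token. Spans must be sorted and non-overlapping.
    """
    corrupted, cursor = [], 0
    for start, end in spans:
        corrupted.extend(tokens[cursor:start])  # copy untouched tokens
        corrupted.append(MASK)                  # one mask per span
        cursor = end                            # skip the masked tokens
    corrupted.extend(tokens[cursor:])
    return corrupted


def decoder_steps(original):
    """Pair each target token with the uncorrupted prefix the decoder sees."""
    return [(original[:t], original[t]) for t in range(len(original))]


original = ["A", "B", "C", "D", "E"]
# Insert a mask before B (empty span at index 1) and mask the span [C, D].
encoder_input = corrupt(original, [(1, 1), (2, 4)])
print(encoder_input)  # ['A', '_', 'B', '_', 'E']

# When predicting "C", the decoder has already seen the clean tokens A and B.
print(decoder_steps(original)[2])  # (['A', 'B'], 'C')
```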

Summarization

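In summarization, the input sequence is the document we want to summarize and the output sequence is a ground-truth summary. The Seq2Seq architecture can be used for summarization directly, without any new operations, and the pretraining task is also a good match for the downstream task. The numbers in the table below confirm this: on the CNN/Daily Mail abstractive summarization task, all of the new Seq2Seq models do much better than the older, less fancy models, and BART does especially well.
[Table: results on the CNN/Daily Mail abstractive summarization task]

  • BertSumABS comes from the paper "Text Summarization with Pretrained Encoders". It uses a Seq2Seq architecture but does not pretrain the decoder. TransformerAbs, from the same paper, uses a slightly smaller model and no pretraining at all.
  • PT-Gen comes from the paper "Get To The Point: Summarization with Pointer-Generator Networks".
  • UniLM is a "Prefix-LM" with a masking strategy similar to those of BART and T5.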

Demo: BartForConditionalGeneration

This section shows how to carry out a summarization task in just a few lines of code.

Step 1

First, install the latest version of transformers and the other libraries needed.

!pip install transformers --upgrade


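import torch

# Shown if transformers is missing or too old to include BART.
INSTALL_MSG = "BART requires a recent transformers release: pip install transformers --upgrade"

try:
    import transformers
    from transformers import BartTokenizer, BartForConditionalGeneration
except ImportError:
    raise ImportError(INSTALL_MSG)
from IPython.display import display, Markdown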

Step 2

Find a long, long, long text to summarize.

torch_device = 'cuda' if torch.cuda.is_available() else 'cpu'

LONG_BORING_TENNIS_ARTICLE = """
 Andy Murray  came close to giving himself some extra preparation time for his w
edding next week before ensuring that he still has unfinished tennis business to
 attend to. The world No 4 is into the semi-finals of the Miami Open, but not be
fore getting a scare from 21 year-old Austrian Dominic Thiem, who pushed him to 
4-4 in the second set before going down 3-6 6-4, 6-1 in an hour and three quarte
rs. Murray was awaiting the winner from the last eight match between Tomas Berdy
ch and Argentina's Juan Monaco. Prior to this tournament Thiem lost in the secon
d round of a Challenger event to soon-to-be new Brit Aljaz Bedene. Andy Murray p
umps his first after defeating Dominic Thiem to reach the Miami Open semi finals
 . Muray throws his sweatband into the crowd after completing a 3-6, 6-4, 6-1 vi
ctory in Florida . Murray shakes hands with Thiem who he described as a 'strong 
guy' after the game . And Murray has a fairly simple message for any of his fell
ow British tennis players who might be agitated about his imminent arrival into 
the home ranks: don't complain. Instead the British No 1 believes his colleagues
 should use the assimilation of the world number 83, originally from Slovenia, a
s motivation to better themselves. At present any grumbles are happening in priv
ate, and Bedene's present ineligibility for the Davis Cup team has made it less 
of an issue, although that could change if his appeal to play is allowed by the 
International Tennis Federation. Murray thinks anyone questioning the move, now 
it has become official, would be better working on getting their ranking closer 
to his. 'If he was 500 in the world they wouldn't be that fussed about it but ob
viously he threatens their position a bit,' said the 27 year-old Scot. ' and he'
s obviously the British number two, comfortably. 'So they can complain but the b
est thing to do is use it in the right way and accept it for what it is, and try
 to use it as motivation whether they agree with it or not. He's British now so 
they've just got to deal with it. Murray stretches for a return after starting h
is quarter final match slowly on the show court . Thiem held nothing back as he 
raced through the opening set, winning it 6-3 with a single break . The young Au
strian is considered to be one of the hottest prospects on the ATP Tour . 'I wou
ld hope that all the guys who are below him now like James (Ward) , Kyle (Edmund
) , Liam (Broady) they will use it as motivation. If he becomes eligible for Dav
is Cup then those guys are going to have to prove themselves. 'It can only be se
en as a positive for those guys using it to try to get better. He's a good playe
r but so are James and Kyle and Liam has improved. Aljaz is there, he's on the t
our every week, the other guys aren't quite there yet.' For the first time Murra
y, who has an encyclopaedic knowledge of the top 100, gave his opinion of Bedene
: 'He's a good player with a very good serve. He's a legitimate top 100 player, 
when he plays Challengers he's there or thereabouts, when he plays on the main t
our he wins matches, it's not like he turns up and always loses in the first rou
nd. Murray's fiancee was once again watching from the stands shaded by a huge br
immed hat . Kim Sears flashes her enormous diamond engagement ring while watchin
g her beau on court . 'He had a bad injury last year (wrist) but has recovered w
ell. I would imagine he would keep moving up the rankings although I don't know 
exactly how high he can go. I've practised with him a couple of times, I haven't
 seen him play loads, but when you serve as well as he does it helps. I would im
agine he' s going to be comfortably in the top 70 or 80 in the world for a while
.' It is understood the Lawn Tennis Association will give background support to 
his case regarding the Davis Cup but have made it clear that the onus is on him 
to lead the way. An official statement said: 'To have another player in the men'
s top 100 is clearly a positive thing for British tennis and so we very much wel
come Aljaz's change in citizenship.' The last comparable switch came twenty year
s ago when Greg Rusedski arrived from Canada. It was by no means universally pop
ular but, like Bedene, he pledged that he was in for the long haul and, in fairn
ess to him, he proved true to his word. Loising the first set shocked Murray int
o life as he raced to a commanding lead in the second . The No 3 seed sent over 
a few glaring looks towards his team before winning the second set . Murray had 
to put such matters aside as he tackled the unusually talented Thiem, a delight 
to watch. Coached by Boris Becker's veteran mentor Gunter Bresnik, he slightly r
esembles Andy Roddick and hits with similar power but more elegance. His single 
handed backhand is a thing of rare beauty. However, he has had a mediocre season
 coming into this event and there was little to forewarn of his glorious shotmak
ing that seemed to catch Murray unawares early on. The world No 4 looked to have
 worked him out in the second, but then suffered one of his periopdic mental lap
ses and let him back in from 4-1 before closing it out with a break. After break
ing him for 3-1 in the decider the Austrian whirlwind burnt itself out. 'He's a 
strong guy who hits the ball hard and it became a very physical match,' said Mur
ray. Murray was presented with a celebratory cake after winning his 500th match 
in the previous round .
""".replace('\n','')

Step 3

It takes only a few lines of code to call BART from the transformers library and generate a summary.
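The code for this step is not preserved in the text, so the following is a minimal sketch rather than the author's exact code. It assumes a recent transformers release, where the checkpoint the article calls bart-large-cnn is published as `facebook/bart-large-cnn`. The function names `load_bart_cnn` and `summarize` and the beam-search settings are illustrative assumptions.

```python
# Beam-search settings are illustrative assumptions, not values from the article.
DEFAULT_GENERATE_KWARGS = {
    "num_beams": 4,
    "max_length": 142,
    "min_length": 56,
    "no_repeat_ngram_size": 3,
}


def load_bart_cnn(device="cpu", checkpoint="facebook/bart-large-cnn"):
    """Load the tokenizer and summarization model (downloads weights on first use)."""
    # Imported lazily so the helpers below can be defined without the library.
    from transformers import BartForConditionalGeneration, BartTokenizer

    tokenizer = BartTokenizer.from_pretrained(checkpoint)
    model = BartForConditionalGeneration.from_pretrained(checkpoint).to(device)
    return tokenizer, model


def summarize(article, tokenizer, model, device="cpu", **generate_kwargs):
    """Encode the article, run beam search, and decode the summary text."""
    # Truncate to 1024 tokens, the longest input bart-large accepts.
    inputs = tokenizer([article], max_length=1024, truncation=True, return_tensors="pt")
    input_ids = inputs["input_ids"].to(device)
    # Caller-supplied settings override the defaults above.
    options = {**DEFAULT_GENERATE_KWARGS, **generate_kwargs}
    summary_ids = model.generate(input_ids, **options)
    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)
```

With transformers and torch installed, calling `tokenizer, model = load_bart_cnn(torch_device)` and then `summarize(LONG_BORING_TENNIS_ARTICLE, tokenizer, model, torch_device)` would run this on the article from Step 2. The exact summary text may differ from the one quoted in this article, depending on the library version and generation settings.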

The summary generated by BART:
Andy Murray beat Dominic Thiem 3-6, 6-4, 6-1 in the Miami Open. The world No 4 is into the semi-finals of the tournament in Florida. Murray was awaiting the winner from the last eight match between Tomas Berdych and Argentina’s Juan Monaco. Thiem lost in the second round of a Challenger event to Aljaz Bedene.

Now let's look at what GPT produces. To be fair, it is not well suited to summarization.

The output generated by GPT:
'To have a player like James Ward, Kyle Edmund, Liam Broady and Aljaz Bedene in the top 100 is a huge achievement for the Lawn Tennis Association. The Lawn Tennis Association is committed to the development of the sport and the development of the sport’s players. The Lawn Tennis Association is committed to the development of the sport and the development of the sport’s players. The Lawn Tennis Association is committed to the development of the sport and the development of the sport’s players. The Lawn Tennis Association is committed to the development of the sport and the development of the sport’s players. The Lawn Tennis Association is committed to the development of the sport and the development of the sport’s players. The Lawn Tennis Association is committed to the development of the sport and the development of the

Finally, the demo above is available in the shared Colab notebook: https://colab.research.google.com/drive/11hKBPfsfBXPKo-dK_gHsPklF4PcNflQZ
Alternatively, see the official transformers repository on GitHub.
