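ERNIE is said to outperform BERT on Chinese NLP tasks. From reading the paper, it looks like BERT with some additional training techniques on top.
https://github.com/nghuyong/ERNIE-Pytorch is the project with the model conversion code.
Test code that can be run directly is attached below: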
#!/usr/bin/env python
# encoding: utf-8
import torch
# pytorch_transformers is the older name of the package now published as transformers
from pytorch_transformers import BertTokenizer, BertForMaskedLM
tokenizer = BertTokenizer.from_pretrained('./ERNIE-converted')
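# Input sentence (kept in Chinese for the model): "[MASK] [MASK] [MASK] is a classic of Chinese
# gods-and-demons fiction and, together with Romance of the Three Kingdoms, Water Margin and
# Dream of the Red Chamber, is known as one of the Four Great Classical Novels of China."
input_tx = "[CLS] [MASK] [MASK] [MASK] 是中國神魔小說的經典之作,與《三國演義》《水滸傳》《紅樓夢》並稱爲中國古典四大名著。[SEP]"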
tokenized_text = tokenizer.tokenize(input_tx)
indexed_tokens = tokenizer.convert_tokens_to_ids(tokenized_text)
tokens_tensor = torch.tensor([indexed_tokens])
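# Single sentence, so every token gets segment id 0; derive the length instead of hard-coding 47
segments_tensors = torch.tensor([[0] * len(indexed_tokens)])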
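model = BertForMaskedLM.from_pretrained('./ERNIE-converted')
model.eval()  # make sure dropout is disabled for inference
with torch.no_grad():  # inference only, no gradients needed
    outputs = model(tokens_tensor, token_type_ids=segments_tensors)
predictions = outputs[0]  # prediction scores, shape (batch, sequence length, vocab size)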
# Most likely token id at every position
predicted_index = torch.argmax(predictions[0], dim=-1).tolist()
# Map ids back to tokens, dropping the leading [CLS] and trailing [SEP] positions
predicted_token = tokenizer.convert_ids_to_tokens(predicted_index[1:-1])
print('Predicted token is:', predicted_token)
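
The script above prints the best guess for every position, but only the three [MASK] slots are really of interest. Below is a minimal sketch of how the top few candidates at just the masked positions could be pulled out. The helper name top_k_at_masks and the toy vocabulary are made up for illustration and are not part of ERNIE-Pytorch or pytorch_transformers; the helper works on plain Python lists so it can be tried without loading the model.

```python
import heapq

MASK_TOKEN = "[MASK]"  # BERT-style mask token, as used in the input string above


def top_k_at_masks(tokens, scores, id_to_token, k=3):
    """Return {position: [top-k candidate tokens]} for each masked position.

    tokens      -- list of token strings (e.g. tokenized_text)
    scores      -- per-position vocabulary scores as nested lists
    id_to_token -- callable mapping a vocabulary id to its token string
    (hypothetical helper, for illustration only)
    """
    result = {}
    for pos, tok in enumerate(tokens):
        if tok != MASK_TOKEN:
            continue
        row = scores[pos]
        # Ids of the k highest scores at this position, best first
        best_ids = heapq.nlargest(k, range(len(row)), key=row.__getitem__)
        result[pos] = [id_to_token(i) for i in best_ids]
    return result


# Toy example with a 4-entry vocabulary standing in for the real model output
toy_vocab = ["西", "遊", "記", "是"]
toy_tokens = ["[CLS]", "[MASK]", "是", "[SEP]"]
toy_scores = [
    [0.0, 0.0, 0.0, 0.0],
    [2.5, 0.1, 1.7, -1.0],
    [0.0, 0.0, 0.0, 3.0],
    [0.0, 0.0, 0.0, 0.0],
]
print(top_k_at_masks(toy_tokens, toy_scores, toy_vocab.__getitem__, k=2))
```

With the real model, something like top_k_at_masks(tokenized_text, predictions[0].tolist(), lambda i: tokenizer.convert_ids_to_tokens([i])[0]) should work, assuming the variables from the script above are in scope.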