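Every so often the NLP world gets a headline saying that some new model beats BERT across the board, and today is no exception: the model flooding everyone's feed this time is XLNet. It makes modifications on top of BERT; the released network has 24 layers and is roughly four times the size of the Chinese BERT model. Let's see how to use it. For English tokenization the original code uses sentencepiece, so that package must be installed before use. First, download the model: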
curl -O "https://storage.googleapis.com/xlnet/released_models/cased_L-24_H-1024_A-16.zip"
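The download is about 1.2 GB. It contains the model configuration xlnet_config.json, the English tokenization model spiece.model, and the XLNet checkpoint files; these three parts mirror what ships with the Chinese BERT model. Next, download an English corpus for text classification: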
curl -O "http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz"
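Before moving on, it can help to confirm that the unpacked model directory really holds the three pieces listed above. The following is a minimal sketch, not part of the XLNet code: the helper name missing_model_files is made up for illustration, and the checkpoint is matched only by the xlnet_model.ckpt prefix (the prefix used in the paths later in this post), on the assumption that the checkpoint is stored as several files sharing that prefix.

```python
import os

def missing_model_files(model_dir):
    """Return the expected pieces that are absent from model_dir."""
    missing = []
    # Exact file names mentioned in the post.
    for name in ("xlnet_config.json", "spiece.model"):
        if not os.path.isfile(os.path.join(model_dir, name)):
            missing.append(name)
    # Assumption: the checkpoint is several files sharing one prefix.
    entries = os.listdir(model_dir) if os.path.isdir(model_dir) else []
    if not any(entry.startswith("xlnet_model.ckpt") for entry in entries):
        missing.append("xlnet_model.ckpt*")
    return missing
```

Calling missing_model_files on the unpacked directory should return an empty list when everything is in place; any names it returns are the pieces to look for.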
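Extracting the corpus archive gives a train folder containing neg and pos subfolders, which are used for binary classification of English text. Then download the XLNet source code: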
git clone https://github.com/zihangdai/xlnet.git
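To get a feel for the corpus layout described above (a train folder with neg and pos subfolders), here is a small sketch that collects (text, label) pairs from it. It is only an illustration and not the data processor used by run_classifier.py; the function name read_imdb_split is hypothetical, and it assumes each review is stored as its own .txt file.

```python
import os

def read_imdb_split(data_dir, split="train"):
    """Collect (text, label) pairs from <data_dir>/<split>/{neg,pos}."""
    examples = []
    for label in ("neg", "pos"):
        folder = os.path.join(data_dir, split, label)
        for name in sorted(os.listdir(folder)):
            # Assumption: one review per .txt file; skip anything else.
            if not name.endswith(".txt"):
                continue
            with open(os.path.join(folder, name), encoding="utf-8") as f:
                examples.append((f.read().strip(), label))
    return examples
```

For example, read_imdb_split("aclImdb") would return the training reviews with the folder name as the label, which is a quick way to inspect a few samples before fine-tuning.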
With the source downloaded, run_classifier.py needs some configuration. From the command line:
python run_classifier.py \
--use_tpu=False \
--tpu="" \
--do_train=True \
--do_eval=False \
--eval_all_ckpt=False \
--task_name=imdb \
--data_dir=/Users/shuubiasahi/Downloads/aclImdb \
--output_dir=/Users/shuubiasahi/Documents/python/xlnettextclass/proc_data/imdb \
--model_dir=/Users/shuubiasahi/Documents/python/xlnettextclass/exp/imdb \
--uncased=False \
--spiece_model_file=/Users/shuubiasahi/Downloads/xlnet_cased_L-24_H-1024_A-16/spiece.model \
--model_config_path=/Users/shuubiasahi/Downloads/xlnet_cased_L-24_H-1024_A-16/xlnet_config.json \
--init_checkpoint=/Users/shuubiasahi/Downloads/xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt \
--max_seq_length=512 \
--train_batch_size=32 \
--eval_batch_size=8 \
--num_hosts=1 \
--num_core_per_host=8 \
--learning_rate=2e-5 \
--train_steps=4000 \
--warmup_steps=500 \
--save_steps=500 \
--iterations=500
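As a rough sanity check on these values, the number of passes over the training data is train_steps × train_batch_size ÷ number of training examples. The sketch below rests on two assumptions that do not come from this post: that the aclImdb train split holds 25,000 reviews, and that train_batch_size is the global batch size. The helper names are made up for illustration.

```python
def approx_epochs(train_steps, train_batch_size, num_examples):
    """Approximate number of passes over the training set."""
    return train_steps * train_batch_size / num_examples

def warmup_fraction(warmup_steps, train_steps):
    """Share of training spent in learning-rate warmup."""
    return warmup_steps / train_steps

# Values from the command above; 25000 is an assumed dataset size.
print(approx_epochs(4000, 32, 25000))  # 5.12
print(warmup_fraction(500, 4000))      # 0.125
```

Under those assumptions the command trains for about five epochs, with the first eighth of the steps used for warmup.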
To run it directly from an IDE, set the flag defaults in run_classifier.py instead, as follows:
# Model
flags.DEFINE_string("model_config_path", default="/Users/shuubiasahi/Downloads/xlnet_cased_L-24_H-1024_A-16/xlnet_config.json",
help="Model config path.")
flags.DEFINE_float("dropout", default=0.1,
help="Dropout rate.")
flags.DEFINE_float("dropatt", default=0.1,
help="Attention dropout rate.")
flags.DEFINE_integer("clamp_len", default=-1,
help="Clamp length")
flags.DEFINE_string("summary_type", default="last",
help="Method used to summarize a sequence into a compact vector.")
flags.DEFINE_bool("use_summ_proj", default=True,
help="Whether to use projection for summarizing sequences.")
flags.DEFINE_bool("use_bfloat16", False,
help="Whether to use bfloat16.")
# Parameter initialization
flags.DEFINE_enum("init", default="normal",
enum_values=["normal", "uniform"],
help="Initialization method.")
flags.DEFINE_float("init_std", default=0.02,
help="Initialization std when init is normal.")
flags.DEFINE_float("init_range", default=0.1,
help="Initialization std when init is uniform.")
# I/O paths
flags.DEFINE_bool("overwrite_data", default=False,
help="If False, will use cached data if available.")
flags.DEFINE_string("init_checkpoint", default="/Users/shuubiasahi/Downloads/xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt",
help="checkpoint path for initializing the model. "
"Could be a pretrained model or a finetuned model.")
flags.DEFINE_string("output_dir", default="/Users/shuubiasahi/Documents/python/xlnettextclass/proc_data/imdb",
help="Output dir for TF records.")
flags.DEFINE_string("spiece_model_file", default="/Users/shuubiasahi/Downloads/xlnet_cased_L-24_H-1024_A-16/spiece.model",
help="Sentence Piece model path.")
flags.DEFINE_string("model_dir", default="/Users/shuubiasahi/Documents/python/xlnettextclass/exp/imdb",
help="Directory for saving the finetuned model.")
flags.DEFINE_string("data_dir", default="/Users/shuubiasahi/Downloads/aclImdb",
help="Directory for input data.")
# TPUs and machines
flags.DEFINE_bool("use_tpu", default=False, help="whether to use TPU.")
flags.DEFINE_integer("num_hosts", default=1, help="How many TPU hosts.")
flags.DEFINE_integer("num_core_per_host", default=8,
help="8 for TPU v2 and v3-8, 16 for larger TPU v3 pod. In the context "
"of GPU training, it refers to the number of GPUs used.")
flags.DEFINE_string("tpu_job_name", default=None, help="TPU worker job name.")
flags.DEFINE_string("tpu", default=None, help="TPU name.")
flags.DEFINE_string("tpu_zone", default=None, help="TPU zone.")
flags.DEFINE_string("gcp_project", default=None, help="gcp project.")
flags.DEFINE_string("master", default=None, help="master")
flags.DEFINE_integer("iterations", default=1000,
help="number of iterations per TPU training loop.")
# training
flags.DEFINE_bool("do_train", default=True, help="whether to do training")
flags.DEFINE_integer("train_steps", default=10000,
help="Number of training steps")
flags.DEFINE_integer("warmup_steps", default=0, help="number of warmup steps")
flags.DEFINE_float("learning_rate", default=1e-5, help="initial learning rate")
flags.DEFINE_float("lr_layer_decay_rate", 1.0,
"Top layer: lr[L] = FLAGS.learning_rate."
"Low layer: lr[l-1] = lr[l] * lr_layer_decay_rate.")
flags.DEFINE_float("min_lr_ratio", default=0.0,
help="min lr ratio for cos decay.")
flags.DEFINE_float("clip", default=1.0, help="Gradient clipping")
flags.DEFINE_integer("max_save", default=0,
help="Max number of checkpoints to save. Use 0 to save all.")
flags.DEFINE_integer("save_steps", default=100,
help="Save the model for every save_steps. "
"If None, not to save any model.")
flags.DEFINE_integer("train_batch_size", default=8,
help="Batch size for training")
flags.DEFINE_float("weight_decay", default=0.00, help="Weight decay rate")
flags.DEFINE_float("adam_epsilon", default=1e-8, help="Adam epsilon")
flags.DEFINE_string("decay_method", default="poly", help="poly or cos")
# evaluation
flags.DEFINE_bool("do_eval", default=False, help="whether to do eval")
flags.DEFINE_bool("do_predict", default=False, help="whether to do prediction")
flags.DEFINE_float("predict_threshold", default=0,
help="Threshold for binary prediction.")
flags.DEFINE_string("eval_split", default="dev", help="could be dev or test")
flags.DEFINE_integer("eval_batch_size", default=128,
help="batch size for evaluation")
flags.DEFINE_integer("predict_batch_size", default=128,
help="batch size for prediction.")
flags.DEFINE_string("predict_dir", default=None,
help="Dir for saving prediction files.")
flags.DEFINE_bool("eval_all_ckpt", default=False,
help="Eval all ckpts. If False, only evaluate the last one.")
flags.DEFINE_string("predict_ckpt", default=None,
help="Ckpt path for do_predict. If None, use the last one.")
# task specific
flags.DEFINE_string("task_name", default="imdb", help="Task name")
flags.DEFINE_integer("max_seq_length", default=128, help="Max sequence length")
flags.DEFINE_integer("shuffle_buffer", default=2048,
help="Buffer size used for shuffle.")
flags.DEFINE_integer("num_passes", default=1,
help="Num passes for processing training data. "
"This is use to batch data without loss for TPUs.")
flags.DEFINE_bool("uncased", default=False,
help="Use uncased.")
flags.DEFINE_string("cls_scope", default=None,
help="Classifier layer scope.")
flags.DEFINE_bool("is_regression", default=False,
help="Whether it's a regression task.")
FLAGS = flags.FLAGS
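The main things to configure are the model location and the config file location. Rewriting this part yourself is straightforward, and it is much the same as with BERT. If the run fails with errors, setting run_config=None in the code is enough, since there is no TPU here anyway.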
The run looks like this: