Fine-tuning XLNet: English Text Classification

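Every so often there is news in NLP that some new model beats BERT across the board, and today is no exception: today's headline is XLNet. It builds on BERT with a modified approach; the large model has 24 layers and is roughly four times the size of the Chinese BERT model. Let's see how to use it. For English tokenization the original code uses sentencepiece, so install that package first (pip install sentencepiece). First, download the pretrained model: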

curl -O "https://storage.googleapis.com/xlnet/released_models/cased_L-24_H-1024_A-16.zip"

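The downloaded model is about 1.2 GB. It contains the model configuration xlnet_config.json, the English SentencePiece model spiece.model, and the XLNet checkpoint files; these three play the same roles as the corresponding files in the Chinese BERT release. Next, download the English text classification corpus (the IMDB movie review dataset):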

curl -O "http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz"
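Both downloads (the model zip and the IMDB tarball), plus unpacking, can also be scripted from Python. This is a minimal sketch using only the standard library; the helper name fetch_and_extract and the destination folder are illustrative assumptions, not part of the XLNet repository.

```python
import shutil
import urllib.request
from pathlib import Path


def fetch_and_extract(url: str, dest_dir: str) -> Path:
    """Download an archive (.zip or .tar.gz) into dest_dir and unpack it there.

    Hypothetical helper; returns the path of the downloaded archive.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # Keep the original file name so shutil.unpack_archive can infer the
    # format from the extension (.zip -> zip, .tar.gz -> gztar).
    archive = dest / url.rsplit("/", 1)[-1]
    if not archive.exists():  # skip the (large) download if already present
        urllib.request.urlretrieve(url, str(archive))
    shutil.unpack_archive(str(archive), str(dest))
    return archive


# Example (the model archive alone is about 1.2 GB):
# downloads = str(Path.home() / "Downloads")
# fetch_and_extract("https://storage.googleapis.com/xlnet/released_models/cased_L-24_H-1024_A-16.zip", downloads)
# fetch_and_extract("http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz", downloads)
```

Run against a Downloads folder, this should leave an aclImdb folder and, presumably, the xlnet_cased_L-24_H-1024_A-16 folder that the paths in the command below point to.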

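After extracting, the train folder contains two subfolders, neg and pos, which serve as the two classes for English binary text classification. Next, clone the XLNet source code: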
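To make that layout concrete, here is a minimal stand-alone reader for it. It is a sketch, not the repository's own data processor: the label order (neg as 0, pos as 1) and replacing the HTML line breaks that appear in the raw reviews are assumptions made here.

```python
import os
from typing import List, Tuple

LABELS = ["neg", "pos"]  # assumed label order: neg -> 0, pos -> 1


def load_imdb_split(data_dir: str, split: str = "train") -> List[Tuple[str, int]]:
    """Read <data_dir>/<split>/{neg,pos}/*.txt into (text, label_id) pairs."""
    examples = []
    for label_id, label in enumerate(LABELS):
        cur_dir = os.path.join(data_dir, split, label)
        for name in sorted(os.listdir(cur_dir)):
            if not name.endswith(".txt"):
                continue  # ignore anything that is not a review file
            with open(os.path.join(cur_dir, name), encoding="utf-8") as f:
                # Raw IMDB reviews contain "<br />" tags; turn them into spaces.
                text = f.read().strip().replace("<br />", " ")
            examples.append((text, label_id))
    return examples
```

For example, load_imdb_split("/Users/shuubiasahi/Downloads/aclImdb") should yield the 25,000 labeled training reviews, 12,500 per class; the unlabeled reviews in train/unsup are not read because only neg and pos are listed.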

git clone https://github.com/zihangdai/xlnet.git
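Before configuring the script, it can help to confirm that the unzipped model folder holds the three pieces described above, since run_classifier.py takes each one as a separate flag. A small check might look like this (missing_model_files is a hypothetical name); it treats xlnet_model.ckpt as a prefix because a TensorFlow checkpoint is stored as several files sharing that name.

```python
from pathlib import Path
from typing import List


def missing_model_files(model_dir: str) -> List[str]:
    """Return the model pieces that are absent from model_dir (empty list if all present)."""
    d = Path(model_dir)
    # Files passed via --model_config_path and --spiece_model_file.
    missing = [name for name in ("xlnet_config.json", "spiece.model")
               if not (d / name).is_file()]
    # --init_checkpoint is a checkpoint prefix; TensorFlow writes files such as
    # xlnet_model.ckpt.index next to the data shards, so match on the prefix.
    if not any(d.glob("xlnet_model.ckpt*")):
        missing.append("xlnet_model.ckpt*")
    return missing
```

Calling it on /Users/shuubiasahi/Downloads/xlnet_cased_L-24_H-1024_A-16 should return an empty list if the archive unpacked correctly.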

With the source downloaded, run_classifier.py needs some configuration. The command line is:

python run_classifier.py \
  --use_tpu=False \
  --tpu="" \
  --do_train=True \
  --do_eval=False \
  --eval_all_ckpt=False \
  --task_name=imdb \
  --data_dir=/Users/shuubiasahi/Downloads/aclImdb \
  --output_dir=/Users/shuubiasahi/Documents/python/xlnettextclass/proc_data/imdb \
  --model_dir=/Users/shuubiasahi/Documents/python/xlnettextclass/exp/imdb \
  --uncased=False \
  --spiece_model_file=/Users/shuubiasahi/Downloads/xlnet_cased_L-24_H-1024_A-16/spiece.model \
  --model_config_path=/Users/shuubiasahi/Downloads/xlnet_cased_L-24_H-1024_A-16/xlnet_config.json \
  --init_checkpoint=/Users/shuubiasahi/Downloads/xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt \
  --max_seq_length=512 \
  --train_batch_size=32 \
  --eval_batch_size=8 \
  --num_hosts=1 \
  --num_core_per_host=8 \
  --learning_rate=2e-5 \
  --train_steps=4000 \
  --warmup_steps=500 \
  --save_steps=500 \
  --iterations=500
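It helps to translate these flags into epochs. Assuming each step consumes train_batch_size examples in total and the IMDB training split holds 25,000 reviews, 4,000 steps at batch size 32 is 128,000 examples, a little over five passes over the data, with the first 500 steps (12.5%) spent on warmup and a checkpoint every 500 steps. The helper below (a hypothetical name) only does that arithmetic:

```python
def schedule_summary(train_steps: int, batch_size: int, num_examples: int,
                     warmup_steps: int, save_steps: int) -> dict:
    """Back-of-the-envelope numbers for a fine-tuning run."""
    return {
        "examples_seen": train_steps * batch_size,           # total examples consumed
        "epochs": train_steps * batch_size / num_examples,   # passes over the data
        "warmup_fraction": warmup_steps / train_steps,       # share of steps in warmup
        "checkpoints": train_steps // save_steps,            # periodic saves
    }


summary = schedule_summary(train_steps=4000, batch_size=32, num_examples=25000,
                           warmup_steps=500, save_steps=500)
```

Because max_save defaults to 0 (keep all checkpoints), all eight periodic checkpoints are kept. With 512-token sequences, batch size 32 and a 24-layer model, a machine without a TPU will likely need a smaller max_seq_length or train_batch_size.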

To run it directly in an IDE instead, change the flag defaults in run_classifier.py as follows:


from absl import flags  # already imported at the top of run_classifier.py

# Model
flags.DEFINE_string("model_config_path", default="/Users/shuubiasahi/Downloads/xlnet_cased_L-24_H-1024_A-16/xlnet_config.json",
      help="Model config path.")
flags.DEFINE_float("dropout", default=0.1,
      help="Dropout rate.")
flags.DEFINE_float("dropatt", default=0.1,
      help="Attention dropout rate.")
flags.DEFINE_integer("clamp_len", default=-1,
      help="Clamp length")
flags.DEFINE_string("summary_type", default="last",
      help="Method used to summarize a sequence into a compact vector.")
flags.DEFINE_bool("use_summ_proj", default=True,
      help="Whether to use projection for summarizing sequences.")
flags.DEFINE_bool("use_bfloat16", False,
      help="Whether to use bfloat16.")

# Parameter initialization
flags.DEFINE_enum("init", default="normal",
      enum_values=["normal", "uniform"],
      help="Initialization method.")
flags.DEFINE_float("init_std", default=0.02,
      help="Initialization std when init is normal.")
flags.DEFINE_float("init_range", default=0.1,
      help="Initialization range when init is uniform.")

# I/O paths
flags.DEFINE_bool("overwrite_data", default=False,
      help="If False, will use cached data if available.")
flags.DEFINE_string("init_checkpoint", default="/Users/shuubiasahi/Downloads/xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt",
      help="checkpoint path for initializing the model. "
      "Could be a pretrained model or a finetuned model.")
flags.DEFINE_string("output_dir", default="/Users/shuubiasahi/Documents/python/xlnettextclass/proc_data/imdb",
      help="Output dir for TF records.")
flags.DEFINE_string("spiece_model_file", default="/Users/shuubiasahi/Downloads/xlnet_cased_L-24_H-1024_A-16/spiece.model",
      help="Sentence Piece model path.")
flags.DEFINE_string("model_dir", default="/Users/shuubiasahi/Documents/python/xlnettextclass/exp/imdb",
      help="Directory for saving the finetuned model.")
flags.DEFINE_string("data_dir", default="/Users/shuubiasahi/Downloads/aclImdb",
      help="Directory for input data.")

# TPUs and machines
flags.DEFINE_bool("use_tpu", default=False, help="whether to use TPU.")
flags.DEFINE_integer("num_hosts", default=1, help="How many TPU hosts.")
flags.DEFINE_integer("num_core_per_host", default=8,
      help="8 for TPU v2 and v3-8, 16 for larger TPU v3 pod. In the context "
      "of GPU training, it refers to the number of GPUs used.")
flags.DEFINE_string("tpu_job_name", default=None, help="TPU worker job name.")
flags.DEFINE_string("tpu", default=None, help="TPU name.")
flags.DEFINE_string("tpu_zone", default=None, help="TPU zone.")
flags.DEFINE_string("gcp_project", default=None, help="gcp project.")
flags.DEFINE_string("master", default=None, help="master")
flags.DEFINE_integer("iterations", default=1000,
      help="number of iterations per TPU training loop.")

# training
flags.DEFINE_bool("do_train", default=True, help="whether to do training")
flags.DEFINE_integer("train_steps", default=10000,
      help="Number of training steps")
flags.DEFINE_integer("warmup_steps", default=0, help="number of warmup steps")
flags.DEFINE_float("learning_rate", default=1e-5, help="initial learning rate")
flags.DEFINE_float("lr_layer_decay_rate", 1.0,
                   "Top layer: lr[L] = FLAGS.learning_rate. "
                   "Low layer: lr[l-1] = lr[l] * lr_layer_decay_rate.")
flags.DEFINE_float("min_lr_ratio", default=0.0,
      help="min lr ratio for cos decay.")
flags.DEFINE_float("clip", default=1.0, help="Gradient clipping")
flags.DEFINE_integer("max_save", default=0,
      help="Max number of checkpoints to save. Use 0 to save all.")
flags.DEFINE_integer("save_steps", default=100,
      help="Save the model for every save_steps. "
      "If None, not to save any model.")
flags.DEFINE_integer("train_batch_size", default=8,
      help="Batch size for training")
flags.DEFINE_float("weight_decay", default=0.00, help="Weight decay rate")
flags.DEFINE_float("adam_epsilon", default=1e-8, help="Adam epsilon")
flags.DEFINE_string("decay_method", default="poly", help="poly or cos")

# evaluation
flags.DEFINE_bool("do_eval", default=False, help="whether to do eval")
flags.DEFINE_bool("do_predict", default=False, help="whether to do prediction")
flags.DEFINE_float("predict_threshold", default=0,
      help="Threshold for binary prediction.")
flags.DEFINE_string("eval_split", default="dev", help="could be dev or test")
flags.DEFINE_integer("eval_batch_size", default=128,
      help="batch size for evaluation")
flags.DEFINE_integer("predict_batch_size", default=128,
      help="batch size for prediction.")
flags.DEFINE_string("predict_dir", default=None,
      help="Dir for saving prediction files.")
flags.DEFINE_bool("eval_all_ckpt", default=False,
      help="Eval all ckpts. If False, only evaluate the last one.")
flags.DEFINE_string("predict_ckpt", default=None,
      help="Ckpt path for do_predict. If None, use the last one.")

# task specific
flags.DEFINE_string("task_name", default="imdb", help="Task name")
flags.DEFINE_integer("max_seq_length", default=128, help="Max sequence length")
flags.DEFINE_integer("shuffle_buffer", default=2048,
      help="Buffer size used for shuffle.")
flags.DEFINE_integer("num_passes", default=1,
      help="Num passes for processing training data. "
      "This is used to batch data without loss for TPUs.")
flags.DEFINE_bool("uncased", default=False,
      help="Use uncased.")
flags.DEFINE_string("cls_scope", default=None,
      help="Classifier layer scope.")
flags.DEFINE_bool("is_regression", default=False,
      help="Whether it's a regression task.")

FLAGS = flags.FLAGS
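Several of these flags (learning_rate, warmup_steps, train_steps, decay_method, min_lr_ratio, lr_layer_decay_rate) together define the learning-rate schedule. The sketch below shows one reading of them, assuming linear warmup followed by linear decay ("poly" with power 1) and the per-layer rule stated in the lr_layer_decay_rate help text. It is an illustration, not code from the repository; the function names are hypothetical, and the defaults follow the command-line values (2e-5, 500 warmup steps, 4,000 steps) rather than the defaults in the block above.

```python
def learning_rate_at(step: int, learning_rate: float = 2e-5, warmup_steps: int = 500,
                     train_steps: int = 4000, min_lr_ratio: float = 0.0) -> float:
    """Assumed schedule: linear warmup, then linear decay to learning_rate * min_lr_ratio."""
    if warmup_steps > 0 and step < warmup_steps:
        return learning_rate * step / warmup_steps
    end_lr = learning_rate * min_lr_ratio
    decay_steps = train_steps - warmup_steps
    # Clamp so the rate stays at end_lr after train_steps.
    progress = min(step - warmup_steps, decay_steps) / decay_steps
    return (learning_rate - end_lr) * (1.0 - progress) + end_lr


def layer_learning_rates(base_lr: float, n_layer: int, decay_rate: float) -> list:
    """Per-layer rates from the help text: top layer gets base_lr, lr[l-1] = lr[l] * decay_rate."""
    return [base_lr * decay_rate ** (n_layer - 1 - l) for l in range(n_layer)]
```

Under these assumptions the rate peaks at 2e-5 at step 500 and falls to zero at step 4,000. With lr_layer_decay_rate=0.75 (an example value, not from the original), the layer just below the top would train at 0.75 times the top layer's rate; the default of 1.0 gives every layer the same rate.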

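The main things to set are the locations of the model, its config file and the data; writing your own version is simple and much like BERT. Note that only the paths were changed here: the other defaults (for example max_seq_length=128, train_batch_size=8, learning_rate=1e-5) differ from the command-line values above, so set them explicitly if you want the same run. For some of the errors you may run into, setting

run_config=None in the code is enough, since there is no TPU anyway. Also, according to the help text of num_core_per_host, it means the number of GPUs used when not training on a TPU, so set it to match your hardware.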

Then run the script.
