Getting Started with PyText (Environment Setup and Demo Implementation)

Original

2018-12-24 01:52

Introduction

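On December 15, Facebook announced that it was open-sourcing the PyText NLP framework. PyText is a deep-learning-based NLP modeling framework built on PyTorch 1.0. It connects to ONNX and Caffe2: with PyText, AI researchers and engineers can convert a PyTorch model to ONNX and then export it to Caffe2 for large-scale production deployment, which makes building, updating, and releasing models more convenient.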

Project page: https://github.com/facebookresearch/pytext

Environment Setup

Platform: Linux (CentOS 7)
Package and environment manager: Anaconda
Python: 3.6 (it must be version 3.6 or later, otherwise you will run into bugs)
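
Since the Python version requirement is the easiest thing to get wrong, a quick check before installing anything can save time. The following is a minimal sketch, not part of PyText; the function name `python_is_supported` is made up for illustration, and the minimum version is simply the one stated above.

```python
import sys

# Minimum version taken from the requirement stated in this post.
MIN_PYTHON = (3, 6)


def python_is_supported(version_info=None, minimum=MIN_PYTHON):
    """Return True when the (major, minor) version is at least `minimum`.

    `python_is_supported` is a hypothetical helper, not a PyText function.
    """
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    found = sys.version.split()[0]
    if python_is_supported():
        print("Python version OK:", found)
    else:
        print("Python 3.6 or later is required, found", found)
```

Run it with the interpreter of the environment you plan to install PyText into.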

step1. Create a virtual environment
Anaconda makes it easy to manage Python versions and packages. Anaconda download page:
https://www.anaconda.com/download/
Create a virtual environment named pytext with Python 3.6:

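conda create -n pytext python=3.6
# activate the environment before installing anything into it
source activate pytext  # or: conda activate pytext (conda 4.4 and later)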

step2. Install the required packages

pip install pytext-nlp

Note that this installs the CPU version. If you need the GPU version, you have to install the GPU build of PyTorch yourself. The steps are as follows:

export CMAKE_PREFIX_PATH="$(dirname $(which conda))/../" # [root directory of Anaconda]
# Install the basic dependencies
conda install numpy pyyaml mkl mkl-include setuptools cmake cffi typing
# Add LAPACK support for the GPU
conda install -c pytorch magma-cuda92 # or [magma-cuda80 | magma-cuda91], depending on your CUDA version, which must be 7.5 or later

Running PyText

step1. Clone the PyText project from GitHub to your machine:

git clone https://github.com/facebookresearch/pytext.git

step2. PyText usage

pytext [OPTIONS] COMMAND [ARGS]...

[OPTIONS]:
--config-file TEXT
--config-json TEXT
--help

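Commands:
export              Convert a PyText model snapshot to a Caffe2 model.
gen-default-config  Generate a JSON config file from the default parameters.
help-config         Print help for the config file parameters.
predict             Run prediction with a Caffe2 model.
predict-py          Run prediction with PyTorch.
test                Test a trained model snapshot.
train               Train a model and save the best snapshot.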
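
To drive the command line from a script, it helps to build the argument list in one place. The sketch below only assembles the arguments following the `pytext [OPTIONS] COMMAND [ARGS]...` pattern and prints them; it does not run PyText. The helper name `build_pytext_argv` is hypothetical, and passing `--config-file` to every subcommand is an assumption (this post only shows it with `predict`).

```python
import shlex

# Subcommands copied from the help output listed in this post.
KNOWN_COMMANDS = {
    "export", "gen-default-config", "help-config",
    "predict", "predict-py", "test", "train",
}


def build_pytext_argv(command, config_file=None, extra_args=()):
    """Assemble `pytext [OPTIONS] COMMAND [ARGS]...` as a list of strings.

    Hypothetical helper: it only builds the list, it never calls PyText.
    """
    if command not in KNOWN_COMMANDS:
        raise ValueError("unknown pytext command: " + repr(command))
    argv = ["pytext"]
    if config_file is not None:
        # Global options go before the subcommand.
        argv += ["--config-file", config_file]
    argv.append(command)
    argv.extend(extra_args)
    return argv


if __name__ == "__main__":
    argv = build_pytext_argv("train", "demo/configs/docnn.json")
    # Print a shell-safe version of the command for inspection.
    print(" ".join(shlex.quote(part) for part in argv))
```

The returned list can be handed to `subprocess.run` once PyText is installed; building a list instead of a shell string avoids quoting problems with paths that contain spaces.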

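step3. Demo implementation
The dataset is the demo data that ships with PyText. We train a classification model that decides which class an input command text belongs to. The dataset is very small, so do not expect much from its accuracy, and expect even less from the prediction step later. Once you are familiar with the usage, train on your own data.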

The training set looks like this:

alarm/modify_alarm      16:24:datetime,39:57:datetime   change my alarm tomorrow to wake me up 30 minutes earlier
alarm/set_alarm         Turn on all my alarms
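
To make the row layout concrete, here is a small parsing sketch. It assumes the columns in the actual file are tab-separated (label, slots, text) and that each slot is written as `start:end:name` with character offsets into the text; both are inferred from the sample rows above, not taken from the PyText documentation. The function names are made up for illustration.

```python
def parse_row(line):
    """Split one TSV row into (label, slots, text).

    Assumes three tab-separated columns; the slot column may be empty.
    """
    label, slot_field, text = line.rstrip("\n").split("\t")
    slots = []
    # An empty slot column yields [""], which filter(None, ...) drops.
    for item in filter(None, slot_field.split(",")):
        start, end, name = item.split(":", 2)
        slots.append((int(start), int(end), name))
    return label, slots, text


def slot_spans(slots, text):
    """Return (slot name, covered substring) for each slot."""
    return [(name, text[start:end]) for start, end, name in slots]


if __name__ == "__main__":
    row = (
        "alarm/modify_alarm\t16:24:datetime,39:57:datetime\t"
        "change my alarm tomorrow to wake me up 30 minutes earlier"
    )
    label, slots, text = parse_row(row)
    print(label)
    print(slot_spans(slots, text))
```

For the document classification demo only the label and the text matter; the slot column is presumably used by other tasks.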

Training configuration

{
  "task": {
    "DocClassificationTask": {
      "data_handler": {
        "train_path": "tests/data/train_data_tiny.tsv",
        "eval_path": "tests/data/test_data_tiny.tsv",
        "test_path": "tests/data/test_data_tiny.tsv"
      }
    }
  }
}
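
When you switch to your own data, the only thing that changes in this config is the three paths. The sketch below generates the same JSON structure from Python so the paths can be swapped in one place. The helper names are hypothetical; the keys are copied from the config above, and any other settings PyText supports are left at their defaults.

```python
import json


def make_doc_classification_config(train_path, eval_path, test_path):
    """Build the minimal DocClassificationTask config shown in this post."""
    return {
        "task": {
            "DocClassificationTask": {
                "data_handler": {
                    "train_path": train_path,
                    "eval_path": eval_path,
                    "test_path": test_path,
                }
            }
        }
    }


def write_config(config, path):
    """Write the config as indented JSON so it stays easy to read and diff."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(config, handle, indent=2)


if __name__ == "__main__":
    config = make_doc_classification_config(
        "tests/data/train_data_tiny.tsv",
        "tests/data/test_data_tiny.tsv",
        "tests/data/test_data_tiny.tsv",
    )
    print(json.dumps(config, indent=2))
```

Calling `write_config(config, "my_config.json")` (a file name chosen here only as an example) produces a file that can be passed to the `pytext` commands in place of `demo/configs/docnn.json`.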

Training (10 epochs; results are saved to /tmp/test_out.txt by default)

 pytext train < demo/configs/docnn.json

Testing

 pytext test < demo/configs/docnn.json

Exporting the model snapshot (the default location is /tmp/model.caffe2.predictor)

pytext export --output-path exported_model.c2 < demo/configs/docnn.json  # --output-path is the path you choose

Prediction
The simplest way is to use the command line:

 pytext --config-file demo/configs/docnn.json predict <<< '{"raw_text": "create an alarm for 1:30 pm"}'

However, if you saved the model to your own path instead of /tmp/model.caffe2.predictor, the model will not be found. In that case you can write your own script and run it:

import pytext

config_file = 'demo/configs/docnn.json'  # path to the config file
model_file = 'exported_model.c2'  # path of the model exported earlier

config = pytext.load_config(config_file)
predictor = pytext.create_predictor(config, model_file)

result = predictor({"raw_text": "create an alarm for 1:30 pm"})
print(result)
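
To try several inputs at once, the predictor can be wrapped in a small loop. The sketch below takes the predictor as an argument, so it runs without PyText installed by using a stand-in function; `predict_texts` and `fake_predictor` are hypothetical names, and the structure of a real prediction result is not assumed here.

```python
def predict_texts(predictor, texts):
    """Call `predictor` once per text and pair each input with its raw result.

    `predictor` is any callable that accepts {"raw_text": ...}, such as the
    object returned by pytext.create_predictor in the script above.
    """
    return [(text, predictor({"raw_text": text})) for text in texts]


if __name__ == "__main__":
    # Stand-in predictor so this sketch runs without PyText installed.
    # Replace it with the real predictor created from your exported model.
    def fake_predictor(features):
        return {"echo": features["raw_text"]}

    samples = ["create an alarm for 1:30 pm", "Turn on all my alarms"]
    for text, result in predict_texts(fake_predictor, samples):
        print(text, "->", result)
```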

Whether the battle between PyTorch and TensorFlow will change with the arrival of PyText remains to be seen.
