From Zero to Practice: the Zhihu "Kanshan Cup" First-Place Solution by Team init (PyTorch)

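        Some background first: I am a Java developer with little Python experience. My work recently required analysing and organising a large amount of text, so I started looking for material online. Through Zhihu I learned about the Kanshan Cup competition that Zhihu hosted, found the solution published by the winning team, init, and started trying it out. My approach may well be wrong.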

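What I found in practice: this kind of machine learning needs a Linux machine with a GPU and plenty of memory. The system I installed in a virtual machine could not run the computation.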

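        First, the Linux system has to be 64-bit. I installed Linux in a virtual machine.

        Virtual machine: VMware-workstation-full-14.1.1-7528167.exe (Workstation 14 Pro)

        Linux version: CentOS-7-x86_64-DVD-1708.iso, about 4 GB, downloaded from the Alibaba Cloud mirror site

        The purpose of this post is to record the steps I took and how I solved the problems along the way. Because I wrote it down while working, the layout is rough; I will tidy it up once everything is done. I am also something of a perfectionist.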

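        1. After installing the system I gave the virtual machine 2 GB of memory and a 40 GB disk. My computer is low-end, so this is below the minimum configuration the init solution asks for, but I went ahead anyway to see how far I could get; at this point I cannot tell whether I will be able to finish. The virtual machine uses a NAT network adapter by default. After booting, I ran ifconfig inside the virtual machine (a minimal install does not have this command) and saw no IP address, and I could not ping baidu.com either. The guest first has to be able to reach the host's network, so I did the following: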

[root@localhost ~]# vi /etc/sysconfig/network-scripts/ifcfg-ens33 

Open the file and edit it.

Set ONBOOT to yes (it was no to begin with). The file after editing:
TYPE=Ethernet
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=dhcp
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_FAILURE_FATAL=no
IPV6_ADDR_GEN_MODE=stable-privacy
NAME=ens33
UUID=763f71d6-ec83-4bee-8322-4c903f6b78ed
DEVICE=ens33
ONBOOT=yes

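        After editing, save and exit, then restart the network service or reboot the system. Setting a static IP is more involved, so I skipped that step.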
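As a rough aid for the edit above, the sketch below reads an ifcfg-style file and flags the two settings this post relies on, ONBOOT=yes and BOOTPROTO=dhcp. The names `parse_ifcfg` and `check_ifcfg` are made up for this illustration and are not part of any CentOS tool, and the parser handles only plain KEY=value lines.

```python
from __future__ import print_function


def parse_ifcfg(text):
    """Parse KEY=value lines of an ifcfg-style file into a dict."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip().strip('"')
    return settings


def check_ifcfg(settings):
    """Return a list of problems; an empty list means nothing was flagged."""
    problems = []
    if settings.get("ONBOOT", "no").lower() != "yes":
        problems.append("ONBOOT is not yes, so the interface stays down at boot")
    if settings.get("BOOTPROTO", "").lower() != "dhcp":
        problems.append("BOOTPROTO is not dhcp (this post relies on DHCP)")
    return problems


if __name__ == "__main__":
    # Path taken from the post; the interface name differs between machines.
    path = "/etc/sysconfig/network-scripts/ifcfg-ens33"
    try:
        with open(path) as handle:
            found = check_ifcfg(parse_ifcfg(handle.read()))
    except IOError as exc:
        found = ["cannot read %s: %s" % (path, exc)]
    print("\n".join(found) if found else "nothing flagged")
```

A static-IP setup would legitimately use a different BOOTPROTO, so the second check only makes sense for the DHCP configuration used here.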

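        After the reboot I connected with Xshell. Its display is clear and copying text is easy, whereas commands cannot be copied and pasted in the virtual machine console, which is inconvenient. Once connected, I ran the command below; the reply shows that the machine is online.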

[root@localhost ~]# ping baidu.com
PING baidu.com (123.125.115.110) 56(84) bytes of data.
64 bytes from 123.125.115.110 (123.125.115.110): icmp_seq=1 ttl=128 time=130 ms   

        Next I will probably need to install software, so I checked whether yum works. The output below shows that it does:

[root@localhost ~]# yum install unzip
Loaded plugins: fastestmirror
base   

        The system ships with Python. The solution uses Python 2.7, so there is no need to install another version:

[root@localhost ~]# python
Python 2.7.5 (default, Aug  4 2017, 00:39:18) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-16)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> 
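A small sketch, not part of the original solution, for confirming from a script that the interpreter belongs to the 2.7 series. `is_python27` is a hypothetical helper name.

```python
from __future__ import print_function
import sys


def is_python27(version_info):
    """True when the (major, minor, ...) tuple describes Python 2.7."""
    return tuple(version_info[:2]) == (2, 7)


if __name__ == "__main__":
    print("running Python %d.%d.%d" % tuple(sys.version_info[:3]))
    print("matches the 2.7 series:", is_python27(sys.version_info))
```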

        Install pip, wheel, and setuptools:

[root@localhost /]# mkdir weblogic
[root@localhost /]# ls
bin  boot  dev  etc  home  lib  lib64  media  mnt  opt  proc  root  run  sbin  srv  sys  tmp  usr  var  weblogic
[root@localhost /]# cd weblogic/
[root@localhost weblogic]# curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 1603k  100 1603k    0     0   465k      0  0:00:03  0:00:03 --:--:--  465k
[root@localhost weblogic]# python get-pip.py
Collecting pip
  Downloading https://files.pythonhosted.org/packages/0f/74/ecd13431bcc456ed390b44c8a6e917c1820365cbebcb6a8974d1cd045ab4/pip-10.0.1-py2.py3-none-any.whl (1.3MB)
    100% |████████████████████████████████| 1.3MB 294kB/s 
Collecting setuptools
  Downloading https://files.pythonhosted.org/packages/7f/e1/820d941153923aac1d49d7fc37e17b6e73bfbd2904959fffbad77900cf92/setuptools-39.2.0-py2.py3-none-any.whl (567kB)
    100% |████████████████████████████████| 573kB 406kB/s 
Collecting wheel
  Downloading https://files.pythonhosted.org/packages/81/30/e935244ca6165187ae8be876b6316ae201b71485538ffac1d718843025a9/wheel-0.31.1-py2.py3-none-any.whl (41kB)
    100% |████████████████████████████████| 51kB 729kB/s 
Installing collected packages: pip, setuptools, wheel
Successfully installed pip-10.0.1 setuptools-39.2.0 wheel-0.31.1
[root@localhost weblogic]# 
[root@localhost weblogic]# ls
get-pip.py  ipdb-0.11.tar.gz  pip-10.0.1.tar.gz  setuptools-39.2.0  setuptools-39.2.0.zip  torch-0.1.12.post2-cp27-none-linux_x86_64.whl  wheel-0.31.1  wheel-0.31.1.tar.gz
[root@localhost weblogic]# cd wheel-0.31.1
[root@localhost wheel-0.31.1]# python setup.py install

Install PyTorch:

[root@localhost weblogic]# pip install torch-0.1.12.post2-cp27-none-linux_x86_64.whl 
Processing ./torch-0.1.12.post2-cp27-none-linux_x86_64.whl
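To confirm that the wheel actually imports, and to see whether a GPU is visible (in a virtual machine without GPU passthrough it normally is not), something like the following can be run. `describe_torch` is a hypothetical helper written for this post; `torch.cuda.is_available()` is the PyTorch call it relies on.

```python
from __future__ import print_function


def describe_torch():
    """Report whether torch imports and whether it can see a CUDA GPU."""
    try:
        import torch
    except ImportError:
        return {"installed": False, "version": None, "cuda": False}
    return {
        "installed": True,
        "version": getattr(torch, "__version__", "unknown"),
        "cuda": bool(torch.cuda.is_available()),
    }


if __name__ == "__main__":
    print(describe_torch())
```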

Install Git. First the build dependencies:

[root@localhost PyTorchText-master]# yum install curl-devel expat-devel gettext-devel openssl-devel zlib-devel gcc perl-ExtUtils-MakeMaker
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile

 Download the Git source package:

    wget https://www.kernel.org/pub/software/scm/git/git-2.8.3.tar.gz

  Extract the package:

    tar -zxvf git-2.8.3.tar.gz

    cd git-2.8.3

[root@localhost git-2.8.3]# pwd
/weblogic/git-2.8.3
[root@localhost git-2.8.3]# gcc --version
gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-28)
Copyright © 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
[root@localhost git-2.8.3]# ./configure prefix=/usr/local/git/
configure: Setting lib to 'lib' (the default)
./check_bindir "z$bindir" "z$execdir" "$bindir/git-add"
[root@localhost git-2.8.3]# git --version
git version 1.8.3.1
[root@localhost git-2.8.3]# 
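Note that `git --version` above still reports 1.8.3.1 even though the 2.8.3 sources were configured. The transcript shows `./configure` but no `make` or `make install`, so the git found on PATH appears to be the one CentOS ships. A minimal sketch for checking which version is actually being picked up; `parse_git_version` and `git_on_path_version` are illustrative names, not part of Git.

```python
from __future__ import print_function
import re
import subprocess


def parse_git_version(output):
    """Turn 'git version 1.8.3.1' into the tuple (1, 8, 3, 1)."""
    match = re.search(r"git version (\d+(?:\.\d+)*)", output)
    if match is None:
        raise ValueError("unrecognised output: %r" % output)
    return tuple(int(part) for part in match.group(1).split("."))


def git_on_path_version():
    """Ask whichever git comes first on PATH for its version."""
    output = subprocess.check_output(["git", "--version"])
    return parse_git_version(output.decode("utf-8", "replace"))


if __name__ == "__main__":
    wanted = (2, 8, 3)  # the tarball downloaded in this post
    try:
        found = git_on_path_version()
    except (OSError, subprocess.CalledProcessError) as exc:
        print("git could not be run:", exc)
    else:
        print("git on PATH:", found, "| at least 2.8.3:", found >= wanted)
```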
[root@localhost PyTorchText-master]# pip install Cython
Collecting Cython
  Downloading https://files.pythonhosted.org/packages/f6/23/ef5521e077e9e7ef8e4603e27713ae95fee69e9c19c7cd036b4299c7ced5/Cython-0.28.3-cp27-cp27mu-manylinux1_x86_64.whl (3.3MB)
    100% |████████████████████████████████| 3.3MB 486kB/s 
Installing collected packages: Cython
Successfully installed Cython-0.28.3
[root@localhost PyTorchText-master]# 

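Installing fasttext with pip fails with:

ImportError: No module named Cython.Build

The fix for that error is to install Cython first:

pip install Cython

pip install fasttext

The second command still failed. The log below shows gcc unable to run cc1plus, the C++ compiler backend, which on CentOS normally comes from the gcc-c++ package: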

[root@localhost PyTorchText-master]# pip install fasttext
Collecting fasttext
  Downloading https://files.pythonhosted.org/packages/a4/86/ff826211bc9e28d4c371668b30b4b2c38a09127e5e73017b1c0cd52f9dfa/fasttext-0.8.3.tar.gz (73kB)
    100% |████████████████████████████████| 81kB 315kB/s 
Collecting numpy>=1 (from fasttext)
  Downloading https://files.pythonhosted.org/packages/c0/e7/08f059a00367fd613e4f2875a16c70b6237268a1d6d166c6d36acada8301/numpy-1.14.3-cp27-cp27mu-manylinux1_x86_64.whl (12.1MB)
    100% |████████████████████████████████| 12.1MB 392kB/s 
Collecting future (from fasttext)
  Downloading https://files.pythonhosted.org/packages/00/2b/8d082ddfed935f3608cc61140df6dcbf0edea1bc3ab52fb6c29ae3e81e85/future-0.16.0.tar.gz (824kB)
    100% |████████████████████████████████| 829kB 441kB/s 
Building wheels for collected packages: fasttext, future
  Running setup.py bdist_wheel for fasttext ... error
  Complete output from command /usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-install-DZjW32/fasttext/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /tmp/pip-wheel-KodiTL --python-tag cp27:
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build/lib.linux-x86_64-2.7
  creating build/lib.linux-x86_64-2.7/fasttext
  copying fasttext/__init__.py -> build/lib.linux-x86_64-2.7/fasttext
  copying fasttext/model.py -> build/lib.linux-x86_64-2.7/fasttext
  running build_ext
  building 'fasttext.fasttext' extension
  creating build/temp.linux-x86_64-2.7
  creating build/temp.linux-x86_64-2.7/fasttext
  creating build/temp.linux-x86_64-2.7/fasttext/cpp
  creating build/temp.linux-x86_64-2.7/fasttext/cpp/src
  gcc -pthread -fno-strict-aliasing -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -fPIC -I./fasttext -I/usr/include/python2.7 -c fasttext/fasttext.cpp -o build/temp.linux-x86_64-2.7/fasttext/fasttext.o -O3 -pthread -funroll-loops -std=c++0x
  gcc: error trying to exec 'cc1plus': execvp: No such file or directory
  error: command 'gcc' failed with exit status 1
  
  ----------------------------------------
  Failed building wheel for fasttext
  Running setup.py clean for fasttext
  Running setup.py bdist_wheel for future ... done
  Stored in directory: /root/.cache/pip/wheels/bf/c9/a3/c538d90ef17cf7823fa51fc701a7a7a910a80f6a405bf15b1a
Successfully built future
Failed to build fasttext
Installing collected packages: numpy, future, fasttext
  Running setup.py install for fasttext ... error
    Complete output from command /usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-install-DZjW32/fasttext/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-record-VyEfve/install-record.txt --single-version-externally-managed --compile:
    running install
    running build
    running build_py
    creating build
    creating build/lib.linux-x86_64-2.7
    creating build/lib.linux-x86_64-2.7/fasttext
    copying fasttext/__init__.py -> build/lib.linux-x86_64-2.7/fasttext
    copying fasttext/model.py -> build/lib.linux-x86_64-2.7/fasttext
    running build_ext
    building 'fasttext.fasttext' extension
    creating build/temp.linux-x86_64-2.7
    creating build/temp.linux-x86_64-2.7/fasttext
    creating build/temp.linux-x86_64-2.7/fasttext/cpp
    creating build/temp.linux-x86_64-2.7/fasttext/cpp/src
    gcc -pthread -fno-strict-aliasing -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -fPIC -I./fasttext -I/usr/include/python2.7 -c fasttext/fasttext.cpp -o build/temp.linux-x86_64-2.7/fasttext/fasttext.o -O3 -pthread -funroll-loops -std=c++0x
    gcc: error trying to exec 'cc1plus': execvp: No such file or directory
    error: command 'gcc' failed with exit status 1
    
    ----------------------------------------
Command "/usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-install-DZjW32/fasttext/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-record-VyEfve/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-install-DZjW32/fasttext/
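
The cc1plus error in the log above means the C++ part of the compiler is absent. A minimal sketch for checking this before running pip, assuming that the presence of `g++` on PATH is a good enough proxy for cc1plus being installed; `missing_tools` is a hypothetical helper.

```python
from __future__ import print_function

try:
    from shutil import which  # Python 3.3+
except ImportError:
    from distutils.spawn import find_executable as which  # Python 2.7


def missing_tools(names, finder=which):
    """Return the names from `names` that cannot be found on PATH."""
    return [name for name in names if finder(name) is None]


if __name__ == "__main__":
    # gcc compiles C; g++ is the C++ driver that relies on cc1plus.
    absent = missing_tools(["gcc", "g++"])
    if absent:
        print("not found on PATH:", ", ".join(absent))
    else:
        print("gcc and g++ are both on PATH")
```

The `finder` parameter exists only so the function can be exercised without depending on what is installed on the current machine.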
Install the project's dependencies from requirements.txt:

pip install -r requirements.txt

[root@localhost PyTorchText-master]# pip install -r requirements.txt 
Collecting git+https://github.com/pytorch/tnt.git@master (from -r requirements.txt (line 5))
  Cloning https://github.com/pytorch/tnt.git (to revision master) to /tmp/pip-req-build-E_05vl
Collecting ipdb (from -r requirements.txt (line 1))
Collecting fire (from -r requirements.txt (line 2))
Collecting tqdm (from -r requirements.txt (line 3))
  Using cached https://files.pythonhosted.org/packages/93/24/6ab1df969db228aed36a648a8959d1027099ce45fad67532b9673d533318/tqdm-4.23.4-py2.py3-none-any.whl
Collecting visdom (from -r requirements.txt (line 4))
Collecting word2vec (from -r requirements.txt (line 6))
  Using cached https://files.pythonhosted.org/packages/5b/33/8e1cf93216342f0fe8aa4484ef1a833a12c4f6d6bf8e8b46ecc0feb5e5e8/word2vec-0.9.2.tar.gz
Requirement already satisfied: torch in /usr/lib64/python2.7/site-packages (from torchnet==0.0.2->-r requirements.txt (line 5)) (0.1.12.post2)
Requirement already satisfied: six in /usr/lib/python2.7/site-packages (from torchnet==0.0.2->-r requirements.txt (line 5)) (1.11.0)
Collecting ipython<6.0.0,>=5.0.0; python_version == "2.7" (from ipdb->-r requirements.txt (line 1))
  Using cached https://files.pythonhosted.org/packages/52/19/aadde98d6bde1667d0bf431fb2d22451f880aaa373e0a241c7e7cb5815a0/ipython-5.7.0-py2-none-any.whl
Requirement already satisfied: setuptools in /usr/lib/python2.7/site-packages (from ipdb->-r requirements.txt (line 1)) (39.2.0)
Collecting torchfile (from visdom->-r requirements.txt (line 4))
Collecting pyzmq (from visdom->-r requirements.txt (line 4))
  Using cached https://files.pythonhosted.org/packages/5d/b0/3aea046f5519e2e059a225e8c924f897846b608793f890be987d07858b7c/pyzmq-17.0.0-cp27-cp27mu-manylinux1_x86_64.whl
Requirement already satisfied: numpy>=1.8 in /usr/lib64/python2.7/site-packages (from visdom->-r requirements.txt (line 4)) (1.14.3)
Collecting tornado (from visdom->-r requirements.txt (line 4))
Collecting websocket-client (from visdom->-r requirements.txt (line 4))
  Using cached https://files.pythonhosted.org/packages/8a/a1/72ef9aa26cfe1a75cee09fc1957e4723add9de098c15719416a1ee89386b/websocket_client-0.48.0-py2.py3-none-any.whl
Collecting pillow (from visdom->-r requirements.txt (line 4))
  Using cached https://files.pythonhosted.org/packages/00/49/a0483e7308b4b04b5a898789911dbb876d9fea54e7df0453915e47744cfd/Pillow-5.1.0-cp27-cp27mu-manylinux1_x86_64.whl
Collecting scipy (from visdom->-r requirements.txt (line 4))
  Using cached https://files.pythonhosted.org/packages/2a/f3/de9c1bd16311982711209edaa8c6caa962db30ebb6a8cc6f1dcd2d3ef616/scipy-1.1.0-cp27-cp27mu-manylinux1_x86_64.whl
Collecting requests (from visdom->-r requirements.txt (line 4))
  Using cached https://files.pythonhosted.org/packages/49/df/50aa1999ab9bde74656c2919d9c0c085fd2b3775fd3eca826012bef76d8c/requests-2.18.4-py2.py3-none-any.whl
Requirement already satisfied: cython in /usr/lib64/python2.7/site-packages (from word2vec->-r requirements.txt (line 6)) (0.28.3)
Requirement already satisfied: pyyaml in /usr/lib64/python2.7/site-packages (from torch->torchnet==0.0.2->-r requirements.txt (line 5)) (3.12)
Requirement already satisfied: prompt-toolkit<2.0.0,>=1.0.4 in /usr/lib/python2.7/site-packages (from ipython<6.0.0,>=5.0.0; python_version == "2.7"->ipdb->-r requirements.txt (line 1)) (1.0.15)
Requirement already satisfied: decorator in /usr/lib/python2.7/site-packages (from ipython<6.0.0,>=5.0.0; python_version == "2.7"->ipdb->-r requirements.txt (line 1)) (3.4.0)
Requirement already satisfied: pexpect; sys_platform != "win32" in /usr/lib/python2.7/site-packages (from ipython<6.0.0,>=5.0.0; python_version == "2.7"->ipdb->-r requirements.txt (line 1)) (4.6.0)
Requirement already satisfied: backports.shutil-get-terminal-size; python_version == "2.7" in /usr/lib/python2.7/site-packages (from ipython<6.0.0,>=5.0.0; python_version == "2.7"->ipdb->-r requirements.txt (line 1)) (1.0.0)
Requirement already satisfied: pygments in /usr/lib64/python2.7/site-packages (from ipython<6.0.0,>=5.0.0; python_version == "2.7"->ipdb->-r requirements.txt (line 1)) (2.2.0)
Collecting pathlib2; python_version == "2.7" or python_version == "3.3" (from ipython<6.0.0,>=5.0.0; python_version == "2.7"->ipdb->-r requirements.txt (line 1))
  Using cached https://files.pythonhosted.org/packages/66/a7/9f8d84f31728d78beade9b1271ccbfb290c41c1e4dc13dbd4997ad594dcd/pathlib2-2.3.2-py2.py3-none-any.whl
Collecting traitlets>=4.2 (from ipython<6.0.0,>=5.0.0; python_version == "2.7"->ipdb->-r requirements.txt (line 1))
  Using cached https://files.pythonhosted.org/packages/93/d6/abcb22de61d78e2fc3959c964628a5771e47e7cc60d53e9342e21ed6cc9a/traitlets-4.3.2-py2.py3-none-any.whl
Collecting simplegeneric>0.8 (from ipython<6.0.0,>=5.0.0; python_version == "2.7"->ipdb->-r requirements.txt (line 1))
Collecting pickleshare (from ipython<6.0.0,>=5.0.0; python_version == "2.7"->ipdb->-r requirements.txt (line 1))
  Using cached https://files.pythonhosted.org/packages/9f/17/daa142fc9be6b76f26f24eeeb9a138940671490b91cb5587393f297c8317/pickleshare-0.7.4-py2.py3-none-any.whl
Collecting backports-abc>=0.4 (from tornado->visdom->-r requirements.txt (line 4))
  Using cached https://files.pythonhosted.org/packages/7d/56/6f3ac1b816d0cd8994e83d0c4e55bc64567532f7dc543378bd87f81cebc7/backports_abc-0.5-py2.py3-none-any.whl
Requirement already satisfied: futures in /usr/lib/python2.7/site-packages (from tornado->visdom->-r requirements.txt (line 4)) (3.2.0)
Collecting singledispatch (from tornado->visdom->-r requirements.txt (line 4))
  Using cached https://files.pythonhosted.org/packages/c5/10/369f50bcd4621b263927b0a1519987a04383d4a98fb10438042ad410cf88/singledispatch-3.4.0.3-py2.py3-none-any.whl
Collecting certifi>=2017.4.17 (from requests->visdom->-r requirements.txt (line 4))
  Using cached https://files.pythonhosted.org/packages/7c/e6/92ad559b7192d846975fc916b65f667c7b8c3a32bea7372340bfe9a15fa5/certifi-2018.4.16-py2.py3-none-any.whl
Collecting chardet<3.1.0,>=3.0.2 (from requests->visdom->-r requirements.txt (line 4))
  Using cached https://files.pythonhosted.org/packages/bc/a9/01ffebfb562e4274b6487b4bb1ddec7ca55ec7510b22e4c51f14098443b8/chardet-3.0.4-py2.py3-none-any.whl
Collecting idna<2.7,>=2.5 (from requests->visdom->-r requirements.txt (line 4))
  Using cached https://files.pythonhosted.org/packages/27/cc/6dd9a3869f15c2edfab863b992838277279ce92663d334df9ecf5106f5c6/idna-2.6-py2.py3-none-any.whl
Collecting urllib3<1.23,>=1.21.1 (from requests->visdom->-r requirements.txt (line 4))
  Using cached https://files.pythonhosted.org/packages/63/cb/6965947c13a94236f6d4b8223e21beb4d576dc72e8130bd7880f600839b8/urllib3-1.22-py2.py3-none-any.whl
Requirement already satisfied: wcwidth in /usr/lib/python2.7/site-packages (from prompt-toolkit<2.0.0,>=1.0.4->ipython<6.0.0,>=5.0.0; python_version == "2.7"->ipdb->-r requirements.txt (line 1)) (0.1.7)
Requirement already satisfied: ptyprocess>=0.5 in /usr/lib/python2.7/site-packages (from pexpect; sys_platform != "win32"->ipython<6.0.0,>=5.0.0; python_version == "2.7"->ipdb->-r requirements.txt (line 1)) (0.5.2)
Collecting scandir; python_version < "3.5" (from pathlib2; python_version == "2.7" or python_version == "3.3"->ipython<6.0.0,>=5.0.0; python_version == "2.7"->ipdb->-r requirements.txt (line 1))
  Using cached https://files.pythonhosted.org/packages/13/bb/e541b74230bbf7a20a3949a2ee6631be299378a784f5445aa5d0047c192b/scandir-1.7.tar.gz
Collecting ipython-genutils (from traitlets>=4.2->ipython<6.0.0,>=5.0.0; python_version == "2.7"->ipdb->-r requirements.txt (line 1))
  Using cached https://files.pythonhosted.org/packages/fa/bc/9bd3b5c2b4774d5f33b2d544f1460be9df7df2fe42f352135381c347c69a/ipython_genutils-0.2.0-py2.py3-none-any.whl
Requirement already satisfied: enum34; python_version == "2.7" in /usr/lib/python2.7/site-packages (from traitlets>=4.2->ipython<6.0.0,>=5.0.0; python_version == "2.7"->ipdb->-r requirements.txt (line 1)) (1.1.6)
Building wheels for collected packages: word2vec, torchnet, scandir
  Running setup.py bdist_wheel for word2vec ... error
  Complete output from command /usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-install-aaHbvs/word2vec/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /tmp/pip-wheel-0AZ5iL --python-tag cp27:
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build/lib.linux-x86_64-2.7
  creating build/lib.linux-x86_64-2.7/word2vec
  copying word2vec/__init__.py -> build/lib.linux-x86_64-2.7/word2vec
  copying word2vec/_version.py -> build/lib.linux-x86_64-2.7/word2vec
  copying word2vec/io.py -> build/lib.linux-x86_64-2.7/word2vec
  copying word2vec/scripts_interface.py -> build/lib.linux-x86_64-2.7/word2vec
  copying word2vec/utils.py -> build/lib.linux-x86_64-2.7/word2vec
  copying word2vec/wordclusters.py -> build/lib.linux-x86_64-2.7/word2vec
  copying word2vec/wordvectors.py -> build/lib.linux-x86_64-2.7/word2vec
  creating build/lib.linux-x86_64-2.7/word2vec/tests
  copying word2vec/tests/__init__.py -> build/lib.linux-x86_64-2.7/word2vec/tests
  copying word2vec/tests/test_word2vec.py -> build/lib.linux-x86_64-2.7/word2vec/tests
  UPDATING build/lib.linux-x86_64-2.7/word2vec/_version.py
  set build/lib.linux-x86_64-2.7/word2vec/_version.py to '0.9.2'
  running build_ext
  building 'word2vec.word2vec_noop' extension
  creating build/temp.linux-x86_64-2.7
  creating build/temp.linux-x86_64-2.7/word2vec
  gcc -pthread -fno-strict-aliasing -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -fPIC -I/usr/include/python2.7 -c word2vec/word2vec_noop.c -o build/temp.linux-x86_64-2.7/word2vec/word2vec_noop.o
  word2vec/word2vec_noop.c:16:20: fatal error: Python.h: No such file or directory
   #include "Python.h"
                      ^
  compilation terminated.
  error: command 'gcc' failed with exit status 1
  
  ----------------------------------------
  Failed building wheel for word2vec
  Running setup.py clean for word2vec
  Running setup.py bdist_wheel for torchnet ... done
  Stored in directory: /tmp/pip-ephem-wheel-cache-nmBFRj/wheels/17/05/ec/d05d051a225871af52bf504f5e8daf57704811b3c1850d0012
  Running setup.py bdist_wheel for scandir ... error
  Complete output from command /usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-install-aaHbvs/scandir/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /tmp/pip-wheel-YBhBvd --python-tag cp27:
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build/lib.linux-x86_64-2.7
  copying scandir.py -> build/lib.linux-x86_64-2.7
  running build_ext
  building '_scandir' extension
  creating build/temp.linux-x86_64-2.7
  gcc -pthread -fno-strict-aliasing -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -fPIC -I/usr/include/python2.7 -c _scandir.c -o build/temp.linux-x86_64-2.7/_scandir.o
  _scandir.c:14:20: fatal error: Python.h: No such file or directory
   #include <Python.h>
                      ^
  compilation terminated.
  error: command 'gcc' failed with exit status 1
  
  ----------------------------------------
  Failed building wheel for scandir
  Running setup.py clean for scandir
Successfully built torchnet
Failed to build word2vec scandir
Installing collected packages: scandir, pathlib2, ipython-genutils, traitlets, simplegeneric, pickleshare, ipython, ipdb, fire, tqdm, torchfile, pyzmq, backports-abc, singledispatch, tornado, websocket-client, pillow, scipy, certifi, chardet, idna, urllib3, requests, visdom, word2vec, torchnet
  Running setup.py install for scandir ... error
    Complete output from command /usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-install-aaHbvs/scandir/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-record-GKZVrW/install-record.txt --single-version-externally-managed --compile:
    running install
    running build
    running build_py
    creating build
    creating build/lib.linux-x86_64-2.7
    copying scandir.py -> build/lib.linux-x86_64-2.7
    running build_ext
    building '_scandir' extension
    creating build/temp.linux-x86_64-2.7
    gcc -pthread -fno-strict-aliasing -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -fPIC -I/usr/include/python2.7 -c _scandir.c -o build/temp.linux-x86_64-2.7/_scandir.o
    _scandir.c:14:20: fatal error: Python.h: No such file or directory
     #include <Python.h>
                        ^
    compilation terminated.
    error: command 'gcc' failed with exit status 1
    
    ----------------------------------------
Command "/usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-install-aaHbvs/scandir/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-record-GKZVrW/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-install-aaHbvs/scandir/
[root@localhost PyTorchText-master]# 
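
Both build failures in this post come down to a missing system package, and the telltale fragment appears in the pip log whatever the locale. The sketch below maps the two fragments seen here to the CentOS package that normally provides the missing piece. `KNOWN_CAUSES` and `suggest_packages` are illustrative names, and the table deliberately covers only these two cases.

```python
from __future__ import print_function

# Error fragment -> CentOS package that usually provides the missing piece.
# Only the two failures seen in this post are listed.
KNOWN_CAUSES = [
    ("cc1plus", "gcc-c++"),
    ("Python.h", "python-devel"),
]


def suggest_packages(log_text):
    """Return packages hinted at by a pip build log, in KNOWN_CAUSES order."""
    suggestions = []
    for fragment, package in KNOWN_CAUSES:
        if fragment in log_text and package not in suggestions:
            suggestions.append(package)
    return suggestions


if __name__ == "__main__":
    sample = "gcc: error trying to exec 'cc1plus': execvp: No such file or directory"
    print("consider: yum install", " ".join(suggest_packages(sample)))
```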

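Troubleshooting: install the Python development headers on CentOS 7. The package turns out to be called python-devel, not python-dev: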

[root@localhost PyTorchText-master]# yum install python-dev
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * base: mirrors.163.com
 * extras: mirrors.163.com
 * updates: mirrors.cn99.com
No package python-dev available.
Error: Nothing to do
[root@localhost PyTorchText-master]# yum install Python-devel
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * base: mirrors.163.com
 * extras: mirrors.163.com
 * updates: mirrors.cn99.com
No package Python-devel available.
  * Maybe you meant: python-devel
Error: Nothing to do
[root@localhost PyTorchText-master]# yum install python-devel
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * base: mirrors.163.com
 * extras: mirrors.163.com
 * updates: mirrors.cn99.com
Resolving Dependencies
--> Running transaction check
---> Package python-devel.x86_64.0.2.7.5-68.el7 will be installed
--> Processing Dependency: python(x86-64) = 2.7.5-68.el7 for package: python-devel-2.7.5-68.el7.x86_64
--> Running transaction check
---> Package python.x86_64.0.2.7.5-58.el7 will be updated
---> Package python.x86_64.0.2.7.5-68.el7 will be an update
--> Processing Dependency: python-libs(x86-64) = 2.7.5-68.el7 for package: python-2.7.5-68.el7.x86_64
--> Running transaction check
---> Package python-libs.x86_64.0.2.7.5-58.el7 will be updated
---> Package python-libs.x86_64.0.2.7.5-68.el7 will be an update
--> Finished Dependency Resolution

Dependencies Resolved

===================================================================================================================================================================================
 Package                                       Arch                                    Version                                         Repository                             Size
===================================================================================================================================================================================
Installing:
 python-devel                                  x86_64                                  2.7.5-68.el7                                    base                                  397 k
Updating for dependencies:
 python                                        x86_64                                  2.7.5-68.el7                                    base                                   93 k
 python-libs                                   x86_64                                  2.7.5-68.el7                                    base                                  5.6 M

Transaction Summary
===================================================================================================================================================================================
Install  1 Package
Upgrade           ( 2 Dependent packages)

Total download size: 6.1 M
Is this ok [y/d/N]: y
Downloading packages:
Delta RPMs disabled because /usr/bin/applydeltarpm not installed.
(1/3): python-2.7.5-68.el7.x86_64.rpm                                                                                                                       |  93 kB  00:00:00     
(2/3): python-devel-2.7.5-68.el7.x86_64.rpm                                                                                                                 | 397 kB  00:00:03     
(3/3): python-libs-2.7.5-68.el7.x86_64.rpm                                                                                                                  | 5.6 MB  00:00:38     
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Total                                                                                                                                              160 kB/s | 6.1 MB  00:00:39
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Updating    : python-libs-2.7.5-68.el7.x86_64                                                                                                                                1/5
  Updating    : python-2.7.5-68.el7.x86_64                                                                                                                                     2/5
  Installing  : python-devel-2.7.5-68.el7.x86_64                                                                                                                               3/5
  Cleanup     : python-2.7.5-58.el7.x86_64                                                                                                                                     4/5
  Cleanup     : python-libs-2.7.5-58.el7.x86_64                                                                                                                                5/5
  Verifying   : python-libs-2.7.5-68.el7.x86_64                                                                                                                                1/5
  Verifying   : python-devel-2.7.5-68.el7.x86_64                                                                                                                               2/5
  Verifying   : python-2.7.5-68.el7.x86_64                                                                                                                                     3/5
  Verifying   : python-libs-2.7.5-58.el7.x86_64                                                                                                                                4/5
  Verifying   : python-2.7.5-58.el7.x86_64                                                                                                                                     5/5

Installed:
  python-devel.x86_64 0:2.7.5-68.el7

Dependency Updated:
  python.x86_64 0:2.7.5-68.el7                                                          python-libs.x86_64 0:2.7.5-68.el7

Complete!
[root@localhost PyTorchText-master]# 
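
To double-check that the headers are now where the interpreter expects them, a short sketch using the standard `sysconfig` module; `python_header_path` is a hypothetical helper name. On the system in this post the expected location is under /usr/include/python2.7, the directory passed to gcc with -I in the failed builds.

```python
from __future__ import print_function
import os
import sysconfig


def python_header_path():
    """Path where this interpreter expects to find Python.h."""
    return os.path.join(sysconfig.get_paths()["include"], "Python.h")


if __name__ == "__main__":
    header = python_header_path()
    state = "present" if os.path.isfile(header) else "missing"
    print("%s is %s" % (header, state))
```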

After that, the same command ran successfully:

[root@localhost PyTorchText-master]# pip install -r requirements.txt 
Collecting git+https://github.com/pytorch/tnt.git@master (from -r requirements.txt (line 5))
  Cloning https://github.com/pytorch/tnt.git (to revision master) to /tmp/pip-req-build-kyfk8D
Collecting ipdb (from -r requirements.txt (line 1))
Collecting fire (from -r requirements.txt (line 2))
Collecting tqdm (from -r requirements.txt (line 3))
  Using cached https://files.pythonhosted.org/packages/93/24/6ab1df969db228aed36a648a8959d1027099ce45fad67532b9673d533318/tqdm-4.23.4-py2.py3-none-any.whl
Collecting visdom (from -r requirements.txt (line 4))
Collecting word2vec (from -r requirements.txt (line 6))
  Using cached https://files.pythonhosted.org/packages/5b/33/8e1cf93216342f0fe8aa4484ef1a833a12c4f6d6bf8e8b46ecc0feb5e5e8/word2vec-0.9.2.tar.gz
Requirement already satisfied: torch in /usr/lib64/python2.7/site-packages (from torchnet==0.0.2->-r requirements.txt (line 5)) (0.1.12.post2)
Requirement already satisfied: six in /usr/lib/python2.7/site-packages (from torchnet==0.0.2->-r requirements.txt (line 5)) (1.11.0)
Collecting ipython<6.0.0,>=5.0.0; python_version == "2.7" (from ipdb->-r requirements.txt (line 1))
  Using cached https://files.pythonhosted.org/packages/52/19/aadde98d6bde1667d0bf431fb2d22451f880aaa373e0a241c7e7cb5815a0/ipython-5.7.0-py2-none-any.whl
Requirement already satisfied: setuptools in /usr/lib/python2.7/site-packages (from ipdb->-r requirements.txt (line 1)) (39.2.0)
Collecting torchfile (from visdom->-r requirements.txt (line 4))
Collecting tornado (from visdom->-r requirements.txt (line 4))
Collecting scipy (from visdom->-r requirements.txt (line 4))
  Using cached https://files.pythonhosted.org/packages/2a/f3/de9c1bd16311982711209edaa8c6caa962db30ebb6a8cc6f1dcd2d3ef616/scipy-1.1.0-cp27-cp27mu-manylinux1_x86_64.whl
Requirement already satisfied: numpy>=1.8 in /usr/lib64/python2.7/site-packages (from visdom->-r requirements.txt (line 4)) (1.14.3)
Collecting pillow (from visdom->-r requirements.txt (line 4))
  Using cached https://files.pythonhosted.org/packages/00/49/a0483e7308b4b04b5a898789911dbb876d9fea54e7df0453915e47744cfd/Pillow-5.1.0-cp27-cp27mu-manylinux1_x86_64.whl
Collecting pyzmq (from visdom->-r requirements.txt (line 4))
  Using cached https://files.pythonhosted.org/packages/5d/b0/3aea046f5519e2e059a225e8c924f897846b608793f890be987d07858b7c/pyzmq-17.0.0-cp27-cp27mu-manylinux1_x86_64.whl
Collecting websocket-client (from visdom->-r requirements.txt (line 4))
  Using cached https://files.pythonhosted.org/packages/8a/a1/72ef9aa26cfe1a75cee09fc1957e4723add9de098c15719416a1ee89386b/websocket_client-0.48.0-py2.py3-none-any.whl
Collecting requests (from visdom->-r requirements.txt (line 4))
  Using cached https://files.pythonhosted.org/packages/49/df/50aa1999ab9bde74656c2919d9c0c085fd2b3775fd3eca826012bef76d8c/requests-2.18.4-py2.py3-none-any.whl
Requirement already satisfied: cython in /usr/lib64/python2.7/site-packages (from word2vec->-r requirements.txt (line 6)) (0.28.3)
Requirement already satisfied: pyyaml in /usr/lib64/python2.7/site-packages (from torch->torchnet==0.0.2->-r requirements.txt (line 5)) (3.12)
Collecting pathlib2; python_version == "2.7" or python_version == "3.3" (from ipython<6.0.0,>=5.0.0; python_version == "2.7"->ipdb->-r requirements.txt (line 1))
  Using cached https://files.pythonhosted.org/packages/66/a7/9f8d84f31728d78beade9b1271ccbfb290c41c1e4dc13dbd4997ad594dcd/pathlib2-2.3.2-py2.py3-none-any.whl
Requirement already satisfied: backports.shutil-get-terminal-size; python_version == "2.7" in /usr/lib/python2.7/site-packages (from ipython<6.0.0,>=5.0.0; python_version == "2.7"->ipdb->-r requirements.txt (line 1)) (1.0.0)
Collecting simplegeneric>0.8 (from ipython<6.0.0,>=5.0.0; python_version == "2.7"->ipdb->-r requirements.txt (line 1))
Requirement already satisfied: pygments in /usr/lib64/python2.7/site-packages (from ipython<6.0.0,>=5.0.0; python_version == "2.7"->ipdb->-r requirements.txt (line 1)) (2.2.0)
Requirement already satisfied: pexpect; sys_platform != "win32" in /usr/lib/python2.7/site-packages (from ipython<6.0.0,>=5.0.0; python_version == "2.7"->ipdb->-r requirements.txt (line 1)) (4.6.0)
Requirement already satisfied: prompt-toolkit<2.0.0,>=1.0.4 in /usr/lib/python2.7/site-packages (from ipython<6.0.0,>=5.0.0; python_version == "2.7"->ipdb->-r requirements.txt (line 1)) (1.0.15)
Collecting pickleshare (from ipython<6.0.0,>=5.0.0; python_version == "2.7"->ipdb->-r requirements.txt (line 1))
  Using cached https://files.pythonhosted.org/packages/9f/17/daa142fc9be6b76f26f24eeeb9a138940671490b91cb5587393f297c8317/pickleshare-0.7.4-py2.py3-none-any.whl
Requirement already satisfied: decorator in /usr/lib/python2.7/site-packages (from ipython<6.0.0,>=5.0.0; python_version == "2.7"->ipdb->-r requirements.txt (line 1)) (3.4.0)
Collecting traitlets>=4.2 (from ipython<6.0.0,>=5.0.0; python_version == "2.7"->ipdb->-r requirements.txt (line 1))
  Using cached https://files.pythonhosted.org/packages/93/d6/abcb22de61d78e2fc3959c964628a5771e47e7cc60d53e9342e21ed6cc9a/traitlets-4.3.2-py2.py3-none-any.whl
Requirement already satisfied: futures in /usr/lib/python2.7/site-packages (from tornado->visdom->-r requirements.txt (line 4)) (3.2.0)
Collecting singledispatch (from tornado->visdom->-r requirements.txt (line 4))
  Using cached https://files.pythonhosted.org/packages/c5/10/369f50bcd4621b263927b0a1519987a04383d4a98fb10438042ad410cf88/singledispatch-3.4.0.3-py2.py3-none-any.whl
Collecting backports-abc>=0.4 (from tornado->visdom->-r requirements.txt (line 4))
  Using cached https://files.pythonhosted.org/packages/7d/56/6f3ac1b816d0cd8994e83d0c4e55bc64567532f7dc543378bd87f81cebc7/backports_abc-0.5-py2.py3-none-any.whl
Collecting urllib3<1.23,>=1.21.1 (from requests->visdom->-r requirements.txt (line 4))
  Using cached https://files.pythonhosted.org/packages/63/cb/6965947c13a94236f6d4b8223e21beb4d576dc72e8130bd7880f600839b8/urllib3-1.22-py2.py3-none-any.whl
Collecting idna<2.7,>=2.5 (from requests->visdom->-r requirements.txt (line 4))
  Using cached https://files.pythonhosted.org/packages/27/cc/6dd9a3869f15c2edfab863b992838277279ce92663d334df9ecf5106f5c6/idna-2.6-py2.py3-none-any.whl
Collecting chardet<3.1.0,>=3.0.2 (from requests->visdom->-r requirements.txt (line 4))
  Using cached https://files.pythonhosted.org/packages/bc/a9/01ffebfb562e4274b6487b4bb1ddec7ca55ec7510b22e4c51f14098443b8/chardet-3.0.4-py2.py3-none-any.whl
Collecting certifi>=2017.4.17 (from requests->visdom->-r requirements.txt (line 4))
  Using cached https://files.pythonhosted.org/packages/7c/e6/92ad559b7192d846975fc916b65f667c7b8c3a32bea7372340bfe9a15fa5/certifi-2018.4.16-py2.py3-none-any.whl
Collecting scandir; python_version < "3.5" (from pathlib2; python_version == "2.7" or python_version == "3.3"->ipython<6.0.0,>=5.0.0; python_version == "2.7"->ipdb->-r requirements.txt (line 1))
  Using cached https://files.pythonhosted.org/packages/13/bb/e541b74230bbf7a20a3949a2ee6631be299378a784f5445aa5d0047c192b/scandir-1.7.tar.gz
Requirement already satisfied: ptyprocess>=0.5 in /usr/lib/python2.7/site-packages (from pexpect; sys_platform != "win32"->ipython<6.0.0,>=5.0.0; python_version == "2.7"->ipdb->-r requirements.txt (line 1)) (0.5.2)
Requirement already satisfied: wcwidth in /usr/lib/python2.7/site-packages (from prompt-toolkit<2.0.0,>=1.0.4->ipython<6.0.0,>=5.0.0; python_version == "2.7"->ipdb->-r requirements.txt (line 1)) (0.1.7)
Requirement already satisfied: enum34; python_version == "2.7" in /usr/lib/python2.7/site-packages (from traitlets>=4.2->ipython<6.0.0,>=5.0.0; python_version == "2.7"->ipdb->-r requirements.txt (line 1)) (1.1.6)
Collecting ipython-genutils (from traitlets>=4.2->ipython<6.0.0,>=5.0.0; python_version == "2.7"->ipdb->-r requirements.txt (line 1))
  Using cached https://files.pythonhosted.org/packages/fa/bc/9bd3b5c2b4774d5f33b2d544f1460be9df7df2fe42f352135381c347c69a/ipython_genutils-0.2.0-py2.py3-none-any.whl
Building wheels for collected packages: word2vec, torchnet, scandir
  Running setup.py bdist_wheel for word2vec ... done
  Stored in directory: /root/.cache/pip/wheels/89/a1/cb/417bcc7143a3e2befcc82da185ce8ad4a340eb82c0bf48969c
  Running setup.py bdist_wheel for torchnet ... done
  Stored in directory: /tmp/pip-ephem-wheel-cache-oQlzp4/wheels/17/05/ec/d05d051a225871af52bf504f5e8daf57704811b3c1850d0012
  Running setup.py bdist_wheel for scandir ... done
  Stored in directory: /root/.cache/pip/wheels/4a/ca/d7/26c3620234732f2d5b3ca86d7ccb0f59a21bd7712bffbbedc2
Successfully built word2vec torchnet scandir
Installing collected packages: scandir, pathlib2, simplegeneric, pickleshare, ipython-genutils, traitlets, ipython, ipdb, fire, tqdm, torchfile, singledispatch, backports-abc, tornado, scipy, pillow, pyzmq, websocket-client, urllib3, idna, chardet, certifi, requests, visdom, word2vec, torchnet
Successfully installed backports-abc-0.5 certifi-2018.4.16 chardet-3.0.4 fire-0.1.3 idna-2.6 ipdb-0.11 ipython-5.7.0 ipython-genutils-0.2.0 pathlib2-2.3.2 pickleshare-0.7.4 pillow-5.1.0 pyzmq-17.0.0 requests-2.18.4 scandir-1.7 scipy-1.1.0 simplegeneric-0.8.1 singledispatch-3.4.0.3 torchfile-0.1.0 torchnet-0.0.2 tornado-5.0.2 tqdm-4.23.4 traitlets-4.3.2 urllib3-1.22 visdom-0.1.8.3 websocket-client-0.48.0 word2vec-0.9.2
[root@localhost PyTorchText-master]# 

With the dependencies above installed, start the visdom visualization server:
```sh
python -m visdom.server
```

See also: PyTorch Study Notes (8): The PyTorch Visualization Tool visdom

At this point the environment is ready. The next step is to get init's source code and the data files in place:

[root@localhost PyTorchText-master]# ll *.txt
-rw-r--r--. 1 root root   29200241 Jun  5 16:55 char_embedding.txt
-rw-r--r--. 1 root root  239862273 Jun  5 16:53 question_eval_set.txt
-rw-r--r--. 1 root root  204459814 Jun  5 16:52 question_topic_train_set.txt
-rw-r--r--. 1 root root 3317236306 Jun  5 16:57 question_train_set.txt
-rw-r--r--. 1 root root         77 Jun  5 11:45 requirements.txt
-rw-r--r--. 1 root root    1072551 Jun  5 16:53 topic_info.txt
-rw-r--r--. 1 root root 1005008916 Jun  5 16:55 word_embedding.txt
[root@localhost PyTorchText-master]# 
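
Before moving on, it can help to confirm that every data file is actually in place. The sketch below is not part of the init repository: `REQUIRED_FILES` simply mirrors the listing above, and `missing_files` is a hypothetical helper name chosen for illustration.

```python
import os

# File names copied from the directory listing above.
REQUIRED_FILES = [
    "char_embedding.txt",
    "word_embedding.txt",
    "question_train_set.txt",
    "question_eval_set.txt",
    "question_topic_train_set.txt",
    "topic_info.txt",
]


def missing_files(data_dir, names=REQUIRED_FILES):
    """Return the names from `names` that are not regular files in `data_dir`."""
    return [n for n in names if not os.path.isfile(os.path.join(data_dir, n))]
```

Calling `missing_files(".")` from the project directory should return an empty list once all six files are present.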


## 2. Data preprocessing

### 2.1 Converting the word embeddings to NumPy arrays


[root@localhost PyTorchText-master]# python scripts/data_process/embedding2matrix.py main char_embedding.txt char_embedding.npz 
[root@localhost PyTorchText-master]# ls
char_embedding.npz  data                main-all.1.py  models                  question_topic_train_set.txt  rep.py            test.3.py
char_embedding.txt  del                 main-all.py    notebooks               question_train_set.txt        requirements.txt  topic_info.txt
checkpoints         ??ɽ??init?????.pdf  main.py        ??ɽ??-??ʿ????????.pptx  readme.md                     scripts           utils
config.py           LICENSE             ˵??.md         question_eval_set.txt   readme-zh.md                  test.1.py         word_embedding.txt
[root@localhost PyTorchText-master]# python scripts/data_process/embedding2matrix.py main word_embedding.txt word_embedding.npz 
[root@localhost PyTorchText-master]# ls
char_embedding.npz  data                main-all.1.py  models                  question_topic_train_set.txt  rep.py            test.3.py           word_embedding.txt
char_embedding.txt  del                 main-all.py    notebooks               question_train_set.txt        requirements.txt  topic_info.txt
checkpoints         ??ɽ??init?????.pdf  main.py        ??ɽ??-??ʿ????????.pptx  readme.md                     scripts           utils
config.py           LICENSE             ˵??.md         question_eval_set.txt   readme-zh.md                  test.1.py         word_embedding.npz
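
To make the conversion less of a black box, here is a minimal sketch of the parsing half of it. It assumes the embedding files use the word2vec text format (a header line with the vocabulary size and the dimension, then one `word v1 v2 ...` line per entry); that format, the function name `parse_embedding_lines`, and the use of plain lists instead of NumPy arrays are assumptions of this sketch, not a copy of `embedding2matrix.py`.

```python
def parse_embedding_lines(lines):
    """Parse word2vec text-format lines into (word2id, vectors).

    `word2id` maps each word to its row index; `vectors` is a list of rows.
    """
    lines = iter(lines)
    header = next(lines).split()      # "<vocab_size> <dimension>"
    dim = int(header[1])
    word2id, vectors = {}, []
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue                  # skip blank or malformed lines
        word2id[parts[0]] = len(vectors)
        vectors.append([float(x) for x in parts[1:]])
    return word2id, vectors
```

With NumPy available, the resulting rows could then be turned into an array and saved together with the mapping in a single `.npz` file.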
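
### 2.2 Converting the questions to NumPy arrays

This step is very memory-hungry; make sure you have more than 32 GB of RAM. Here I only ran it on the smaller file, question_eval_set.txt.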

[root@localhost PyTorchText-master]# python scripts/data_process/question2array.py main question_eval_set.txt test.npz
Traceback (most recent call last):
  File "scripts/data_process/question2array.py", line 85, in <module>
    fire.Fire()
  File "/usr/lib/python2.7/site-packages/fire/core.py", line 127, in Fire
    component_trace = _Fire(component, args, context, name)
  File "/usr/lib/python2.7/site-packages/fire/core.py", line 366, in _Fire
    component, remaining_args)
  File "/usr/lib/python2.7/site-packages/fire/core.py", line 542, in _CallCallable
    result = fn(*varargs, **kwargs)
  File "scripts/data_process/question2array.py", line 19, in main
    char2id = np.load('/mnt/7/zhihu/ieee_zhihu_cup/data/char_embedding.npz')['word2id'].item()
  File "/usr/lib64/python2.7/site-packages/numpy/lib/npyio.py", line 372, in load
    fid = open(file, "rb")
IOError: [Errno 2] No such file or directory: '/mnt/7/zhihu/ieee_zhihu_cup/data/char_embedding.npz'
[root@localhost PyTorchText-master]# 

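The script fails because the path to char_embedding.npz is hard-coded in question2array.py (line 19 in the traceback). After changing the path in the file to the local location, the script runs: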

[root@localhost PyTorchText-master]# python scripts/data_process/question2array.py main question_eval_set.txt test.npz
217360it [00:34, 6317.30it/s]
a
b
c
d
[root@localhost PyTorchText-master]# 
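
Editing a hard-coded path in every script gets tedious. One possible alternative, shown here only as a sketch and not as what the repository does, is to resolve data files relative to a configurable directory. The environment variable name `ZHIHU_DATA_DIR` and the helper `resolve_data_file` are hypothetical.

```python
import os


def resolve_data_file(name, data_dir=None):
    """Build the path to a data file and fail early with a clear message."""
    # ZHIHU_DATA_DIR is a made-up variable name used for this sketch;
    # when it is unset, the current working directory is used.
    base = data_dir or os.environ.get("ZHIHU_DATA_DIR", os.getcwd())
    path = os.path.join(base, name)
    if not os.path.isfile(path):
        raise IOError("data file not found: %s" % path)
    return path
```

A script could then call `resolve_data_file("char_embedding.npz")` instead of spelling out an absolute path such as the `/mnt/7/...` one in the traceback.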

### 2.3 Processing the labels and converting them to JSON

[root@localhost PyTorchText-master]# python scripts/data_process/label2id.py main question_topic_train_set.txt labels.json
> /weblogic/PyTorchText-master/scripts/data_process/label2id.py(17)main()
     16     import ipdb;ipdb.set_trace()
---> 17     all_labels = { _ for ii,jj in results for _ in jj }
     18     sorted_labels = sorted(all_labels)

ipdb> n
> /weblogic/PyTorchText-master/scripts/data_process/label2id.py(18)main()
     17     all_labels = { _ for ii,jj in results for _ in jj }
---> 18     sorted_labels = sorted(all_labels)
     19     label2id = {l_:ii for ii,l_ in enumerate(sorted_labels)}#-3239204820424->1

ipdb> n
> /weblogic/PyTorchText-master/scripts/data_process/label2id.py(19)main()
     18     sorted_labels = sorted(all_labels)
---> 19     label2id = {l_:ii for ii,l_ in enumerate(sorted_labels)}#-3239204820424->1
     20     id2label = {ii:l_ for ii,l_ in enumerate(sorted_labels)}

ipdb> n
> /weblogic/PyTorchText-master/scripts/data_process/label2id.py(20)main()
     19     label2id = {l_:ii for ii,l_ in enumerate(sorted_labels)}#-3239204820424->1
---> 20     id2label = {ii:l_ for ii,l_ in enumerate(sorted_labels)}
     21 

ipdb> n
> /weblogic/PyTorchText-master/scripts/data_process/label2id.py(22)main()
     21 
---> 22     d = {ii:[label2id[jj] for jj in labels ]  for ii,labels in results}
     23 

ipdb> n
> /weblogic/PyTorchText-master/scripts/data_process/label2id.py(24)main()
     23 
---> 24     data = dict(d=d,label2id=label2id,id2label=id2label)
     25     import json

ipdb> n
> /weblogic/PyTorchText-master/scripts/data_process/label2id.py(25)main()
     24     data = dict(d=d,label2id=label2id,id2label=id2label)
---> 25     import json
     26     with open(outfile,'w') as f:

ipdb> n
> /weblogic/PyTorchText-master/scripts/data_process/label2id.py(26)main()
     25     import json
---> 26     with open(outfile,'w') as f:
     27         json.dump(data,f)

ipdb> n
> /weblogic/PyTorchText-master/scripts/data_process/label2id.py(27)main()
     26     with open(outfile,'w') as f:
---> 27         json.dump(data,f)
     28 

ipdb> n




--Return--
None
> /weblogic/PyTorchText-master/scripts/data_process/label2id.py(27)main()
     26     with open(outfile,'w') as f:
---> 27         json.dump(data,f)
     28 

ipdb> 
> /usr/lib/python2.7/site-packages/fire/core.py(543)_CallCallable()
    542   result = fn(*varargs, **kwargs)
--> 543   return result, consumed_args, remaining_args, capacity
    544 

ipdb> c
[1]+  Killed                  python scripts/data_process/label2id.py main question_topic_train_set.txt labels.json
[root@localhost PyTorchText-master]# 
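
The debugger session above opens because the script still contains an `import ipdb;ipdb.set_trace()` call (line 16), and stepping through it shows the whole mapping logic. The sketch below restates that logic in a self-contained form. The input format (one `question_id<TAB>topic1,topic2,...` line per question) and the function name `build_label_mapping` are assumptions made for this illustration.

```python
import json


def build_label_mapping(lines):
    """Map topic labels to consecutive integer ids, as in the traced steps."""
    results = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        qid, topics = line.split("\t")
        results.append((qid, topics.split(",")))

    all_labels = {label for _, labels in results for label in labels}
    sorted_labels = sorted(all_labels)
    label2id = {label: i for i, label in enumerate(sorted_labels)}
    id2label = {i: label for i, label in enumerate(sorted_labels)}
    d = {qid: [label2id[label] for label in labels] for qid, labels in results}
    return dict(d=d, label2id=label2id, id2label=id2label)


def save_label_mapping(data, outfile):
    """Write the mapping to `outfile` as JSON."""
    with open(outfile, "w") as f:
        json.dump(data, f)
```

One detail worth knowing: JSON object keys are always strings, so the integer keys of `id2label` come back as `"0"`, `"1"`, ... when the file is loaded again.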

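The step that the instructions describe as very memory-hungry also ran through on my 2 GB of RAM. However, train.npz was nowhere to be found afterwards, so it probably failed for lack of memory. Running it on the full training set confirms this: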

[root@localhost PyTorchText-master]# python scripts/data_process/question2array.py main question_train_set.txt train.npz
Killed
[root@localhost PyTorchText-master]# 
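
A bare "Killed" usually means the kernel terminated the process because the machine ran out of memory. If a preprocessing script holds the whole 3 GB training file in memory at once, one general way to bound memory use is to work through the input in fixed-size chunks. The generator below is a generic sketch of that idea, not a patch for `question2array.py`; `iter_chunks` is a hypothetical helper name.

```python
from itertools import islice


def iter_chunks(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk
```

Because a file object yields its lines lazily, `iter_chunks(open(path), 100000)` would keep at most 100000 lines in memory at a time.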

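Next, extract part of the training set to build a validation set. This code was backed up from an IPython session, so __remember to change the data paths in the code__.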

[root@localhost PyTorchText-master]# python scripts/data_process/get_val.py 
[root@localhost PyTorchText-master]# 
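
The idea behind this step can be sketched in a few lines: shuffle the sample indices and hold a fixed number of them out for validation. The function below is only an illustration of that idea. It does not reproduce `get_val.py`, and the name `split_train_val`, the seed, and the sizes are arbitrary choices.

```python
import random


def split_train_val(n_samples, val_size, seed=0):
    """Return (train_indices, val_indices) with `val_size` indices held out."""
    if not 0 <= val_size <= n_samples:
        raise ValueError("val_size must be between 0 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)   # seeded, so the split is repeatable
    return sorted(indices[val_size:]), sorted(indices[:val_size])
```

Using a fixed seed means the same questions end up in the validation set on every run, which keeps scores comparable between experiments.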

## 3. Training the model

This is where I ran into a fatal error:

[root@localhost PyTorchText-master]#  python main.py main --max_epoch=5 --plot_every=100 --env='MultiCNNText' --weight=1 --model='MultiCNNTextBNDeep'  --batch-size=64  --lr=0.001 --lr2=0.000 --lr_decay=0.8 --decay_every=10000  --title-dim=250 --content-dim=250    --weight-decay=0 --type_='word' --debug-file='/tmp/debug'  --linear-hidden-size=2000 --zhuge=True  --augument=False
Traceback (most recent call last):
  File "main.py", line 158, in <module>
    fire.Fire()  
  File "/usr/lib/python2.7/site-packages/fire/core.py", line 127, in Fire
    component_trace = _Fire(component, args, context, name)
  File "/usr/lib/python2.7/site-packages/fire/core.py", line 366, in _Fire
    component, remaining_args)
  File "/usr/lib/python2.7/site-packages/fire/core.py", line 542, in _CallCallable
    result = fn(*varargs, **kwargs)
  File "main.py", line 74, in main
    model = getattr(models,opt.model)(opt).cuda()
  File "/usr/lib64/python2.7/site-packages/torch/nn/modules/module.py", line 147, in cuda
    return self._apply(lambda t: t.cuda(device_id))
  File "/usr/lib64/python2.7/site-packages/torch/nn/modules/module.py", line 118, in _apply
    module._apply(fn)
  File "/usr/lib64/python2.7/site-packages/torch/nn/modules/module.py", line 124, in _apply
    param.data = fn(param.data)
  File "/usr/lib64/python2.7/site-packages/torch/nn/modules/module.py", line 147, in <lambda>
    return self._apply(lambda t: t.cuda(device_id))
  File "/usr/lib64/python2.7/site-packages/torch/_utils.py", line 65, in _cuda
    return new_type(self.size()).copy_(self, async)
  File "/usr/lib64/python2.7/site-packages/torch/cuda/__init__.py", line 272, in __new__
    _lazy_init()
  File "/usr/lib64/python2.7/site-packages/torch/cuda/__init__.py", line 84, in _lazy_init
    _check_driver()
  File "/usr/lib64/python2.7/site-packages/torch/cuda/__init__.py", line 58, in _check_driver
    http://www.nvidia.com/Download/index.aspx""")
AssertionError: 
Found no NVIDIA driver on your system. Please check that you
have an NVIDIA GPU and installed a driver from
http://www.nvidia.com/Download/index.aspx
[root@localhost PyTorchText-master]#
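
The assertion comes from `main.py` calling `.cuda()` on the model, while the virtual machine has neither an NVIDIA GPU nor a driver. In PyTorch the usual guard for this is `torch.cuda.is_available()`. As a dependency-free way to check a machine before even installing anything, the sketch below uses a heuristic: it looks for the `/proc/driver/nvidia/version` entry that the NVIDIA Linux driver exposes, or for an `nvidia-smi` executable on `PATH`. Both helper names are hypothetical, and the check is a hint rather than a guarantee that CUDA will work.

```python
import os


def find_on_path(program, path=None):
    """Return the full path of `program` if it is an executable on PATH, else None."""
    path = os.environ.get("PATH", "") if path is None else path
    for folder in path.split(os.pathsep):
        candidate = os.path.join(folder, program)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def has_nvidia_driver(proc_file="/proc/driver/nvidia/version", path=None):
    """Heuristic: True if the driver's /proc entry or nvidia-smi can be found."""
    return os.path.exists(proc_file) or find_on_path("nvidia-smi", path) is not None
```

On the VM used in this post, `has_nvidia_driver()` would be expected to return `False`, which matches the error message above.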



