Some pitfalls when packaging machine-learning libraries with PyInstaller

Reference: Recipe Multiprocessing

Background

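The approach I investigated earlier, packaging a project into a single binary with PyInstaller, has now reached the rollout stage. For the earlier investigation, see the article "Packaging a Python project with PyInstaller and releasing it to production". The test subject back then was a very simple web service with few dependencies on other packages. The project being rolled out this time uses many machine-learning libraries, so the rollout was somewhat more troublesome.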

Problems

  • Problems importing .pyd files
  • Problems importing .so files
  • Conflict between multiprocessing and PyInstaller

Let's go through them one by one.

Problems importing .pyd files

pipenv run pyinstaller -F main.py -n scscore

Packaging succeeds and generates a spec file, but running the program fails:

[doctorq@gz-inf-development01 scscore]$ ./dist/scscore
/tmp/_MEINtWbir/sklearn/externals/joblib/__init__.py:15: DeprecationWarning: sklearn.externals.joblib is deprecated in 0.21 and will be removed in 0.23. Please import this functionality directly from joblib, which can be installed with: pip install joblib. If this warning is raised when loading pickled models, you may need to re-serialize those models with scikit-learn 0.21+.
Traceback (most recent call last):
  File "main.py", line 8, in <module>
    from src.route import load_route
  File "/home/doctorq/.local/share/virtualenvs/scscore-K9x97I77/lib/python3.7/site-packages/PyInstaller/loader/pyimod03_importers.py", line 627, in exec_module
    exec(bytecode, module.__dict__)
  File "src/route.py", line 6, in <module>
    from src.view.forecast_view.feature_importance_view import FeatureImportanceView
  File "/home/doctorq/.local/share/virtualenvs/scscore-K9x97I77/lib/python3.7/site-packages/PyInstaller/loader/pyimod03_importers.py", line 627, in exec_module
    exec(bytecode, module.__dict__)
  File "src/view/forecast_view/feature_importance_view.py", line 7, in <module>
    from src.importance.feature_importance import FeatureImportance
  File "/home/doctorq/.local/share/virtualenvs/scscore-K9x97I77/lib/python3.7/site-packages/PyInstaller/loader/pyimod03_importers.py", line 627, in exec_module
    exec(bytecode, module.__dict__)
  File "src/importance/feature_importance.py", line 7, in <module>
    from src.forecasting.trainer import Trainer
  File "/home/doctorq/.local/share/virtualenvs/scscore-K9x97I77/lib/python3.7/site-packages/PyInstaller/loader/pyimod03_importers.py", line 627, in exec_module
    exec(bytecode, module.__dict__)
  File "src/forecasting/trainer.py", line 13, in <module>
    from src.Models.collect_models import ModelCollector
  File "/home/doctorq/.local/share/virtualenvs/scscore-K9x97I77/lib/python3.7/site-packages/PyInstaller/loader/pyimod03_importers.py", line 627, in exec_module
    exec(bytecode, module.__dict__)
  File "src/Models/collect_models.py", line 7, in <module>
    from src.Models.statistic_model.ARIMA import ARIMA
  File "/home/doctorq/.local/share/virtualenvs/scscore-K9x97I77/lib/python3.7/site-packages/PyInstaller/loader/pyimod03_importers.py", line 627, in exec_module
    exec(bytecode, module.__dict__)
  File "src/Models/statistic_model/ARIMA.py", line 2, in <module>
    from pmdarima.arima import auto_arima
  File "/home/doctorq/.local/share/virtualenvs/scscore-K9x97I77/lib/python3.7/site-packages/PyInstaller/loader/pyimod03_importers.py", line 627, in exec_module
    exec(bytecode, module.__dict__)
  File "site-packages/pmdarima/__init__.py", line 29, in <module>
  File "/home/doctorq/.local/share/virtualenvs/scscore-K9x97I77/lib/python3.7/site-packages/PyInstaller/loader/pyimod03_importers.py", line 627, in exec_module
    exec(bytecode, module.__dict__)
  File "site-packages/pmdarima/arima/__init__.py", line 6, in <module>
  File "/home/doctorq/.local/share/virtualenvs/scscore-K9x97I77/lib/python3.7/site-packages/PyInstaller/loader/pyimod03_importers.py", line 627, in exec_module
    exec(bytecode, module.__dict__)
  File "site-packages/pmdarima/arima/arima.py", line 10, in <module>
  File "/home/doctorq/.local/share/virtualenvs/scscore-K9x97I77/lib/python3.7/site-packages/PyInstaller/loader/pyimod03_importers.py", line 627, in exec_module
    exec(bytecode, module.__dict__)
  File "site-packages/sklearn/metrics/__init__.py", line 36, in <module>
  File "/home/doctorq/.local/share/virtualenvs/scscore-K9x97I77/lib/python3.7/site-packages/PyInstaller/loader/pyimod03_importers.py", line 627, in exec_module
    exec(bytecode, module.__dict__)
  File "site-packages/sklearn/metrics/cluster/__init__.py", line 20, in <module>
  File "/home/doctorq/.local/share/virtualenvs/scscore-K9x97I77/lib/python3.7/site-packages/PyInstaller/loader/pyimod03_importers.py", line 627, in exec_module
    exec(bytecode, module.__dict__)
  File "site-packages/sklearn/metrics/cluster/unsupervised.py", line 16, in <module>
  File "/home/doctorq/.local/share/virtualenvs/scscore-K9x97I77/lib/python3.7/site-packages/PyInstaller/loader/pyimod03_importers.py", line 627, in exec_module
    exec(bytecode, module.__dict__)
  File "site-packages/sklearn/metrics/pairwise.py", line 32, in <module>
  File "sklearn/metrics/pairwise_fast.pyx", line 1, in init sklearn.metrics.pairwise_fast
ModuleNotFoundError: No module named 'sklearn.utils._cython_blas'
[6625] Failed to execute script main

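The missing modules are Python extension modules compiled from C/C++ (the Linux counterpart of .pyd files on Windows). PyInstaller does not pick them up automatically, so they need extra handling: add each one to the hiddenimports attribute in scscore.spec. I added every file with "cpython" in its name from each of the libraries involved, for example in sklearn/utils: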

[doctorq@gz-inf-development01 utils]$ ll|grep cpython
-rwxrwxr-x 1 doctorq doctorq 221256 7月  16 15:41 arrayfuncs.cpython-37m-x86_64-linux-gnu.so
-rwxrwxr-x 1 doctorq doctorq 426280 7月  16 15:41 _cython_blas.cpython-37m-x86_64-linux-gnu.so
-rwxrwxr-x 1 doctorq doctorq 238344 7月  16 15:41 fast_dict.cpython-37m-x86_64-linux-gnu.so
-rwxrwxr-x 1 doctorq doctorq  95824 7月  16 15:41 graph_shortest_path.cpython-37m-x86_64-linux-gnu.so
-rwxrwxr-x 1 doctorq doctorq  28512 7月  16 15:41 lgamma.cpython-37m-x86_64-linux-gnu.so
-rwxrwxr-x 1 doctorq doctorq 179592 7月  16 15:41 _logistic_sigmoid.cpython-37m-x86_64-linux-gnu.so
-rwxrwxr-x 1 doctorq doctorq  88856 7月  16 15:41 murmurhash.cpython-37m-x86_64-linux-gnu.so
-rwxrwxr-x 1 doctorq doctorq  99864 7月  16 15:41 _random.cpython-37m-x86_64-linux-gnu.so
-rwxrwxr-x 1 doctorq doctorq 140992 7月  16 15:41 seq_dataset.cpython-37m-x86_64-linux-gnu.so
-rwxrwxr-x 1 doctorq doctorq 643648 7月  16 15:41 sparsefuncs_fast.cpython-37m-x86_64-linux-gnu.so
-rwxrwxr-x 1 doctorq doctorq  62880 7月  16 15:41 weight_vector.cpython-37m-x86_64-linux-gnu.so
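Collecting these names by hand is tedious. As a rough sketch (not part of the original build), the listing above could be turned into dotted module names with a small standard-library script. The helper name `find_extension_modules` is made up for illustration, and it assumes the script runs under the same interpreter that built the extensions, so that the file suffixes match.

```python
from importlib.machinery import EXTENSION_SUFFIXES
from pathlib import Path

# Try the longest suffix first, so that "foo.cpython-37m-x86_64-linux-gnu.so"
# is stripped down to "foo" rather than "foo.cpython-37m-x86_64-linux-gnu".
_SUFFIXES = sorted(EXTENSION_SUFFIXES, key=len, reverse=True)


def find_extension_modules(site_packages, package):
    """Return sorted dotted names of compiled extension modules in `package`.

    `site_packages` is the directory containing the package, and `package`
    is its top-level directory name, e.g. "sklearn".
    """
    root = Path(site_packages)
    names = set()
    for path in (root / package).rglob("*"):
        if not path.is_file():
            continue
        for suffix in _SUFFIXES:
            if path.name.endswith(suffix):
                stem = path.name[: -len(suffix)]
                parts = path.parent.relative_to(root).parts + (stem,)
                names.add(".".join(parts))
                break
    return sorted(names)
```

Called as `find_extension_modules("<venv>/lib/python3.7/site-packages", "sklearn")`, it should return entries such as `sklearn.utils._cython_blas`, which can then be pasted into hiddenimports.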

After adding them, the relevant part of the spec file looks like this:

             hiddenimports=['cython', 'sklearn', 'sklearn.utils._cython_blas', 'statsmodels', 'statsmodels.tsa',
                            'statsmodels.tsa.statespace._kalman_smoother',
                            'statsmodels.tsa.statespace._representation',
                            'statsmodels.tsa.statespace._simulation_smoother',
                            'statsmodels.tsa.statespace._statespace',
                            'statsmodels.tsa.statespace._tools',
                            'statsmodels.tsa.statespace._filters._conventional',
                            'statsmodels.tsa.statespace._filters._inversions',
                            'statsmodels.tsa.statespace._filters._univariate',
                            'statsmodels.tsa.statespace._smoothers._alternative',
                            'statsmodels.tsa.statespace._smoothers._classical',
                            'statsmodels.tsa.statespace._smoothers._conventional',
                            'statsmodels.tsa.statespace._smoothers._univariate',
                            'sklearn.neighbors.typedefs',
                            'sklearn.neighbors.quad_tree',
                            'sklearn.neighbors.ball_tree',
                            'sklearn.neighbors.dist_metrics',
                            'sklearn.neighbors.kd_tree',
                            'sklearn.tree._utils',
                            'sklearn.tree._criterion',
                            'sklearn.tree._splitter'],

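Then rebuild. The compiled modules of this kind that the project depends on are now all bundled, but running the program hits a new error: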

> pipenv run pyinstaller scscore.spec # build from the spec file
> dist/scscore

  File "site-packages/xgboost/__init__.py", line 11, in <module>
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
  File "/home/doctorq/.local/share/virtualenvs/scscore-K9x97I77/lib/python3.7/site-packages/PyInstaller/loader/pyimod03_importers.py", line 627, in exec_module
    exec(bytecode, module.__dict__)
  File "site-packages/xgboost/core.py", line 161, in <module>
  File "site-packages/xgboost/core.py", line 123, in _load_lib
  File "site-packages/xgboost/libpath.py", line 48, in find_lib_path
xgboost.libpath.XGBoostLibraryNotFound: Cannot find XGBoost Library in the candidate path, did you install compilers and run build.sh in root path?
List of candidates:
/tmp/_MEIdNDrR6/xgboost/libxgboost.so
/tmp/_MEIdNDrR6/xgboost/../../lib/libxgboost.so
/tmp/_MEIdNDrR6/xgboost/./lib/libxgboost.so
/tmp/_MEIdNDrR6/xgboost/libxgboost.so
[32970] Failed to execute script main

Problems importing .so files

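The error above comes from xgboost's shared library (libxgboost.so) not being bundled. The fix is to add a new file named hook-xgboost.py under $(pipenv --venv)/lib/python3.7/site-packages/PyInstaller/hooks. The file name must match exactly. Its contents are: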

from PyInstaller.utils.hooks import collect_all

datas, binaries, hiddenimports = collect_all("xgboost")
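A hook placed inside the virtualenv's PyInstaller/hooks directory disappears whenever the virtualenv is recreated. PyInstaller also accepts extra hook directories, through the `hookspath` argument of `Analysis` in the spec file or `--additional-hooks-dir` on the command line, so the same hook could live in the project instead. A minimal sketch of that alternative follows; the `hooks` directory and the helper name `write_local_hook` are assumptions made for illustration, not something the original build used.

```python
from pathlib import Path

# Same hook body as above, with the package name left as a placeholder.
HOOK_TEMPLATE = (
    "from PyInstaller.utils.hooks import collect_all\n"
    "\n"
    'datas, binaries, hiddenimports = collect_all("{package}")\n'
)


def write_local_hook(project_dir, package="xgboost"):
    """Create <project_dir>/hooks/hook-<package>.py and return its path."""
    hooks_dir = Path(project_dir) / "hooks"
    hooks_dir.mkdir(parents=True, exist_ok=True)
    hook_file = hooks_dir / f"hook-{package}.py"
    hook_file.write_text(HOOK_TEMPLATE.format(package=package))
    return hook_file
```

After running `write_local_hook(".")` next to scscore.spec, setting `hookspath=['hooks']` in the spec's `Analysis(...)` call should make PyInstaller pick the hook up on the next build.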

Then run the packaging again:

> pipenv run pyinstaller --clean scscore.spec
> ./dist/scscore

Conflict between multiprocessing and PyInstaller

Running the program now, it keeps launching new copies of itself and never stops.

This is caused by a bug in the joblib library; see the article "Pyinstaller exe keeps opening itself". Downgrading joblib to 0.11 is enough to fix it.

> pipenv install joblib==0.11
> pipenv run pyinstaller --clean scscore.spec
> ./dist/scscore

Done.
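The Recipe Multiprocessing page referenced at the top covers the general form of this problem: in a frozen executable, a child process started by multiprocessing re-runs the program's entry point unless that entry point is guarded. The usual guard is sketched below, assuming an entry script shaped like main.py. The `square` and `run_pool` names are made up, and I did not test whether this alone would have avoided the joblib issue here.

```python
import multiprocessing


def square(x):
    """Toy worker function; it must be importable at module top level."""
    return x * x


def run_pool(values):
    """Map `square` over `values` using two worker processes."""
    with multiprocessing.Pool(processes=2) as pool:
        return pool.map(square, values)


if __name__ == "__main__":
    # Must be the first statement under the guard: in a frozen build it
    # lets a child process run its task instead of restarting the app.
    multiprocessing.freeze_support()
    print(run_pool([1, 2, 3]))
```

Keeping all process-spawning code behind the `if __name__ == "__main__":` guard matters as much as the `freeze_support()` call itself.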

The following setting in the spec file moves the program's temporary files somewhere else, so that /tmp does not fill up:

runtime_tmpdir='/home/doctorq/python-dev/scscore/tmp',
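To check that the setting took effect, the program can log where it was unpacked. In a one-file build, PyInstaller exposes the extraction directory as `sys._MEIPASS`, and with `runtime_tmpdir` set the `_MEI...` directory should appear under that path instead of /tmp. A small sketch, with helper names made up for illustration:

```python
import os
import sys


def is_frozen():
    """True when running from a PyInstaller bundle."""
    return bool(getattr(sys, "frozen", False)) and hasattr(sys, "_MEIPASS")


def extraction_dir():
    """Directory the bundle was unpacked to, or the cwd when not frozen."""
    return getattr(sys, "_MEIPASS", os.path.abspath("."))


if __name__ == "__main__":
    print("frozen:", is_frozen())
    print("extraction dir:", extraction_dir())
```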