1. Install the Leptonica dependency
Switch to root with su root before installing. This avoids permission errors during the build.
wget http://www.leptonica.org/source/leptonica-1.78.0.tar.gz
tar -xzvf leptonica-1.78.0.tar.gz
cd leptonica-1.78.0
./configure
make && make install
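If you need to repeat step 1, for example on several machines, the commands can be wrapped in a small shell function. This is a sketch that assumes it runs as root and that wget, tar, a C compiler and make are already installed. The function names `leptonica_url` and `install_leptonica` are made up for this example.

```shell
# Hypothetical helpers wrapping step 1; assumes root plus wget/tar/gcc/make.

# Print the download URL for a given Leptonica version.
leptonica_url() {
    printf 'http://www.leptonica.org/source/leptonica-%s.tar.gz\n' "$1"
}

# Download, configure, build and install Leptonica (default 1.78.0).
# The body runs in a subshell, so `set -e` and `cd` do not leak out.
install_leptonica() (
    set -e
    version="${1:-1.78.0}"
    wget "$(leptonica_url "$version")"
    tar -xzvf "leptonica-${version}.tar.gz"
    cd "leptonica-${version}"
    ./configure
    make
    make install
)
```

Usage, as root: `install_leptonica 1.78.0`. Because the body is a subshell, a failing step stops the function without closing your interactive shell.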
2. Install Tesseract-OCR
Again, building as root is recommended.
wget https://codeload.github.com/tesseract-ocr/tesseract/tar.gz/4.1.0
tar -xvf 4.1.0
cd tesseract-4.1.0/
./autogen.sh
./configure
make && make install
sudo ldconfig
The installation itself is straightforward. Depending on the machine and the network speed, it can take 30 to 60 minutes.
3. Possible errors
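Step 2 can be wrapped the same way. Besides the hypothetical function names, this sketch makes two assumptions that are not part of the steps above. First, `configure` may need `PKG_CONFIG_PATH` pointed at `/usr/local/lib/pkgconfig` to find the Leptonica installed in step 1. Second, `/usr/local/lib` may have to be registered with the dynamic linker, because some distributions (CentOS among them) do not search it by default. Both settings are harmless if they are already in place.

```shell
# Hypothetical helpers wrapping step 2; assumes root and Leptonica in /usr/local.

# Print the codeload URL for a given Tesseract tag.
tesseract_url() {
    printf 'https://codeload.github.com/tesseract-ocr/tesseract/tar.gz/%s\n' "$1"
}

install_tesseract() (
    set -e
    version="${1:-4.1.0}"
    # codeload serves the tarball under the bare tag name, so give it a real file name.
    wget -O "tesseract-${version}.tar.gz" "$(tesseract_url "$version")"
    tar -xzvf "tesseract-${version}.tar.gz"
    cd "tesseract-${version}"
    # Assumption: let configure find lept.pc installed under /usr/local.
    export PKG_CONFIG_PATH="/usr/local/lib/pkgconfig${PKG_CONFIG_PATH:+:$PKG_CONFIG_PATH}"
    ./autogen.sh
    ./configure
    make
    make install
    # Assumption: /usr/local/lib may not be in the linker search path;
    # the file name usr-local.conf is arbitrary.
    echo /usr/local/lib > /etc/ld.so.conf.d/usr-local.conf
    ldconfig
)
```

After `install_tesseract 4.1.0`, running `tesseract --version` is a quick check that the binary and its shared libraries load.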
- ./autogen.sh fails
./autogen.sh: line 59: bail_out: command not found
./autogen.sh: line 82: aclocal: command not found
Solution:
yum install automake -y
yum install libtool -y
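Before rerunning ./autogen.sh, you can check which of the autotools it relies on are still missing. This is a minimal sketch. The function names are hypothetical, and it assumes yum, as in the solution above. autoconf is listed explicitly so that the check does not depend on package dependencies.

```shell
# Print each given command that is not found on PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Install the autotools packages only when something is actually missing.
ensure_autotools() {
    missing="$(missing_tools aclocal autoconf automake libtoolize)"
    if [ -n "$missing" ]; then
        printf 'missing:\n%s\n' "$missing" >&2
        yum install -y autoconf automake libtool
    fi
}
```

`aclocal` ships with the automake package and `libtoolize` with libtool, which is why those two packages fix the errors above.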
- make fails while building Tesseract
libtool: Version mismatch error. This is libtool 2.4.6, but the
libtool: definition of this LT_INIT comes from libtool 2.4.2.
libtool: You should recreate aclocal.m4 with macros from libtool 2.4.6
libtool: and run autoconf again.
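Solution:
Run autoreconf -ivf in the tesseract source directory to regenerate the build files with the installed libtool. Then run ./configure and make again.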
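If you script the build, you can detect the mismatch from make's output and handle it automatically. This is a sketch that assumes it runs in the tesseract source directory. `has_libtool_mismatch`, `build_with_autoreconf_fallback` and the `build.log` file name are hypothetical.

```shell
# Succeed if the given log contains libtool's version-mismatch message.
has_libtool_mismatch() {
    grep -q 'libtool: Version mismatch error' "$1"
}

# Run make; on a libtool mismatch, regenerate the build system and retry once.
build_with_autoreconf_fallback() (
    # Capture make's output so it can be searched afterwards.
    if make >build.log 2>&1; then
        exit 0
    fi
    cat build.log
    if has_libtool_mismatch build.log; then
        # autoreconf reruns aclocal/libtoolize/autoconf/automake with the installed versions.
        autoreconf -ivf && ./configure && make
    else
        exit 1
    fi
)
```

make's output is written to build.log and printed only on failure, so the log can be searched for the libtool message before anything is regenerated.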
- Running tesseract after installation fails
$ tesseract 13.jpg result -l chi_sim
Error in pixReadMemTiff: function not present
Error in pixReadMem: tiff: no pix returned
Error in pixaGenerateFontFromString: pix not made
Error in bmfCreate: font pixa not made
Error opening data file /usr/local/share/tessdata/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!
Could not initialize tesseract.
Solution:
1. Download the pretrained language files (.traineddata).
2. Put them in the /usr/local/share/tessdata directory.
Download from: https://github.com/tesseract-ocr/tessdata
chi_sim.traineddata Simplified Chinese
eng.traineddata English
enm.traineddata Middle English
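The download steps above can be scripted. This sketch assumes the files are taken from the `main` branch of the tesseract-ocr/tessdata repository and stored in /usr/local/share/tessdata, the directory named in the error message. Both assumptions can be overridden through the hypothetical variables `TESSDATA_BRANCH` and `TESSDATA_DIR`. The function names are also made up for this example.

```shell
# Assumptions: default tessdata location and source branch (override via env).
TESSDATA_DIR="${TESSDATA_DIR:-/usr/local/share/tessdata}"
TESSDATA_BRANCH="${TESSDATA_BRANCH:-main}"

# Print the raw download URL of one language file.
tessdata_url() {
    printf 'https://github.com/tesseract-ocr/tessdata/raw/%s/%s.traineddata\n' \
        "$TESSDATA_BRANCH" "$1"
}

# Download each requested language into the tessdata directory.
fetch_tessdata() (
    set -e
    mkdir -p "$TESSDATA_DIR"
    for lang in "$@"; do
        wget -O "$TESSDATA_DIR/$lang.traineddata" "$(tessdata_url "$lang")"
    done
)
```

Usage, as root: `fetch_tessdata eng chi_sim`, then `tesseract --list-langs` should show both languages. If you keep the files somewhere else instead, point `TESSDATA_PREFIX` at that tessdata directory, as the error message suggests.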