Python-based OCR for Chinese character recognition on Windows

Original post

2020-02-25 06:19

1. Install the supporting environment
(1) First install the Tesseract OCR library. Download page: https://digi.bib.uni-mannheim.de/tesseract/

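Download the installer version shown in the original screenshot, choosing the build that matches your Windows system.

After downloading, double-click the installer. Because we want to recognize Chinese characters, the extra language data has to be selected during setup: expand "Additional language data" and tick the Chinese entries you need (the code below uses chi_sim, Simplified Chinese).

Then click Next to finish the installation. (Note: the install path must not contain Chinese characters, and you should remember this path.)

Next, configure the environment variables by adding the install path to them.

Add the install directory from the previous step, D:\toolplace\OCR\Tesseract-OCR; to both the user variable PATH and the system variable Path. Note that entries are separated by English (half-width) semicolons.

After changing the environment variables, verify that the installation works. Open a new cmd window and run: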

Tesseract -v
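
The same check can be scripted from Python. This is a minimal sketch that uses only the standard library; the helper name tesseract_version is hypothetical and not part of any library. It returns None when the executable cannot be found on PATH, which is the situation behind the TesseractNotFoundError discussed further down.

```python
import shutil
import subprocess


def tesseract_version(executable="tesseract"):
    """Return the first line of the version output, or None if not on PATH."""
    path = shutil.which(executable)
    if path is None:
        return None
    # Some Tesseract builds print the version to stderr, so merge both streams.
    result = subprocess.run([path, "-v"], stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, universal_newlines=True)
    lines = result.stdout.strip().splitlines()
    return lines[0] if lines else ""


if __name__ == "__main__":
    print(tesseract_version() or "tesseract was not found on PATH")
```

If this prints the "not found" message even though Tesseract is installed, the PATH entry is probably missing or the terminal was opened before the variable was changed.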

(2) Install the Python packages

pip install Pillow==5.2.0
pip install pytesseract==0.2.4

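import logging
from PIL import Image
import pytesseract

pathSaveShot = ""  # path of the image to recognize; left empty in the original, fill in your own

img = Image.open(pathSaveShot)
text = pytesseract.image_to_string(img, lang='chi_sim')  # chi_sim = Simplified Chinese model
logging.info('[OCR result for the captured image: ' + text + ']')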

Problem 1:

Running the code after installation raises this error:

pytesseract.pytesseract.TesseractNotFoundError: tesseract is not installed or it's not in your path

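The cause is clear: the tesseract executable could not be found.

Solution:

1. Locate the pytesseract package under your Python installation. Mine, for example, is E:\Python3.7.1\Lib\site-packages\pytesseract

2. Open pytesseract.py in that folder with a text editor and search for tesseract_cmd

Change the original tesseract_cmd = 'tesseract' to tesseract_cmd = 'full path to tesseract.exe in the Tesseract install directory'

Mine, for example, is tesseract_cmd = 'C:\Program Files\Tesseract-OCR\\tesseract.exe'

Note that some backslashes need escaping, such as \\tesseract.exe (a bare \t would be read as a tab character). Alternatively, prefix the string with r so it is treated as a raw string: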

tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
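
A change made inside site-packages is lost whenever pytesseract is reinstalled or upgraded. An alternative is to assign the same tesseract_cmd variable from your own script. The following is a minimal sketch under stated assumptions: first_existing and use_tesseract_at are hypothetical helper names, and the two candidate paths are simply the example locations mentioned in this post, not guaranteed install locations.

```python
import os


def first_existing(candidates):
    """Return the first path in candidates that is an existing file, else None."""
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


def use_tesseract_at(path):
    """Point pytesseract at a specific tesseract.exe for this process only."""
    import pytesseract  # imported here so first_existing works without it
    pytesseract.pytesseract.tesseract_cmd = path


if __name__ == "__main__":
    # Example locations only (assumption); replace them with your own path.
    exe = first_existing([
        r"C:\Program Files\Tesseract-OCR\tesseract.exe",
        r"D:\toolplace\OCR\Tesseract-OCR\tesseract.exe",
    ])
    if exe:
        use_tesseract_at(exe)
    else:
        print("tesseract.exe not found in the listed locations")
```

Raw strings are used for the paths so that no backslash needs to be doubled.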

Problem 2:

pytesseract.pytesseract.TesseractError: (1, 'Error opening data file C:\\Program Files (x86)\\Tesseract-OCR\\/tessdata/chi_sim.traineddata Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory. Failed loading language \'chi_sim\' Tesseract couldn\'t load any languages! Could not initialize tesseract.')

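Solution:
1. Set the environment variable TESSDATA_PREFIX to the tessdata directory.
Default tessdata directory: C:\Program Files (x86)\Tesseract-OCR\tessdata
2. If the same error still appears after setting it, restart the computer. Programs that were already running do not see the new variable until they are restarted.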
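
Before rebooting, it can help to confirm that the language file actually exists in the directory Tesseract is told to use. This is a minimal sketch using only the standard library; traineddata_path and has_language are hypothetical helper names, and the fallback directory is just the default path quoted above. Setting os.environ inside the script affects only the current Python process and the tesseract subprocess it launches, not the system-wide setting.

```python
import os


def traineddata_path(tessdata_dir, lang="chi_sim"):
    """Build the path of the model file Tesseract loads for a language."""
    return os.path.join(tessdata_dir, lang + ".traineddata")


def has_language(tessdata_dir, lang="chi_sim"):
    """Return True if <tessdata_dir>/<lang>.traineddata exists."""
    return os.path.isfile(traineddata_path(tessdata_dir, lang))


if __name__ == "__main__":
    # Fall back to the default directory named in this post (assumption).
    tessdata = os.environ.get(
        "TESSDATA_PREFIX", r"C:\Program Files (x86)\Tesseract-OCR\tessdata")
    if has_language(tessdata):
        # Inherited by the tesseract subprocess started from this script.
        os.environ["TESSDATA_PREFIX"] = tessdata
        print("chi_sim model found in", tessdata)
    else:
        print("chi_sim.traineddata is missing from", tessdata)
```

If the file is reported missing, the Chinese language data was most likely not ticked during installation, and changing the environment variable alone will not help.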
