Installing tesseract and pytesseract on macOS to recognize text in images

I. Introduction to tesseract-OCR

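1. tesseract-OCR is an open-source OCR engine that recognizes more than 100 languages. It is built to recognize the text in an image and return it as plain text. Its main weakness is that it handles handwriting poorly.
2. tesseract works best on images whose text has the following characteristics: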

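The text is set in a standard font
The image may be a photocopy or a photo, but the text must be sharp and free of smudges or marks
The text is not skewed or slanted
No text runs off the edge of the image, and no characters are cut off or incomplete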

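II. Installing tesseract-OCR on macOS

1. There are four ways to install it with Homebrew. The option flags below worked at the time of writing (tesseract 4.0.0); newer Homebrew releases no longer accept formula options and ship the extra language data as a separate tesseract-lang formula: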

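brew install --with-training-tools tesseract # install tesseract together with the training tools
brew install --all-languages tesseract # install tesseract together with all language data
brew install --all-languages --with-training-tools tesseract # install tesseract with all language data and the training tools
brew install tesseract # install tesseract only, without the training tools; this is the option I chose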

2. After installing tesseract, check that it works:

tesseract -v
tesseract is installed at: /usr/local/Cellar/tesseract/4.0.0/
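
The same check can be done from Python, which is handy before moving on to pytesseract. This is only a minimal sketch: the function name tesseract_version is made up for illustration, and it assumes the tesseract binary is on your PATH and Python 3.7 or later.

```python
import shutil
import subprocess


def tesseract_version():
    """Return the first line of `tesseract --version`, or None if tesseract is not on PATH."""
    exe = shutil.which("tesseract")
    if exe is None:
        return None
    proc = subprocess.run([exe, "--version"], capture_output=True, text=True)
    # Depending on the release, the version banner may go to stdout or stderr.
    output = (proc.stdout or proc.stderr).strip()
    return output.splitlines()[0] if output else None


print(tesseract_version() or "tesseract not found on PATH")
```

If this prints "tesseract not found on PATH", the Homebrew install step has probably not completed or the shell has not picked up the new binary yet.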

3. Basic usage of the tesseract command

tesseract 9.jpg result # result is the output base name; the recognized text is written to result.txt
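
If you want to drive the command line from a script without pytesseract, a rough sketch could look like the following. The helper names build_tesseract_command and run_ocr are hypothetical; the only tesseract features used are the positional image and output-base arguments shown above and the -l language flag.

```python
import subprocess


def build_tesseract_command(image_path, output_base, lang=None):
    """Build the argument list for: tesseract IMAGE OUTPUT_BASE [-l LANG]."""
    cmd = ["tesseract", image_path, output_base]
    if lang:
        cmd += ["-l", lang]
    return cmd


def run_ocr(image_path, output_base, lang=None):
    """Run tesseract and return the name of the text file it writes."""
    subprocess.run(build_tesseract_command(image_path, output_base, lang), check=True)
    # tesseract appends .txt to the output base name by default.
    return output_base + ".txt"


# Example (assumes 9.jpg exists and tesseract is installed):
# print(open(run_ocr("9.jpg", "result")).read())
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when a file name contains spaces.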

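4. Downloading language data
Download whichever language data files you need. For example, chi_sim.traineddata is Simplified Chinese:
Download from: https://github.com/tesseract-ocr/tessdata
After downloading chi_sim.traineddata, place it in the /usr/local/Cellar/tesseract/4.0.0/share/tessdata directory.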
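
To confirm the file landed in the right place, you can list the language codes found in the tessdata directory. The sketch below assumes that every language is a single file named CODE.traineddata, as chi_sim.traineddata is; the function name installed_languages is made up for illustration.

```python
from pathlib import Path


def installed_languages(tessdata_dir):
    """Return the sorted language codes of all *.traineddata files in a directory."""
    return sorted(p.stem for p in Path(tessdata_dir).glob("*.traineddata"))


# Example, using the path from this article (adjust to your own install):
# print(installed_languages("/usr/local/Cellar/tesseract/4.0.0/share/tessdata"))
```

If chi_sim is missing from the list, calls that ask for lang='chi_sim' will fail until the file is copied into that directory.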

III. Installing pytesseract on macOS

1. Python offers a more convenient way to call tesseract. First, install the pytesseract module.
2. The install command:

pip install pytesseract
pytesseract is installed at: /usr/local/lib/python2.7/site-packages/pytesseract
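
The install path differs between Python versions and environments, so it may be easier to ask Python where the module lives. This is a small sketch using only the standard library; module_location is a hypothetical helper name, and it works for any top-level module, not just pytesseract.

```python
import importlib.util


def module_location(name):
    """Return the file a top-level module would be loaded from, or None if it is not installed."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


print(module_location("pytesseract") or "pytesseract is not installed for this interpreter")
```

A result of None usually means pip installed the package for a different Python interpreter than the one running the script.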

3. The pytesseract module is used together with PIL (today provided by the Pillow package)

4. Example 1:

from PIL import Image
import pytesseract

if __name__ == '__main__':
    # lang='chi_sim' requires chi_sim.traineddata in the tessdata directory
    text = pytesseract.image_to_string(Image.open('9.jpg'), lang='chi_sim')
    print(text)

Output:

mac 安裝tesseract、pytesseract，實現圖片裏文字的識別

一， tesseract-OCR的介紹

二， mac tesseract-OCR的安裝

三， mac pytesseract的安裝
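
The string returned by image_to_string often contains blank lines and stray whitespace around the recognized lines. As a small optional post-processing sketch (the helper name nonblank_lines is made up, and it is not part of pytesseract), you could tidy the text like this:

```python
def nonblank_lines(text):
    """Split OCR output into lines, dropping empty lines and surrounding whitespace."""
    return [line.strip() for line in text.splitlines() if line.strip()]


# Example with a hand-written stand-in for OCR output:
sample = "first line\n\n  second line  \n\n"
print(nonblank_lines(sample))
```

In Example 1 you would call nonblank_lines(text) on the value returned by pytesseract.image_to_string.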
