OCR in Java: Recognizing Text in Images (Handwritten Chinese) with tess4j

Original

2018-10-27 00:12

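I recently had a requirement: users handwrite Chinese in a mini program, which generates an image, and the backend has to recognize the Chinese text in that image. My first idea was to try a paid third-party API, so I started with the general text recognition API on the Baidu AI Open Platform. Its recognition rate for handwriting was not very high, although it did well on ordinary printed text. I then found Tesseract-OCR and put the following together with reference to several articles.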

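Preparation:

1. Download Tesseract-OCR 3.0 or later: https://download.csdn.net/download/qq_26161693/10646074

2. During installation, select the chi_sim.traineddata language data. The program later has to load this Chinese language pack (chi_sim.traineddata) from the tessdata folder in the installation directory.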
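Because the program depends on chi_sim.traineddata being present in the tessdata folder, it can help to check for the file before running OCR. The following is a minimal sketch using only the Java standard library. The class name `TessdataCheck` and the method `hasLanguageData` are hypothetical names chosen for illustration, and the default path is simply the install location used later in this post.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class TessdataCheck {

    /**
     * Returns true if <language>.traineddata exists as a regular file
     * inside the given tessdata directory.
     */
    public static boolean hasLanguageData(Path tessdataDir, String language) {
        return Files.isRegularFile(tessdataDir.resolve(language + ".traineddata"));
    }

    public static void main(String[] args) {
        // Assumption: the install location used in this post; pass another
        // directory as the first argument if yours differs.
        Path tessdata = Paths.get(args.length > 0
                ? args[0]
                : "C:\\Program Files (x86)\\Tesseract-OCR\\tessdata");
        System.out.println("chi_sim available: " + hasLanguageData(tessdata, "chi_sim"));
    }
}
```

If the check prints `false`, the language pack was probably not selected during installation, or the directory passed in is not the tessdata folder.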

Maven dependency:

<dependency>
    <groupId>net.sourceforge.tess4j</groupId>
    <artifactId>tess4j</artifactId>
    <version>3.2.1</version>
</dependency>

Demo:

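   import java.awt.image.BufferedImage;
   import java.io.File;
   import javax.imageio.ImageIO;
   import net.sourceforge.tess4j.ITesseract;
   import net.sourceforge.tess4j.Tesseract;

   /**
    * Recognizes the text in an image.
    *
    * @param imagePath path to the image file
    * @return the recognized text, or null if recognition fails
    */
   public static String discernWord(String imagePath) {
       try {
           File image = new File(imagePath);
           BufferedImage textImage = ImageIO.read(image);
           // getInstance() is deprecated in tess4j 3.x; use the constructor instead
           ITesseract instance = new Tesseract();
           // Directory that holds the language data (tessdata)
           instance.setDatapath("C:\\Program Files (x86)\\Tesseract-OCR\\tessdata");
           // Recognize Simplified Chinese
           instance.setLanguage("chi_sim");
           return instance.doOCR(textImage);
       } catch (Exception e) {
           e.printStackTrace();
           return null;
       }
   }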

Test:

   public static void main(String[] args) throws Exception {
       // Path of the image file to recognize
       String words = discernWord("F:/test_used_url/ocr/originalPic/hotkidclub.jpg");
       System.out.println(words);
   }

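P.S.:

I verified that this works in a Windows development environment with Tesseract installed first. I have not tried loading only the language pack without installing the exe.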
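Since the demo hard-codes the Windows install path, moving it to another machine means editing the source. One possible way around that is to read the tessdata location from configuration and fall back to the default only when nothing is set. This is a sketch under stated assumptions, not part of the tested setup above: the class `DatapathResolver`, the method `resolveDatapath`, and the system property name `ocr.tessdata.dir` are all hypothetical and not defined by tess4j.

```java
public class DatapathResolver {

    // Default install location taken from the demo in this post.
    static final String DEFAULT_DATAPATH =
            "C:\\Program Files (x86)\\Tesseract-OCR\\tessdata";

    /**
     * Returns the configured path (trimmed) when it is non-blank,
     * otherwise the fallback.
     */
    public static String resolveDatapath(String configured, String fallback) {
        if (configured == null || configured.trim().isEmpty()) {
            return fallback;
        }
        return configured.trim();
    }

    public static void main(String[] args) {
        // "ocr.tessdata.dir" is a hypothetical property name chosen for this sketch,
        // set for example with -Docr.tessdata.dir=D:\ocr\tessdata
        String datapath = resolveDatapath(
                System.getProperty("ocr.tessdata.dir"), DEFAULT_DATAPATH);
        System.out.println("Using tessdata at: " + datapath);
        // The result would then be passed to instance.setDatapath(datapath).
    }
}
```

Whether a standalone tessdata folder is enough without the installed exe remains untested here, as noted above.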
