Calling the Java ansj_seg word segmentation tool from Python

Link to the solution: https://github.com/NLPchina/ansj_seg/issues/681

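For anyone who is not comfortable with Java, does not want to use jieba for segmentation, and cannot let go of ansj_seg, here is a Python solution:
Environment: python2.7, jdk1.8.0_161, tree_split-1.5.jar, nlp-lang-1.7.7.jar and ansj_seg-5.1.6.jar
About the environment: the documentation says jdk1.6, but that probably refers to earlier releases. The manifest file inside the latest jar records the newer JDK version.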
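One way to see what a jar's manifest records is to read it directly from Python. The sketch below is an illustration, not part of the original solution: `read_jar_manifest` is a hypothetical helper name, and it assumes the jar contains a standard `META-INF/MANIFEST.MF`. Maven-built jars often record the build JDK under a key such as `Build-Jdk`, but whether a particular jar has that key should be checked by printing the attributes.

```python
import zipfile


def read_jar_manifest(jar_path):
    """Return the main-section attributes of a jar's MANIFEST.MF as a dict.

    Hypothetical helper for illustration; it only parses the first
    (main) section of the manifest.
    """
    with zipfile.ZipFile(jar_path) as jar:
        text = jar.read("META-INF/MANIFEST.MF").decode("utf-8")

    attrs = {}
    key = None
    for line in text.splitlines():
        if line.startswith(" ") and key is not None:
            # A line starting with one space continues the previous value.
            attrs[key] += line[1:]
            continue
        if not line.strip():
            # A blank line ends the main section.
            break
        key, _, value = line.partition(":")
        attrs[key] = value.strip()
    return attrs

# Example call (the path is an assumption; adjust it to your setup):
# print(read_jar_manifest("/path/to/ansj_seg-5.1.6.jar"))
```

Printing the returned dict for each jar shows every attribute in the main section, which makes it easy to spot any JDK-related entry.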
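I ran into a lot of pitfalls. At first the class simply could not be found, and I spent most of a day trying different JDK versions.
First, pay attention to the JDK version.
Second, pay attention to all the dependency jars: reading the source shows that ansj_seg references classes from nlp-lang.
Third, note that the function you call is a method of the parent class Analysis.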
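Since a missing dependency jar shows up only as a class-not-found error on the Java side, it can help to verify the jars on the Python side before starting the JVM. The following is a minimal sketch under that assumption. `build_classpath_option` is a hypothetical helper, not something provided by JPype or ansj_seg.

```python
import os


def build_classpath_option(jars_dir, jar_names):
    """Build a -Djava.class.path=... option, failing early if a jar is missing.

    Hypothetical helper for illustration only.
    """
    paths = [os.path.join(jars_dir, name) for name in jar_names]
    missing = [p for p in paths if not os.path.isfile(p)]
    if missing:
        # Report every missing jar at once instead of a vague Java error later.
        raise IOError("missing jar(s): " + ", ".join(missing))
    # os.pathsep is ':' on Linux/macOS and ';' on Windows.
    return "-Djava.class.path=" + os.pathsep.join(paths)
```

The returned string can be passed to `jpype.startJVM` in place of a hand-built classpath option.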

# -*- coding: utf-8 -*-
import os

import jpype

# Path to the JVM shared library of the installed JDK.
jvmPath = '/usr/lib/java/jdk1.8.0_161/jre/lib/amd64/server/libjvm.so'
print(jvmPath)

# ansj_seg and the jars it depends on; all of them must be on the classpath.
jars_dir = '/mnt/data/pretrained_models/word2vec_models/jars4ansj'
jar_names = ['ansj_seg-5.1.6.jar', 'nlp-lang-1.7.7.jar', 'tree_split-1.5.jar']
jars = [os.path.join(jars_dir, name) for name in jar_names]
jvm_cp = "-Djava.class.path={}".format(os.pathsep.join(jars))

jpype.startJVM(jvmPath, "-ea", jvm_cp)
SegModel = jpype.JClass('org.ansj.splitWord.analysis.ToAnalysis')
jd = SegModel()
# parseStr is inherited from the parent class Analysis.
print(jd.parseStr("怎麼這麼麻煩"))

jpype.shutdownJVM()

Output:
/usr/lib/java/jdk1.8.0_161/jre/lib/amd64/server/libjvm.so
Sep 11, 2018 11:12:25 PM org.ansj.util.MyStaticValue warn
WARNING: not find library.properties in classpath use it by default !
Sep 11, 2018 11:12:25 PM org.ansj.dic.impl.File2Stream info
INFO: path to stream library/ambiguity.dic
Sep 11, 2018 11:12:25 PM org.ansj.library.AmbiguityLibrary error
SEVERE: Init ambiguity library error :org.ansj.exception.LibraryException: path :library/ambiguity.dic file:/home/jinmming/git_manager/paraphrase/bimpm/test_units/library/ambiguity.dic not found or can not to read, path: library/ambiguity.dic
Sep 11, 2018 11:12:25 PM org.ansj.dic.impl.File2Stream info
INFO: path to stream library/default.dic
Sep 11, 2018 11:12:25 PM org.ansj.library.DicLibrary error
SEVERE: Init dic library error :org.ansj.exception.LibraryException: path :library/default.dic file:/home/jinmming/git_manager/paraphrase/bimpm/test_units/library/default.dic not found or can not to read, path: library/default.dic
Sep 11, 2018 11:12:25 PM org.ansj.library.DATDictionary info
INFO: init core library ok use time : 572
Sep 11, 2018 11:12:25 PM org.ansj.library.NgramLibrary info
INFO: init ngram ok use time :287
怎麼/r,這麼/r,麻煩/an
JVM has been shutdown
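The segmentation result above is printed as comma-separated `word/pos` items. If the words and part-of-speech tags are needed separately in Python, the printed text can be split as in the sketch below. `split_terms` is a hypothetical helper, and it assumes the result has already been converted to a Python string in exactly this `word/pos,word/pos` form. It is a naive split, so it would mis-handle input text that itself contains a comma.

```python
# -*- coding: utf-8 -*-


def split_terms(result_text):
    """Split 'word/pos,word/pos' text into a list of (word, pos) pairs.

    Hypothetical helper for illustration; assumes ',' separates items
    and the last '/' in each item separates the word from its tag.
    """
    pairs = []
    for item in result_text.split(","):
        if not item:
            continue
        word, sep, pos = item.rpartition("/")
        if not sep:
            # No '/' found: keep the whole item as the word, with an empty tag.
            word, pos = item, ""
        pairs.append((word, pos))
    return pairs
```

For the output shown above, this yields three pairs, one for each of the items `怎麼/r`, `這麼/r` and `麻煩/an`.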
