Installing Third-Party Natural Language Processing Tools for NLTK



Installing third-party natural language processing tools for NLTK:

https://github.com/nltk/nltk/wiki/Installing-Third-Party-Software


How NLTK Discovers Third Party Software

NLTK finds third-party software either through environment variables or through path arguments passed in API calls. This page lists installation instructions and the environment variables associated with each tool.
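The two discovery routes can be pictured with a small sketch. This is not NLTK's own code; `resolve_tool_path` and its parameters are hypothetical names, and trying the explicit argument before the environment variables is an assumption made for illustration.

```python
import os


def resolve_tool_path(explicit_path=None, env_vars=(), environ=None):
    """Return the first usable location for a tool, or None.

    Hypothetical helper: an explicit path argument is tried first,
    then each named environment variable in the order given.
    """
    environ = os.environ if environ is None else environ
    if explicit_path and os.path.exists(explicit_path):
        return explicit_path
    for name in env_vars:
        value = environ.get(name)
        if not value:
            continue
        # A variable may hold several entries separated by os.pathsep.
        for entry in value.split(os.pathsep):
            if entry and os.path.exists(entry):
                return entry
    return None
```

Passing a dictionary as `environ` makes the lookup easy to try out without touching the real environment.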

Java

Java is not required by NLTK itself, but some third-party software depends on it. NLTK finds the java binary via the system PATH environment variable, or through JAVAHOME or JAVA_HOME.
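A quick way to check what such a lookup would find on your machine is sketched below. `locate_java` is a hypothetical helper, not part of NLTK, and it assumes that JAVAHOME or JAVA_HOME points at an installation root containing a bin directory.

```python
import os
import shutil


def locate_java(environ=None):
    """Look for a java executable on PATH, then under JAVAHOME / JAVA_HOME."""
    environ = os.environ if environ is None else environ
    # shutil.which searches the directories listed in the given PATH string.
    found = shutil.which("java", path=environ.get("PATH", ""))
    if found:
        return found
    exe = "java.exe" if os.name == "nt" else "java"
    for var in ("JAVAHOME", "JAVA_HOME"):
        home = environ.get(var)
        if not home:
            continue
        # Assumption: the variable names an install root with a bin/ folder.
        candidate = os.path.join(home, "bin", exe)
        if os.path.isfile(candidate):
            return candidate
    return None
```

Calling `locate_java()` with no argument inspects the current process environment; a return value of `None` suggests that Java still needs to be installed or that one of the variables needs to be set.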

To locate Java archives (jar files), NLTK checks the Java CLASSPATH variable. In addition, most dependencies have their own environment variable, which is searched as well.
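The CLASSPATH search can be approximated as follows. This is a minimal sketch rather than NLTK's implementation: `find_jar_on_classpath` is a hypothetical name, and it assumes each CLASSPATH entry is either a jar file or a directory that may contain one.

```python
import os


def find_jar_on_classpath(jar_name, classpath):
    """Return the full path of jar_name if a CLASSPATH entry provides it."""
    for entry in classpath.split(os.pathsep):
        if not entry:
            continue
        # An entry may name the jar file itself ...
        if os.path.isfile(entry) and os.path.basename(entry) == jar_name:
            return entry
        # ... or a directory that contains it.
        candidate = os.path.join(entry, jar_name)
        if os.path.isfile(candidate):
            return candidate
    return None
```

For example, `find_jar_on_classpath("stanford-parser.jar", os.environ.get("CLASSPATH", ""))` reports whether the parser jar is reachable from the current environment.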

Windows

Linux

It is best to install Java with your distribution's package manager.

Stanford Tagger, NER, Tokenizer and Parser.

To install:

  • Make sure Java is installed (version 1.8+)

  • Download and extract the Stanford tokenizer package (contains the Stanford tagger): http://nlp.stanford.edu/software/lex-parser.shtml

  • Download and extract the Stanford NER package: http://nlp.stanford.edu/software/CRF-NER.shtml

  • Download and extract the Stanford POS tagger package: http://nlp.stanford.edu/software/tagger.shtml

  • Download and extract the Stanford Parser package: http://nlp.stanford.edu/software/lex-parser.shtml

  • Add the directories containing stanford-postagger.jar, stanford-ner.jar and stanford-parser.jar to the CLASSPATH environment variable

  • Point the STANFORD_MODELS environment variable to the directories containing the Stanford tokenizer, POS tagger, NER and parser models (e.g. arabic.tagger, arabic-train.tagger, chinese-distsim.tagger, stanford-parser-x.x.x-models.jar ...)

  • e.g. export STANFORD_MODELS=/usr/share/stanford-postagger-full-2015-01-30/models:/usr/share/stanford-ner-2015-04-20/classifier
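If you would rather set these variables from Python than from the shell, something like the sketch below may work. `stanford_environment` is a hypothetical helper; the model directories are the ones from the export example, while the jar directories are an assumption (the extracted package roots) and should be adjusted to wherever the jars actually live.

```python
import os


def stanford_environment(jar_dirs, model_dirs, existing_classpath=""):
    """Build CLASSPATH and STANFORD_MODELS values from directory lists."""
    classpath_parts = [p for p in existing_classpath.split(os.pathsep) if p]
    for directory in jar_dirs:
        # Append each jar directory once, keeping existing entries first.
        if directory not in classpath_parts:
            classpath_parts.append(directory)
    return {
        "CLASSPATH": os.pathsep.join(classpath_parts),
        "STANFORD_MODELS": os.pathsep.join(model_dirs),
    }


settings = stanford_environment(
    jar_dirs=[
        # Assumed locations of the extracted packages.
        "/usr/share/stanford-postagger-full-2015-01-30",
        "/usr/share/stanford-ner-2015-04-20",
    ],
    model_dirs=[
        "/usr/share/stanford-postagger-full-2015-01-30/models",
        "/usr/share/stanford-ner-2015-04-20/classifier",
    ],
    existing_classpath=os.environ.get("CLASSPATH", ""),
)
# Only this process and the child processes it starts see the change.
os.environ.update(settings)
```

Because `os.environ` changes are not persistent, this has to run before the NLTK wrappers are created in the same session; a shell export remains the better choice for a permanent setup.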
