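Stanford CoreNLP is written in Java, and its Python wrappers only send requests to the Java interface, which is not very convenient.

You can try out its results on the official demo page: http://corenlp.run/

stanza is the Stanford NLP Group's toolkit that runs natively in Python, which makes it more convenient to use.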

1. Installation

pip install stanza

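2. Download the models (stanza_resources)

import stanza
stanza.download('en') # download English model
stanza.download('zh') # download Chinese model

Note: if the download fails inside Jupyter, run it from an interactive Python session in a terminal instead. You can also copy the download link, fetch the archive with a download tool, and unpack it into the expected stanza_resources directory layout.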

3. Usage

import stanza
# Options can be collected in a config dict or passed individually as keyword arguments.
# lang selects the language.
config = {
    'dir': './stanza_resources/',  # model directory; must be set if the models were not downloaded with stanza.download()
#     'processors': 'tokenize,mwt,pos,ner', # Comma-separated list of processors to use
    'lang': 'zh',  # Language code for the language to build the Pipeline in ('en' for English)
#     'tokenize_model_path': './fr_gsd_models/fr_gsd_tokenizer.pt', # Processor-specific arguments are set with keys "{processor_name}_{argument_name}"
#     'mwt_model_path': './fr_gsd_models/fr_gsd_mwt_expander.pt',
#     'pos_model_path': './fr_gsd_models/fr_gsd_tagger.pt',
#     'pos_pretrain_path': './fr_gsd_models/fr_gsd.pretrain.pt',
#     'tokenize_pretokenized': True # Use pretokenized text as input and disable tokenization
}
nlp = stanza.Pipeline(**config)
# Output (the sample log below was captured with lang='en'):
2020-04-15 16:58:35 INFO: Loading these models for language: en (English):
=========================
| Processor | Package   |
-------------------------
| tokenize  | ewt       |
| pos       | ewt       |
| lemma     | ewt       |
| depparse  | ewt       |
| ner       | ontonotes |
=========================

2020-04-15 16:58:35 INFO: Use device: gpu
2020-04-15 16:58:35 INFO: Loading: tokenize
2020-04-15 16:58:40 INFO: Loading: pos
2020-04-15 16:58:41 INFO: Loading: lemma
2020-04-15 16:58:41 INFO: Loading: depparse
2020-04-15 16:58:42 INFO: Loading: ner
2020-04-15 16:58:42 INFO: Done loading processors!
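
The config dict above follows a simple naming rule: processor-specific options use keys of the form "{processor_name}_{argument_name}". A minimal sketch of assembling such a dict is shown below. The helper name build_pipeline_config and its parameters are hypothetical and not part of stanza; only the key names 'lang', 'dir', 'processors', and 'tokenize_pretokenized' come from the config above.

```python
def build_pipeline_config(lang, model_dir=None, processors=None, **processor_args):
    """Assemble keyword arguments for stanza.Pipeline (hypothetical helper)."""
    config = {"lang": lang}
    if model_dir is not None:
        # Only needed when the models are not in the default download location.
        config["dir"] = model_dir
    if processors:
        # stanza expects a comma-separated string of processor names.
        config["processors"] = ",".join(processors)
    # Processor-specific options become "{processor_name}_{argument_name}" keys.
    for processor, options in processor_args.items():
        for name, value in options.items():
            config[f"{processor}_{name}"] = value
    return config


config = build_pipeline_config(
    "zh",
    model_dir="./stanza_resources/",
    processors=["tokenize", "pos", "lemma", "depparse"],
    tokenize={"pretokenized": True},
)
print(config)
```

The resulting dict could then be unpacked in the same way as above, with stanza.Pipeline(**config).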

doc = nlp('快速的棕色狐狸跳過了懶惰的狗')

doc.sentences
# Output:
[[
   {
     "id": "1",
     "text": "快速",
     "lemma": "快速",
     "upos": "ADJ",
     "xpos": "JJ",
     "head": 4,
     "deprel": "amod",
     "misc": "start_char=0|end_char=2"
   },
   {
     "id": "2",
     "text": "的",
     "lemma": "的",
     "upos": "PART",
     "xpos": "DEC",
     "head": 1,
     "deprel": "mark:relcl",
     "misc": "start_char=2|end_char=3"
   },
   {
     "id": "3",
     "text": "棕色",
     "lemma": "棕色",
     "upos": "NOUN",
     "xpos": "NN",
     "head": 4,
     "deprel": "nmod",
     "misc": "start_char=3|end_char=5"
   },
   {
     "id": "4",
     "text": "狐狸",
     "lemma": "狐狸",
     "upos": "NOUN",
     "xpos": "NN",
     "head": 5,
     "deprel": "nsubj",
     "misc": "start_char=5|end_char=7"
   },
   {
     "id": "5",
     "text": "跳過",
     "lemma": "跳過",
     "upos": "VERB",
     "xpos": "VV",
     "head": 0,
     "deprel": "root",
     "misc": "start_char=7|end_char=9"
   },
   {
     "id": "6",
     "text": "了",
     "lemma": "了",
     "upos": "PART",
     "xpos": "AS",
     "feats": "Aspect=Perf",
     "head": 5,
     "deprel": "case:aspect",
     "misc": "start_char=9|end_char=10"
   },
   {
     "id": "7",
     "text": "懶惰",
     "lemma": "懶惰",
     "upos": "ADJ",
     "xpos": "JJ",
     "head": 9,
     "deprel": "amod",
     "misc": "start_char=10|end_char=12"
   },
   {
     "id": "8",
     "text": "的",
     "lemma": "的",
     "upos": "PART",
     "xpos": "DEC",
     "head": 7,
     "deprel": "mark:relcl",
     "misc": "start_char=12|end_char=13"
   },
   {
     "id": "9",
     "text": "狗",
     "lemma": "狗",
     "upos": "NOUN",
     "xpos": "NN",
     "head": 5,
     "deprel": "obj",
     "misc": "start_char=13|end_char=14"
   }
 ]]
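
Each word in the output above carries its character offsets in the "misc" field. The sketch below shows one way to turn that string into integers and slice the original text with it. It assumes the "key=value|key=value" shape shown above; the function name parse_misc is made up for illustration, and other stanza versions may expose the offsets differently.

```python
def parse_misc(misc):
    """Turn 'start_char=0|end_char=2' into {'start_char': 0, 'end_char': 2}."""
    fields = {}
    for item in misc.split("|"):
        key, _, value = item.partition("=")
        # Keep numeric values as integers so they can be used for slicing.
        fields[key] = int(value) if value.isdigit() else value
    return fields


text = "快速的棕色狐狸跳過了懶惰的狗"
word = {"id": "4", "text": "狐狸", "misc": "start_char=5|end_char=7"}

span = parse_misc(word["misc"])
surface = text[span["start_char"]:span["end_char"]]
print(span, surface)
```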

doc.sentences[0].print_dependencies()
Output:
('快速', '4', 'amod')
('的', '1', 'mark:relcl')
('棕色', '4', 'nmod')
('狐狸', '5', 'nsubj')
('跳過', '0', 'root')
('了', '5', 'case:aspect')
('懶惰', '9', 'amod')
('的', '7', 'mark:relcl')
('狗', '5', 'obj')
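
In the tuples above, the second element is the 1-based index of the head word, and 0 marks the root. As a small illustration, the sketch below replaces each index with the head word's text. It works on plain tuples copied from the output above rather than on stanza objects, and the name resolve_heads is hypothetical.

```python
def resolve_heads(rows):
    """Map (text, head_index, deprel) rows to (text, deprel, head_text)."""
    resolved = []
    for text, head, deprel in rows:
        head_index = int(head)
        # Index 0 means the word has no head: it is the root of the sentence.
        head_text = "ROOT" if head_index == 0 else rows[head_index - 1][0]
        resolved.append((text, deprel, head_text))
    return resolved


rows = [
    ("快速", "4", "amod"),
    ("的", "1", "mark:relcl"),
    ("棕色", "4", "nmod"),
    ("狐狸", "5", "nsubj"),
    ("跳過", "0", "root"),
    ("了", "5", "case:aspect"),
    ("懶惰", "9", "amod"),
    ("的", "7", "mark:relcl"),
    ("狗", "5", "obj"),
]

resolved = resolve_heads(rows)
for row in resolved:
    print(row)
```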

doc.sentences[0].print_tokens()
Output:
<Token id=1;words=[<Word id=1;text=快速;lemma=快速;upos=ADJ;xpos=JJ;head=4;deprel=amod>]>
<Token id=2;words=[<Word id=2;text=的;lemma=的;upos=PART;xpos=DEC;head=1;deprel=mark:relcl>]>
<Token id=3;words=[<Word id=3;text=棕色;lemma=棕色;upos=NOUN;xpos=NN;head=4;deprel=nmod>]>
<Token id=4;words=[<Word id=4;text=狐狸;lemma=狐狸;upos=NOUN;xpos=NN;head=5;deprel=nsubj>]>
<Token id=5;words=[<Word id=5;text=跳過;lemma=跳過;upos=VERB;xpos=VV;head=0;deprel=root>]>
<Token id=6;words=[<Word id=6;text=了;lemma=了;upos=PART;xpos=AS;feats=Aspect=Perf;head=5;deprel=case:aspect>]>
<Token id=7;words=[<Word id=7;text=懶惰;lemma=懶惰;upos=ADJ;xpos=JJ;head=9;deprel=amod>]>
<Token id=8;words=[<Word id=8;text=的;lemma=的;upos=PART;xpos=DEC;head=7;deprel=mark:relcl>]>
<Token id=9;words=[<Word id=9;text=狗;lemma=狗;upos=NOUN;xpos=NN;head=5;deprel=obj>]>

doc.sentences[0].print_words()
Output:
<Word id=1;text=快速;lemma=快速;upos=ADJ;xpos=JJ;head=4;deprel=amod>
<Word id=2;text=的;lemma=的;upos=PART;xpos=DEC;head=1;deprel=mark:relcl>
<Word id=3;text=棕色;lemma=棕色;upos=NOUN;xpos=NN;head=4;deprel=nmod>
<Word id=4;text=狐狸;lemma=狐狸;upos=NOUN;xpos=NN;head=5;deprel=nsubj>
<Word id=5;text=跳過;lemma=跳過;upos=VERB;xpos=VV;head=0;deprel=root>
<Word id=6;text=了;lemma=了;upos=PART;xpos=AS;feats=Aspect=Perf;head=5;deprel=case:aspect>
<Word id=7;text=懶惰;lemma=懶惰;upos=ADJ;xpos=JJ;head=9;deprel=amod>
<Word id=8;text=的;lemma=的;upos=PART;xpos=DEC;head=7;deprel=mark:relcl>
<Word id=9;text=狗;lemma=狗;upos=NOUN;xpos=NN;head=5;deprel=obj>
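
If only the printed lines are available (for example in a saved log), they can be turned back into dictionaries. The sketch below assumes the exact "<Word key=value;key=value>" shape printed above and would break on text that itself contains a semicolon. The name parse_word_repr is made up for illustration; when the doc object is available, reading the word attributes directly is the better route.

```python
def parse_word_repr(line):
    """Parse one '<Word id=...;text=...>' line into a dict of strings."""
    body = line.strip()
    if not (body.startswith("<Word ") and body.endswith(">")):
        raise ValueError("not a printed Word line: " + line)
    body = body[len("<Word "):-1]
    fields = {}
    for item in body.split(";"):
        # Split on the first '=' only, so 'feats=Aspect=Perf' keeps its value intact.
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


line = "<Word id=6;text=了;lemma=了;upos=PART;xpos=AS;feats=Aspect=Perf;head=5;deprel=case:aspect>"
parsed = parse_word_repr(line)
print(parsed)
```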

doc = nlp('新冠病毒在美國情況惡劣。')

doc.ents  # doc.entities returns the same list
Output:
[{
   "text": "美國",
   "type": "GPE",
   "start_char": 5,
   "end_char": 7
 }]
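
The start_char and end_char values are offsets into the input string, so each entity can be checked against the original text. The sketch below does this with plain dictionaries copied from the output above; the function name entity_spans is hypothetical.

```python
def entity_spans(text, entities):
    """Return (type, surface, matches) for each entity dict."""
    spans = []
    for entity in entities:
        surface = text[entity["start_char"]:entity["end_char"]]
        # 'matches' is True when the sliced text equals the reported entity text.
        spans.append((entity["type"], surface, surface == entity["text"]))
    return spans


text = "新冠病毒在美國情況惡劣。"
entities = [{"text": "美國", "type": "GPE", "start_char": 5, "end_char": 7}]

spans = entity_spans(text, entities)
print(spans)
```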

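The tag explanations below were collected from the web and will be removed on request if they infringe.

Part-of-speech and entity tags

https://www.cnblogs.com/gaofighting/p/9768023.html

Syntactic relation labels:

Source: https://blog.csdn.net/l919898756/article/details/81670228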

ROOT: root of the sentence being processed
IP: simple clause
NP: noun phrase
VP: verb phrase
PU: punctuation, usually sentence-final marks such as the period, question mark, or exclamation mark
LCP: localizer phrase
PP: prepositional phrase
CP: modifying clause formed with 的
DNP: phrase formed with 的 that expresses possession or association
ADVP: adverb phrase
ADJP: adjective phrase
DP: determiner phrase
QP: quantifier phrase
NN: common noun
NR: proper noun
NT: temporal noun
PN: pronoun
VV: verb
VC: copula (是)
CC: coordinating conjunction
VE: 有 as a main verb
VA: predicative adjective
AS: aspect marker (e.g. 了)
VRD: verb-resultative compound
CD: cardinal number
DT: determiner
EX: existential there
FW: foreign word
IN: preposition or subordinating conjunction
JJ: adjective or ordinal numeral
JJR: adjective, comparative
JJS: adjective, superlative
LS: list item marker
MD: modal auxiliary
PDT: pre-determiner
POS: genitive marker
PRP: personal pronoun
RB: adverb
RBR: adverb, comparative
RBS: adverb, superlative
RP: particle
SYM: symbol
TO: "to" as preposition or infinitive marker
WDT: WH-determiner
WP: WH-pronoun
WP$: WH-pronoun, possessive
WRB: WH-adverb
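
A table like the one above can be kept as a small lookup so that tagger output becomes easier to read. The sketch below covers only a handful of the tags listed above. The names TAG_GLOSSES and gloss_tags are made up for illustration, and tags missing from the table (such as DEC in the earlier output) fall back to a placeholder.

```python
# A few glosses copied from the tag list; extend as needed.
TAG_GLOSSES = {
    "NN": "common noun",
    "NR": "proper noun",
    "VV": "verb",
    "AS": "aspect marker",
    "JJ": "adjective or ordinal numeral",
}


def gloss_tags(pairs, glosses=TAG_GLOSSES):
    """Attach a readable gloss to each (text, xpos) pair."""
    return [(text, tag, glosses.get(tag, "not in table")) for text, tag in pairs]


tagged = gloss_tags([("狐狸", "NN"), ("跳過", "VV"), ("了", "AS"), ("的", "DEC")])
for row in tagged:
    print(row)
```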
 
Dependency relations
abbrev: abbreviation modifier
acomp: adjectival complement
advcl: adverbial clause modifier
advmod: adverbial modifier
agent: agent, usually present when the clause contains "by"
amod: adjectival modifier
appos: appositional modifier
attr: attributive
aux: auxiliary, a non-main verb such as be, have, should, or could
auxpass: passive auxiliary
cc: coordination, usually attached to the first conjunct
ccomp: clausal complement
complm: complementizer, relating the word that introduces a clause to the main verb of that clause
conj: conjunct, linking two coordinated words
cop: copula (e.g. be, seem, appear), the link between the subject and the predicate
csubj: clausal subject
csubjpass: clausal passive subject
dep: dependent
det: determiner, such as an article
dobj: direct object
expl: expletive, mainly the existential "there"
infmod: infinitival modifier
iobj: indirect object
mark: marker, mainly used with words such as "that", "whether", "because", and "when"
mwe: multi-word expression
neg: negation modifier
nn: noun compound modifier
npadvmod: noun phrase as adverbial modifier
nsubj: nominal subject
nsubjpass: passive nominal subject
num: numeric modifier
number: element of compound number
parataxis: parataxis, clauses placed side by side
partmod: participial modifier
pcomp: prepositional complement
pobj: object of a preposition
poss: possession modifier
possessive: possessive modifier, the relation between the possessor and its 's
preconj: preconjunct, typically seen with "either", "both", or "neither"
predet: predeterminer, often a word meaning "all"
prep: prepositional modifier
prepc: prepositional clausal modifier
prt: phrasal verb particle
punct: punctuation, kept in the scheme but normally left out of the results
purpcl: purpose clause modifier
quantmod: quantifier phrase modifier
rcmod: relative clause modifier
ref: referent
rel: relative
root: root, the most important word and the starting node of the tree
tmod: temporal modifier
xcomp: open clausal complement
xsubj: controlling subject
Head is a predicate
  subj — subject
 nsubj — nominal subject (同步, 建設)
   top — topic (是, 建築)
npsubj — nominal passive subject, the subject of a passive clause introduced by 被, usually the semantic patient of the predicate (稱作, 鎳)
 csubj — clausal subject, not found in Chinese
 xsubj — controlling subject, usually a subject that governs several clauses (完善, 有些)
Head is a predicate or preposition
   obj — object
  dobj — direct object (頒佈, 文件)
  iobj — indirect object, rarely found
 range — indirect object that is a quantifier phrase, also called the dative (成交, 元)
  pobj — prepositional object (根據, 要求)
  lobj — localizer object, here in a time expression (來, 近年)
Head is a predicate
  comp — complement
 ccomp — clausal complement, usually built from two verbs, where the head governs the clause (IP) containing the second verb (出現, 納入)
 xcomp — open clausal complement, not found
 acomp — adjectival complement
 tcomp — temporal complement (遇到, 以前)
lccomp — localizer complement (佔, 以上)
       — resultative complement
Head is a noun
   mod — modifier
  pass — passive modifier
  tmod — temporal modifier
 rcmod — relative clause modifier (問題, 遇到)
 numod — numeric modifier (規定, 若干)
ornmod — ordinal number modifier
   clf — classifier modifier (文件, 件)
  nmod — noun compound modifier (浦東, 上海)
  amod — adjectival modifier (情況, 新)
advmod — adverbial modifier (做到, 基本)
  vmod — verb modifier (participle modifier)
prnmod — parenthetical modifier
   neg — negative modifier (遇到, 不)
   det — determiner modifier (活動, 這些)
 possm — possessive marker, NP
  poss — possessive modifier, NP
  dvpm — DVP marker, DVP (簡單, 的)
dvpmod — DVP modifier, DVP (採取, 簡單)
  assm — associative marker, DNP (開發, 的)
assmod — associative modifier, NP|QP (教訓, 特區)
  prep — prepositional modifier, NP|VP|IP (採取, 對)
 clmod — clause modifier (因爲, 開始)
 plmod — prepositional localizer modifier (在, 上)
   asp — aspect marker (做到, 了)
partmod– participial modifier, not found
   etc — "etc." relation (辦法, 等)
Head is a content word
  conj — conjunct
   cop — copula; whether it also covers auxiliary verbs is unconfirmed
    cc — coordination, between the head word and the conjunction (開發, 與)
Other
  attr — attributive (是, 工程)
cordmod– coordinated verb compound (頒佈, 實行)
  mmod — modal verb (得到, 能)
    ba — 把 construction
tclaus — temporal clause (以後, 積累)
       — semantic dependent
   cpm — complementizer, usually 的 introducing a CP (振興, 的)

References:

Stanford CoreNLP official site: https://stanfordnlp.github.io/CoreNLP/index.html#human-languages-supported

stanza official site: https://stanfordnlp.github.io/stanza/index.html

Web resources: http://www.52nlp.cn/tag/corenlp
