How training works

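Initialize the model weights to random values by calling nlp.begin_training;
Predict a few examples with the current weights by calling nlp.update;
Compare the predictions with the true labels;
Calculate how to change the weights to improve the predictions;
Adjust the model weights slightly;
Go back to the prediction step and repeat.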
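The steps above can be sketched without spaCy at all. The block below is a minimal illustration that assumes a one-feature logistic model and made-up toy data; none of the names come from spaCy, and it is only meant to show the predict, compare, adjust cycle that nlp.update performs internally on a much larger model.

```python
import math
import random

random.seed(0)

# Toy data (hypothetical): one feature per example, label 1 when the feature is positive
DATA = [(-2.0, 0), (-1.0, 0), (-0.5, 0), (0.5, 1), (1.0, 1), (2.0, 1)]

def predict(weight, bias, x):
    """Probability of label 1 under the current weights."""
    return 1.0 / (1.0 + math.exp(-(weight * x + bias)))

def average_loss(weight, bias):
    """Mean cross-entropy between the predictions and the true labels."""
    total = 0.0
    for x, y in DATA:
        p = predict(weight, bias, x)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(DATA)

# Step 1: initialize the weights randomly
weight, bias = random.uniform(-1, 1), random.uniform(-1, 1)
initial_loss = average_loss(weight, bias)

learning_rate = 0.5
for epoch in range(20):
    random.shuffle(DATA)
    for x, y in DATA:
        p = predict(weight, bias, x)       # Step 2: predict with the current weights
        error = p - y                      # Step 3: compare with the true label
        grad_w, grad_b = error * x, error  # Step 4: work out how to change the weights
        weight -= learning_rate * grad_w   # Step 5: adjust the weights slightly
        bias -= learning_rate * grad_b
    # Step 6: go back to step 2 for the next pass over the data

final_loss = average_loss(weight, bias)
print("loss before:", initial_loss, "loss after:", final_loss)
```

The loss printed after training should be lower than the one before, which is the whole point of repeating the cycle.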
The training loop:

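import random
import spacy

# spaCy v2-style loop; assumes nlp, TRAINING_DATA and path_to_model are already defined
for i in range(10):
    # Shuffle so the model does not learn the order of the examples
    random.shuffle(TRAINING_DATA)
    # Split the data into batches and update the model on each batch
    for batch in spacy.util.minibatch(TRAINING_DATA):
        texts = [text for text, annotation in batch]
        annotations = [annotation for text, annotation in batch]
        nlp.update(texts, annotations)
# Save the updated model to disk
nlp.to_disk(path_to_model)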
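To make the batching step in the loop concrete, here is a small plain-Python sketch. The minibatch function below is a simplified stand-in written for illustration, not spaCy's implementation, and the placeholder data is hypothetical. It shows how shuffled data is cut into batches and how each batch is split into the two parallel lists that nlp.update receives.

```python
import random

def minibatch(items, size=8):
    """Yield consecutive batches of at most `size` items (simplified stand-in)."""
    items = list(items)
    for start in range(0, len(items), size):
        yield items[start:start + size]

# Hypothetical placeholder data: five (text, annotations) pairs
TRAINING_DATA = [("text %d" % i, {"entities": []}) for i in range(5)]

random.shuffle(TRAINING_DATA)
batches = list(minibatch(TRAINING_DATA, size=2))
batch_sizes = [len(batch) for batch in batches]

# Split one batch into the two parallel lists that would be passed to nlp.update
texts = [text for text, annotation in batches[0]]
annotations = [annotation for text, annotation in batches[0]]
print(batch_sizes, texts)
```

With five examples and a batch size of two, the last batch is simply shorter than the others; nothing is dropped.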

Training a new model from scratch:

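import random
import spacy

# spaCy v2-style API; assumes examples is a list of (text, annotations) tuples
nlp = spacy.blank("zh")        # start from a blank Chinese model
ner = nlp.create_pipe("ner")   # create an empty entity recognizer
nlp.add_pipe(ner)              # add it to the pipeline
ner.add_label("GADGET")        # register the new label
nlp.begin_training()           # initialize the weights randomly
for itn in range(10):
    random.shuffle(examples)
    # Small batches of two examples each
    for batch in spacy.util.minibatch(examples, size=2):
        texts = [text for text, annotation in batch]
        annotations = [annotation for text, annotation in batch]
        nlp.update(texts, annotations)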
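The snippet above leaves examples undefined. A possible shape for it is sketched below: each item is a (text, {"entities": [...]}) pair with character offsets, as in the TRAINING_DATA format used in this article. The sentences and the make_example helper are hypothetical and exist only for illustration; computing the offsets with str.find avoids off-by-one mistakes from counting characters by hand.

```python
def make_example(text, phrase, label):
    """Build one (text, annotations) pair by locating `phrase` inside `text`."""
    start = text.find(phrase)
    if start == -1:
        raise ValueError("phrase %r not found in text" % phrase)
    end = start + len(phrase)
    return (text, {"entities": [(start, end, label)]})

# Hypothetical sentences; only the (text, {"entities": [...]}) shape comes from the article
examples = [
    make_example("How to preorder the iPhone X", "iPhone X", "GADGET"),
    make_example("iPhone X is coming", "iPhone X", "GADGET"),
    # A text with no entities gives the model a negative example
    ("I need a new phone! Any tips?", {"entities": []}),
]

# Sanity check: every span must slice back to the annotated phrase
for text, annotation in examples:
    for start, end, label in annotation["entities"]:
        print(label, repr(text[start:end]))
```

Checking that text[start:end] reproduces the intended phrase is a cheap way to catch misaligned annotations before training.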

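Problems that can come up during training:

A model that is updated only on new data can forget what it already knew. To counter this, mix previously correct predictions into the training data: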

TRAINING_DATA = [
    ("...", {"entities": [(0, 1, "WEBSITE")]}),
    ("...", {"entities": [(0, 1, "PERSON")]}),
]
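One way to build such a mixed data set is to run the existing model over some texts and turn its predictions into extra training examples. The sketch below assumes the predictions have already been checked as correct. The helper name make_revision_examples, the sample texts, and the fake_nlp stand-in are all hypothetical; the stand-in only mimics the doc.ents, start_char, end_char and label_ attributes so that the block runs without a trained pipeline.

```python
import random
from collections import namedtuple

def make_revision_examples(nlp, texts):
    """Turn the current model's own predictions into training examples."""
    revision = []
    for text in texts:
        doc = nlp(text)
        entities = [(ent.start_char, ent.end_char, ent.label_) for ent in doc.ents]
        revision.append((text, {"entities": entities}))
    return revision

# Stand-in model (hypothetical) so the sketch runs without a trained pipeline
Ent = namedtuple("Ent", ["start_char", "end_char", "label_"])
Doc = namedtuple("Doc", ["ents"])

def fake_nlp(text):
    """Pretend model: tags the word 'Alice' as PERSON wherever it first appears."""
    start = text.find("Alice")
    ents = [Ent(start, start + 5, "PERSON")] if start != -1 else []
    return Doc(ents)

# New annotations for the label being added, plus texts the old model already handles
new_examples = [("Visit Reddit today", {"entities": [(6, 12, "WEBSITE")]})]
old_texts = ["Alice wrote a post", "Nothing to tag here"]

TRAINING_DATA = new_examples + make_revision_examples(fake_nlp, old_texts)
random.shuffle(TRAINING_DATA)
print(TRAINING_DATA)
```

With a real pipeline, the loaded nlp object would take the place of fake_nlp; the rest of the function would stay the same.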

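Models cannot learn everything:
Choose categories that can be recognized from the local context;
Generic labels work better than very specific ones;
Rules can then be used to convert generic labels into specific categories.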

LABELS = ["CLOTHING", "BAND"]
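The last point, converting a generic label into a specific category with rules, might look like the sketch below. The keyword table, the category name CHILDRENS_CLOTHING, and the sample predictions are hypothetical; the idea is only that the model predicts the generic label and a simple lookup refines it afterwards.

```python
# Hypothetical keyword rules: generic label -> list of (keyword, specific category)
RULES = {
    "CLOTHING": [
        ("kids", "CHILDRENS_CLOTHING"),
        ("children", "CHILDRENS_CLOTHING"),
    ],
}

def refine_label(span_text, label):
    """Map a generic predicted label to a more specific one if a rule matches."""
    lowered = span_text.lower()
    for keyword, specific in RULES.get(label, []):
        if keyword in lowered:
            return specific
    return label  # no rule matched: keep the generic label

# Hypothetical model output: (entity text, generic label) pairs
predictions = [
    ("kids sneakers", "CLOTHING"),
    ("rain jacket", "CLOTHING"),
    ("Metallica", "BAND"),
]
refined = [(text, refine_label(text, label)) for text, label in predictions]
print(refined)
```

Keeping the rules outside the model means a new specific category can be added by editing the table rather than by retraining.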

Manual annotation tools:

Brat
Prodigy
