Lucene wiki translation: How to Improve Indexing Speed, Part 3

Original post

cfan_haifeng

2020-06-22 08:27

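Original: http://wiki.apache.org/lucene-java/ImproveIndexingSpeed
Navigation: Lucene-java Wiki -> 1 Overview -> 1.1 Informational -> 1.1.1 BasicsOfPerformance -> 1.1.1.4 ImproveIndexingSpeed
Note: text in "red" marks passages I did not know, or was unsure, how to translate. Text in "blue" is my own commentary.
Status: complete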

8. Add fields to the Document in the same order

The original text says:

Always add fields in the same order to your Document, when using stored fields or term vectors

Lucene's merging has an optimization whereby stored fields and term vectors can be bulk-byte-copied, but the optimization only applies if the field name -> number mapping is the same across segments. Future Lucene versions may attempt to assign the same mapping automatically (see LUCENE-1737), but until then the only way to get the same mapping is to always add the same fields in the same order to each document you index.

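Always add fields to your Document in the same order, which is what most of us do anyway. When Lucene merges segments it has an optimization that bulk-copies stored fields and term vectors byte for byte, but the optimization only applies when the field name -> number mapping is identical across all segments. Future Lucene versions may assign the same mapping automatically (see LUCENE-1737), but for now the only way to get an identical mapping is to always add the same fields in the same order to every document you index.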
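To make the mapping argument concrete, here is a minimal sketch. It is a toy model, not Lucene's internal code: the class `FieldNumberingSketch` and the method `numberFields` are hypothetical names, and the sketch simply assumes that each segment numbers its fields in first-seen order. Under that assumption, two segments agree on the mapping only if their documents add fields in the same order.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: this is NOT Lucene's internal code.
// Assumption: a segment numbers field names in the order it first sees them.
class FieldNumberingSketch {

    // Build a "field name -> number" mapping for one hypothetical segment.
    // Each inner list holds the field names of one document, in add order.
    static Map<String, Integer> numberFields(List<List<String>> docs) {
        Map<String, Integer> mapping = new LinkedHashMap<>();
        for (List<String> doc : docs) {
            for (String field : doc) {
                // A new field gets the next free number; known fields keep theirs.
                mapping.putIfAbsent(field, mapping.size());
            }
        }
        return mapping;
    }

    public static void main(String[] args) {
        Map<String, Integer> segmentA = numberFields(List.of(List.of("title", "body")));
        Map<String, Integer> segmentB = numberFields(List.of(List.of("title", "body")));
        Map<String, Integer> segmentC = numberFields(List.of(List.of("body", "title")));

        System.out.println(segmentA);                  // {title=0, body=1}
        System.out.println(segmentC);                  // {body=0, title=1}
        System.out.println(segmentA.equals(segmentB)); // true: mappings match
        System.out.println(segmentA.equals(segmentC)); // false: mappings differ
    }
}
```

In this model, segments A and B could be merged by copying bytes as they are, while merging A with C would require renumbering every stored field.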

9. Re-use a single Token instance in your Analyzer

The original text says:

Re-use a single Token instance in your analyzer

Analyzers often create a new Token for each term in sequence that needs to be indexed from a Field. You can save substantial GC cost by re-using a single Token instance instead.

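Re-use a single Token instance in your Analyzer. For a Field that needs to be indexed, an Analyzer often creates a new Token object for every term in it. You can cut the garbage-collection cost by re-using a single Token instead.

Someone else's translation:

Use a single Token instance in your Analyzer. Sharing a single Token instance within the analyzer also relieves GC pressure.

Sadly, I have never actually used Token so far. What is going on???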
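For readers who have not worked with Token either, the re-use idea can be sketched without Lucene at all. The classes below, `ReusableToken` and `ReusingTokenizerSketch`, are hypothetical stand-ins and not Lucene's API; the sketch only assumes that the consumer finishes with one token before asking for the next.

```java
// Hypothetical stand-in for a mutable token; this is not Lucene's Token class.
class ReusableToken {
    String text;
    int start;
    int end;

    // Overwrite the fields in place and return the same object.
    ReusableToken set(String text, int start, int end) {
        this.text = text;
        this.start = start;
        this.end = end;
        return this;
    }
}

// Toy whitespace tokenizer that hands out one shared token for every term.
class ReusingTokenizerSketch {
    private final String input;
    private int pos = 0;
    // Allocated once; every call to next() refills this same instance.
    private final ReusableToken shared = new ReusableToken();

    ReusingTokenizerSketch(String input) {
        this.input = input;
    }

    // Returns the shared token filled with the next term, or null at the end.
    ReusableToken next() {
        while (pos < input.length() && Character.isWhitespace(input.charAt(pos))) {
            pos++;
        }
        if (pos >= input.length()) {
            return null;
        }
        int start = pos;
        while (pos < input.length() && !Character.isWhitespace(input.charAt(pos))) {
            pos++;
        }
        return shared.set(input.substring(start, pos), start, pos);
    }

    public static void main(String[] args) {
        ReusingTokenizerSketch tokenizer = new ReusingTokenizerSketch("quick brown fox");
        for (ReusableToken tok = tokenizer.next(); tok != null; tok = tokenizer.next()) {
            System.out.println(tok.text + " [" + tok.start + "," + tok.end + ")");
        }
    }
}
```

The trade-off is that a caller must not keep a reference to a returned token across calls, because the next call overwrites it. Note also that this sketch still allocates one String per term through `substring`; only the token object itself is re-used.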

10. Use the char[] API in Token instead of the String API to represent token text

The original text says:

Use the char[] API in Token instead of the String API to represent token Text

As of Lucene 2.3, a Token can represent its text as a slice into a char array, which saves the GC cost of new'ing and then reclaiming String instances. By re-using a single Token instance and using the char[] API you can avoid new'ing any objects for each term. See Token for details.

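represent ... as: to portray something as something. Argh, I had forgotten even that.

Someone else's translation:

As of Lucene 2.3, a Token can use a char array to represent its text. This avoids the cost of constructing Strings and of the GC reclaiming them. By combining a single Token instance with the char[] API you can avoid creating any new objects per term. For more details, see Token.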
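What "a slice into a char array" means can be shown with a small sketch. The classes `CharSliceToken` and `CharSliceTokenizerSketch` are hypothetical and deliberately do not reproduce Lucene's Token API; the sketch assumes the input is already available as a `char[]` and that the token may point into it directly.

```java
// Hypothetical token that describes its text as (buffer, offset, length).
// This is an illustration only, not Lucene's Token class.
class CharSliceToken {
    char[] buffer;
    int offset;
    int length;

    // Point this token at a region of an existing array; nothing is copied.
    void set(char[] buffer, int offset, int length) {
        this.buffer = buffer;
        this.offset = offset;
        this.length = length;
    }
}

// Toy whitespace tokenizer: one shared token, no String created per term.
class CharSliceTokenizerSketch {
    private final char[] input;
    private int pos = 0;
    private final CharSliceToken shared = new CharSliceToken();

    CharSliceTokenizerSketch(char[] input) {
        this.input = input;
    }

    // Returns the shared token pointing at the next term, or null at the end.
    CharSliceToken next() {
        while (pos < input.length && Character.isWhitespace(input[pos])) {
            pos++;
        }
        if (pos >= input.length) {
            return null;
        }
        int start = pos;
        while (pos < input.length && !Character.isWhitespace(input[pos])) {
            pos++;
        }
        shared.set(input, start, pos - start);
        return shared;
    }

    public static void main(String[] args) {
        char[] text = "quick brown fox".toCharArray();
        CharSliceTokenizerSketch tokenizer = new CharSliceTokenizerSketch(text);
        for (CharSliceToken tok = tokenizer.next(); tok != null; tok = tokenizer.next()) {
            // A String is built here only to display the slice.
            System.out.println(new String(tok.buffer, tok.offset, tok.length));
        }
    }
}
```

Inside `next()` no object is allocated per term: the token is shared and its text is just an offset and a length into the input array. The only allocations in the demo come from printing.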
