Lucene: Concepts and Definitions

Original post

cfan_haifeng

2020-06-22 08:27

Source: http://wiki.apache.org/lucene-java/ConceptsAndDefinitions
Navigation: Lucene-java Wiki -> 1 Overview -> 1.1 Informational -> 1.1.2 ConceptsAndDefinitions
Note: "red" marks text I did not know or was unsure how to translate; "blue" marks my own descriptions.

This page describes some of the concepts and definitions used in Lucene.

Definitions

Analyzer - The Lucene class used to analyze text for indexing. Most applications can use StandardAnalyzer for English and other Latin-based languages.

Payloads - A payload is an array of bytes stored in the index alongside a term position, that is, attached to an individual occurrence of a term.

Snowball Stemmers - The Snowball stemmers are one of the stemmer families that ship with Lucene. For more information, see the Snowball website.

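Stemmer - Per Wikipedia, a stemmer is an algorithm that reduces inflected or derived words to a common stem, which in turn shrinks the search space and the index. The original text reads: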

"A stemming algorithm, or stemmer, is a computer program or algorithm for reducing inflected (or sometimes derived) words to their stem, base or root form — generally a written word form." Stemmers are often used to reduce the search space and index size. Often times a user searching for "widgets" is interested in documents that contain the term "widget".
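To make the Analyzer and Stemmer definitions above concrete, here is a minimal sketch in plain Python. It is not Lucene's API and not the Snowball algorithm: `toy_stem`, `toy_analyze`, and the suffix list are hypothetical names and rules invented for illustration. It only shows the idea that "widgets" and "widget" end up as the same indexed term.

```python
import re

# Hypothetical suffix list for illustration only; real stemmers such as
# Snowball apply far more careful, language-specific rules.
SUFFIXES = ("ing", "es", "s")

def toy_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def toy_analyze(text):
    """Lowercase the text, split it into word tokens, and stem each token."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [toy_stem(token) for token in tokens]

print(toy_analyze("Widgets and a widget"))
```

Because both spellings reduce to one stem, a search for either form can match documents containing the other, and the index holds one term instead of two.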

Core classes

Document

A Lucene Document is a record in the index. A Document has a list of fields; each field has a name and a textual value.

Term

A Term is Lucene's unit of indexing. In western languages, a Term is often a word.
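The relationship between a Document, its fields, and Terms can be sketched as follows. This is a toy model in plain Python, not Lucene's classes: `ToyDocument`, `ToyTerm`, and the whitespace splitting are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass(frozen=True)
class ToyTerm:
    """A unit of indexing: a field name plus one token of text."""
    field_name: str
    text: str

@dataclass
class ToyDocument:
    """A record made of (name, textual value) fields."""
    fields: List[Tuple[str, str]] = field(default_factory=list)

    def add(self, name, value):
        self.fields.append((name, value))

    def terms(self):
        """Split each field value on whitespace into lowercase terms."""
        return [ToyTerm(name, word.lower())
                for name, value in self.fields
                for word in value.split()]

doc = ToyDocument()
doc.add("title", "Lucene Concepts")
doc.add("body", "A term is often a word")
print(doc.terms()[:2])
```

Note that a term is tied to a field: the word "lucene" in the title and the same word in the body would be two distinct terms in this model.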

TermEnum

TermEnum is used to enumerate all terms in the index for a given field, regardless of which documents the terms occur in (or where they occur).

Some query subclasses are implemented by enumerating the terms that match a pattern and building a large OR query from the enumeration, e.g. WildcardQuery, PrefixQuery, RangeQuery.

See the LuceneFAQ entry "How do I retrieve all the values of a particular field that exists within an index, across all documents?", which also includes sample code.
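The term enumeration and OR-expansion described above can be sketched with a sorted term list per field. This is a toy model, not Lucene's TermEnum API: `TERM_DICT`, `enumerate_terms`, and `expand_prefix` are hypothetical names, and the data is made up.

```python
from bisect import bisect_left

# Toy term dictionary: field name -> sorted list of distinct terms.
# Which documents contain each term is deliberately not stored here.
TERM_DICT = {
    "body": sorted({"index", "indexer", "indexing", "search", "searcher"}),
    "title": sorted({"lucene", "wiki"}),
}

def enumerate_terms(field_name, start=""):
    """Yield the terms of one field in sorted order, beginning at `start`."""
    terms = TERM_DICT.get(field_name, [])
    for term in terms[bisect_left(terms, start):]:
        yield term

def expand_prefix(field_name, prefix):
    """Collect the terms matching a prefix, as a prefix query would."""
    matches = []
    for term in enumerate_terms(field_name, prefix):
        if not term.startswith(prefix):
            break  # sorted order: no later term can match
        matches.append(term)
    return matches

# The expanded query is conceptually: index OR indexer OR indexing
print(" OR ".join(expand_prefix("body", "index")))
```

Keeping the terms sorted is what makes this cheap: the enumeration can seek to the prefix and stop at the first non-matching term instead of scanning the whole dictionary.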

TermDocs

Unlike TermEnum (see above), TermDocs is used to find which documents contain a given Term. TermDocs also provides the frequency of the term within each of those documents.
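What TermDocs exposes can be pictured as a postings list: for each term, the documents that contain it and how often. The sketch below is a toy inverted index in plain Python, not Lucene's API; `DOCS`, `build_postings`, and `term_docs` are names invented for illustration.

```python
from collections import Counter, defaultdict

DOCS = {
    0: "lucene indexes text",
    1: "text search with lucene and more lucene",
    2: "plain text",
}

def build_postings(docs):
    """Map each term to a list of (doc_id, frequency) pairs, ordered by doc_id."""
    postings = defaultdict(list)
    for doc_id in sorted(docs):
        for term, freq in Counter(docs[doc_id].split()).items():
            postings[term].append((doc_id, freq))
    return postings

def term_docs(postings, term):
    """Return the documents containing `term`, with the in-document frequency."""
    return postings.get(term, [])

postings = build_postings(DOCS)
print(term_docs(postings, "lucene"))
```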

TermFreqVector

A TermFreqVector (aka Term Frequency Vector or just Term Vector) is a data structure containing a given Document's term and frequency information and can be retrieved from the IndexReader only when Term Vectors are stored during indexing.

In short, a TermFreqVector is a data structure that holds a given Document's terms and their frequencies.
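A term vector is the per-document view, the reverse direction of a postings list. The following is a minimal sketch under stated assumptions: `term_freq_vector`, `stored_vectors`, and the `store_term_vector` flag are hypothetical stand-ins, not Lucene's API. It mimics the point that a vector can only be retrieved if it was stored at indexing time.

```python
from collections import Counter

def term_freq_vector(text):
    """Return parallel lists of a document's sorted terms and their counts."""
    counts = Counter(text.lower().split())
    terms = sorted(counts)
    freqs = [counts[term] for term in terms]
    return terms, freqs

# Hypothetical store: vectors exist only for documents indexed with them enabled.
stored_vectors = {}

def index_document(doc_id, text, store_term_vector):
    if store_term_vector:
        stored_vectors[doc_id] = term_freq_vector(text)

index_document(0, "to be or not to be", store_term_vector=True)
index_document(1, "no vector here", store_term_vector=False)
print(stored_vectors.get(0))
print(stored_vectors.get(1))
```

Document 1 has no vector to return because none was stored when it was indexed.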


IndexReader

IndexSearcher
