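- Original: http://wiki.apache.org/lucene-java/ConceptsAndDefinitions
- Navigation: Lucene-java Wiki -> 1 Overview -> 1.1 Informational -> 1.1.2 ConceptsAndDefinitions
- Note: "red" marks passages I did not know or was unsure how to translate; "blue" marks my own descriptions.
This page describes some of the concepts and definitions related to Lucene.
Definitions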
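Analyzer - The Lucene class used to analyze text for indexing. Most applications can use StandardAnalyzer for English and other Latin-script languages.
Payloads - A payload is an array of bytes stored together with a term's position.
Snowball Stemmers - The Snowball stemmers are among the stemmers shipped with Lucene. For more information, see the Snowball website.
Stemmer - According to Wikipedia, this kind of algorithm is used to reduce the influence of noise words and synonyms ... and thereby to reduce the size of the index ...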
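To make the Analyzer and Stemmer entries more concrete, here is a minimal sketch of an analysis chain in plain Java. It is a toy model only: ToyAnalyzer, tokenize, stem and analyze are hypothetical names, the tokenizer is not StandardAnalyzer, and the suffix rule is far cruder than any Snowball stemmer. It only shows the general idea of turning text into normalized terms so that several word forms collapse into one index entry.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy analysis chain. Assumption: this is NOT Lucene's StandardAnalyzer
// and NOT a Snowball stemmer, just an illustration of the two steps.
final class ToyAnalyzer {

    // Split on anything that is not a letter or digit, then lowercase each token.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.split("[^\\p{L}\\p{Nd}]+")) {
            if (!raw.isEmpty()) {
                tokens.add(raw.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Deliberately naive stemmer (hypothetical rule): drop one trailing "s"
    // from tokens longer than three characters, unless they end in "ss".
    static String stem(String token) {
        if (token.length() > 3 && token.endsWith("s") && !token.endsWith("ss")) {
            return token.substring(0, token.length() - 1);
        }
        return token;
    }

    // Full chain: tokenize, then stem every token.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : tokenize(text)) {
            terms.add(stem(token));
        }
        return terms;
    }

    public static void main(String[] args) {
        // "Banks" and "bank" end up as the same term.
        System.out.println(analyze("The Banks, the bank!"));
    }
}
```

Because "Banks" and "bank" are reduced to the same term, the index holds one entry instead of two, which is the index-size effect the Stemmer entry refers to.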
Core classes
Document
A Lucene Document is a record in the index. A Document has a list of fields; each field has a name and a textual value.
Term
A Term is Lucene's unit of indexing. In western languages, a Term is often a word.
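The relationship between a Document, its fields, and the Terms derived from them can be sketched in plain Java as follows. This is a toy model, not the Lucene classes: ToyDocument, Field, add and terms are hypothetical names, and the sketch assumes that a term is identified by a field name plus a lowercased, whitespace-delimited word.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy stand-in for a document: an ordered list of (name, text value) fields.
final class ToyDocument {

    static final class Field {
        final String name;
        final String value;

        Field(String name, String value) {
            this.name = name;
            this.value = value;
        }
    }

    private final List<Field> fields = new ArrayList<>();

    // Returns this document so that calls can be chained.
    ToyDocument add(String name, String value) {
        fields.add(new Field(name, value));
        return this;
    }

    // Distinct terms of the document, written as "field:word".
    // Assumption: words are whitespace-delimited and lowercased.
    Set<String> terms() {
        Set<String> result = new LinkedHashSet<>();
        for (Field f : fields) {
            for (String word : f.value.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    result.add(f.name + ":" + word.toLowerCase(Locale.ROOT));
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        ToyDocument doc = new ToyDocument()
                .add("title", "Lucene in Action")
                .add("body", "Search with Lucene");
        System.out.println(doc.terms());
    }
}
```

Note that "lucene" in the title and "lucene" in the body are kept as two different terms in this sketch, because the field name is part of the term.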
TermEnum
TermEnum is used to enumerate all terms in the index for a given field, regardless of which documents the terms occur in (or where they occur).
Some query subclasses are implemented by enumerating the terms that match a pattern and building a large OR query from the enumeration, e.g. WildcardQuery, PrefixQuery, and RangeQuery.
See the LuceneFAQ entry "How do I retrieve all the values of a particular field that exists within an index, across all documents?", which also includes sample code.
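The idea of enumerating terms and expanding a pattern into a large OR query can be sketched without Lucene. The following toy model assumes the terms of one field are kept in sorted order; ToyTermEnumeration, addTerm, expandPrefix and toOrQuery are hypothetical names, and the query string format is only for illustration. It is not the TermEnum API and not how PrefixQuery is actually implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Toy model of term enumeration for a single field.
final class ToyTermEnumeration {

    // Distinct terms of one field across all documents, kept sorted.
    private final TreeSet<String> terms = new TreeSet<>();

    void addTerm(String text) {
        terms.add(text);
    }

    // Start at the first term >= prefix and walk forward in sorted order.
    // All terms sharing the prefix are adjacent, so stop at the first miss.
    List<String> expandPrefix(String prefix) {
        List<String> matches = new ArrayList<>();
        for (String term : terms.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            matches.add(term);
        }
        return matches;
    }

    // Join the enumerated terms into one OR query string (illustrative format).
    static String toOrQuery(String field, List<String> matches) {
        StringBuilder sb = new StringBuilder();
        for (String match : matches) {
            if (sb.length() > 0) {
                sb.append(" OR ");
            }
            sb.append(field).append(':').append(match);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        ToyTermEnumeration body = new ToyTermEnumeration();
        for (String t : new String[] {"search", "indices", "apple", "indexing", "index"}) {
            body.addTerm(t);
        }
        // Prints: body:index OR body:indexing OR body:indices
        System.out.println(toOrQuery("body", body.expandPrefix("ind")));
    }
}
```

The enumeration never looks at documents; it only walks the term dictionary, which matches the "regardless of which documents the terms occur in" part of the description.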
TermDocs
Unlike TermEnum (see above), TermDocs is used to find which documents contain a given Term. TermDocs also provides the frequency of the term within each of those documents.
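A minimal sketch of what this lookup provides: for one term, the documents that contain it and how often it occurs in each. This is a toy in-memory postings map in plain Java; ToyPostings, addDocument and termDocs are hypothetical names, and documents are assumed to be identified by small integers. It does not use the real TermDocs interface.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Toy postings lists: for each term, which documents contain it and how often.
final class ToyPostings {

    // term -> (document number -> occurrences of the term in that document)
    private final Map<String, TreeMap<Integer, Integer>> postings = new HashMap<>();

    // Index one document given as a sequence of already-analyzed terms.
    void addDocument(int docId, String... terms) {
        for (String term : terms) {
            postings.computeIfAbsent(term, t -> new TreeMap<>())
                    .merge(docId, 1, Integer::sum);
        }
    }

    // Documents containing the term, in ascending document order,
    // mapped to the term's frequency in each document.
    Map<Integer, Integer> termDocs(String term) {
        TreeMap<Integer, Integer> docs = postings.get(term);
        if (docs == null) {
            return Collections.<Integer, Integer>emptyMap();
        }
        return Collections.unmodifiableMap(docs);
    }

    public static void main(String[] args) {
        ToyPostings index = new ToyPostings();
        index.addDocument(0, "lucene", "index", "lucene");
        index.addDocument(1, "index");
        index.addDocument(2, "lucene");
        // Prints: {0=2, 2=1}
        System.out.println(index.termDocs("lucene"));
    }
}
```

Here "lucene" occurs twice in document 0, not at all in document 1, and once in document 2, so the lookup returns both the matching documents and the per-document frequencies.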
TermFreqVector
A TermFreqVector (also known as a Term Frequency Vector, or simply a Term Vector) is a data structure containing a given Document's terms and their frequencies. It can be retrieved from the IndexReader only if Term Vectors were stored during indexing.
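As an illustration of the data structure, the sketch below builds a per-document term vector in plain Java: the distinct terms of one document, sorted, with a parallel array of frequencies. ToyTermVector, terms, frequencies and frequencyOf are hypothetical names, and the parallel-array layout is an assumption made for this example rather than a description of Lucene's storage format.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// Toy term vector for ONE document: its distinct terms and their counts.
final class ToyTermVector {

    private final String[] terms;     // distinct terms, sorted
    private final int[] frequencies;  // frequencies[i] belongs to terms[i]

    // Build the vector from the document's already-analyzed tokens.
    ToyTermVector(String... tokens) {
        TreeMap<String, Integer> counts = new TreeMap<>();
        for (String token : tokens) {
            counts.merge(token, 1, Integer::sum);
        }
        terms = new String[counts.size()];
        frequencies = new int[counts.size()];
        int i = 0;
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            terms[i] = entry.getKey();
            frequencies[i] = entry.getValue();
            i++;
        }
    }

    String[] terms() {
        return terms.clone();
    }

    int[] frequencies() {
        return frequencies.clone();
    }

    // Binary search works because the terms array is sorted.
    int frequencyOf(String term) {
        int idx = Arrays.binarySearch(terms, term);
        return idx >= 0 ? frequencies[idx] : 0;
    }

    public static void main(String[] args) {
        ToyTermVector vector = new ToyTermVector("to", "be", "or", "not", "to", "be");
        System.out.println(Arrays.toString(vector.terms()));       // [be, not, or, to]
        System.out.println(Arrays.toString(vector.frequencies())); // [2, 1, 1, 2]
    }
}
```

This is the per-document view, in contrast to TermDocs, which starts from a term and lists documents.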
Directory
IndexReader
IndexSearcher