Lucene: Analyzing the Document Source Code

Original

你是小KS

2020-06-19 09:50

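1. Statement

This post is mainly for my own study and review. It walks through the source code of Document.

Earlier I tested and used Lucene to implement create, delete, query, and update operations, and found that the data model used throughout was Document. So I decided to analyze this class's main data model and methods, to make later topics easier to understand.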

2. Inheritance and interfaces of the Document class

public final class Document implements Iterable<IndexableField>

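1. The class is final, so it cannot be subclassed and none of its methods can be overridden.

2. The class implements only one interface, Iterable<IndexableField>, so the only extra capability it declares is iterating over its indexable fields.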
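To illustrate what implementing Iterable buys the caller, here is a minimal sketch. FieldBag and IterableDemo are hypothetical stand-ins written for this example, not Lucene classes; the sketch only assumes that the iterator is backed by an internal list.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for Document: final, and iterable over its elements.
final class FieldBag implements Iterable<String> {
    private final List<String> names = new ArrayList<>();

    void add(String name) {
        names.add(name);
    }

    @Override
    public Iterator<String> iterator() {
        // Assumption for this sketch: iteration simply delegates to the list.
        return names.iterator();
    }
}

class IterableDemo {
    // Joins the elements in iteration order using a for-each loop.
    static String join(FieldBag bag) {
        StringBuilder sb = new StringBuilder();
        for (String name : bag) { // compiles because FieldBag implements Iterable
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(name);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        FieldBag bag = new FieldBag();
        bag.add("title");
        bag.add("content");
        System.out.println(join(bag)); // prints "title,content"
    }
}
```

With the real class, the same for-each form would apply to a Document, with IndexableField as the element type.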

3. The constructor of the Document class

public Document() {}

A very simple no-argument constructor.

4. Fields of the Document class

private final List<IndexableField> fields = new ArrayList<>();
private final static String[] NO_STRINGS = new String[0];

1. All indexable fields of a Document are kept in a List.

2. Because a List is used, entries may repeat: a Document can hold several IndexableField instances with the same name.

5. Methods of the Document class

1. Adding a field: add(IndexableField)

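public final void add(IndexableField field) {
  fields.add(field);
}

The field is simply appended to the list. In other words, a Document is really just a collection of IndexableField objects, and that collection may contain repeated names.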
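The consequence of storing fields in a List rather than a Map keyed by name can be sketched with plain collections. DuplicateNamesDemo and the name/value pairs below are made up for illustration and do not use Lucene.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DuplicateNamesDemo {
    // Appends every {name, value} pair to a list, the way add() appends fields.
    static int sizeAsList(String[][] pairs) {
        List<String[]> list = new ArrayList<>();
        for (String[] pair : pairs) {
            list.add(pair);
        }
        return list.size();
    }

    // Stores the pairs in a map keyed by name; a repeated name overwrites the old value.
    static int sizeAsMap(String[][] pairs) {
        Map<String, String> map = new HashMap<>();
        for (String[] pair : pairs) {
            map.put(pair[0], pair[1]);
        }
        return map.size();
    }

    public static void main(String[] args) {
        String[][] pairs = {{"tag", "java"}, {"tag", "lucene"}};
        System.out.println(sizeAsList(pairs)); // prints 2: both "tag" entries are kept
        System.out.println(sizeAsMap(pairs));  // prints 1: the second "tag" replaced the first
    }
}
```

The list keeps both entries in insertion order, which is why the lookup methods later in this post come in "first match" and "all matches" pairs.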

2. Removing fields

// Iterate, compare by field name, and remove the first matching element
public final void removeField(String name) {
  Iterator<IndexableField> it = fields.iterator();
  while (it.hasNext()) {
    IndexableField field = it.next();
    if (field.name().equals(name)) {
      it.remove();
      return;
    }
  }
}

// Iterate, compare by field name, and remove every matching field
public final void removeFields(String name) {
  Iterator<IndexableField> it = fields.iterator();
  while (it.hasNext()) {
    IndexableField field = it.next();
    if (field.name().equals(name)) {
      it.remove();
    }
  }
}

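Both methods iterate over the list and remove by IndexableField name. The only difference is that removeField returns after removing the first match, while removeFields keeps going and removes every match.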
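The difference between the two removal loops is easy to see on a plain list of names. The sketch below copies the loop structure shown above but works on strings instead of IndexableField; RemoveDemo and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class RemoveDemo {
    // Removes only the first element equal to name (same shape as removeField).
    static void removeFirst(List<String> names, String name) {
        Iterator<String> it = names.iterator();
        while (it.hasNext()) {
            if (it.next().equals(name)) {
                it.remove(); // safe removal during iteration
                return;      // stop after the first match
            }
        }
    }

    // Removes every element equal to name (same shape as removeFields).
    static void removeAll(List<String> names, String name) {
        Iterator<String> it = names.iterator();
        while (it.hasNext()) {
            if (it.next().equals(name)) {
                it.remove();
            }
        }
    }

    public static void main(String[] args) {
        List<String> a = new ArrayList<>(Arrays.asList("tag", "title", "tag"));
        removeFirst(a, "tag");
        System.out.println(a); // prints [title, tag]

        List<String> b = new ArrayList<>(Arrays.asList("tag", "title", "tag"));
        removeAll(b, "tag");
        System.out.println(b); // prints [title]
    }
}
```

Removing through the Iterator, rather than calling remove on the list inside the loop, avoids a ConcurrentModificationException.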

3. Getting binary values by field name

// Iterate and collect the BytesRef value of every field with a matching name, read through binaryValue()
public final BytesRef[] getBinaryValues(String name) {
  final List<BytesRef> result = new ArrayList<>();
  for (IndexableField field : fields) {
    if (field.name().equals(name)) {
      final BytesRef bytes = field.binaryValue();
      if (bytes != null) {
        result.add(bytes);
      }
    }
  }

  return result.toArray(new BytesRef[result.size()]);
}

// Iterate and return the first non-null BytesRef value among fields with a matching name, read through binaryValue()
public final BytesRef getBinaryValue(String name) {
  for (IndexableField field : fields) {
    if (field.name().equals(name)) {
      final BytesRef bytes = field.binaryValue();
      if (bytes != null) {
        return bytes;
      }
    }
  }
  return null;
}

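Both methods iterate by name and read each matching field's binaryValue; the result type is BytesRef. Matching fields whose binaryValue is null are skipped.

4. Getting fields by name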

// Return the first field with a matching name
public final IndexableField getField(String name) {
  for (IndexableField field : fields) {
    if (field.name().equals(name)) {
      return field;
    }
  }
  return null;
}

// Return every field with a matching name
public IndexableField[] getFields(String name) {
  List<IndexableField> result = new ArrayList<>();
  for (IndexableField field : fields) {
    if (field.name().equals(name)) {
      result.add(field);
    }
  }

  return result.toArray(new IndexableField[result.size()]);
}

These methods simply scan the List for IndexableField entries whose name matches.

5. The getFields method

public final List<IndexableField> getFields() {
  return Collections.unmodifiableList(fields);
}

This returns an unmodifiable view of the list of fields.
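What "unmodifiable" means here can be checked with the JDK alone. The sketch below uses a list of strings in place of the fields list; UnmodifiableViewDemo is a name invented for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class UnmodifiableViewDemo {
    // Returns true if the list rejects an add() call.
    static boolean rejectsAdd(List<String> list) {
        try {
            list.add("x");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> backing = new ArrayList<>();
        List<String> view = Collections.unmodifiableList(backing);

        System.out.println(rejectsAdd(view)); // prints true

        backing.add("title");            // change the backing list directly
        System.out.println(view.size()); // prints 1: the view reflects the change
    }
}
```

So the returned list is a read-only view rather than a copy: callers cannot modify it, but it still reflects changes made to the underlying list, which for Document means later calls to add or removeField.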

6. Getting String values by field name

// Match by name and collect the stringValue of every matching field
public final String[] getValues(String name) {
  List<String> result = new ArrayList<>();
  for (IndexableField field : fields) {
    if (field.name().equals(name) && field.stringValue() != null) {
      result.add(field.stringValue());
    }
  }
  
  if (result.size() == 0) {
    return NO_STRINGS;
  }
  
  return result.toArray(new String[result.size()]);
}

// Match by name and return the first non-null stringValue
public final String get(String name) {
  for (IndexableField field : fields) {
    if (field.name().equals(name) && field.stringValue() != null) {
      return field.stringValue();
    }
  }
  return null;
}

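Both methods read the String value through IndexableField's stringValue method and skip fields whose value is null. When nothing matches, getValues returns the shared empty array NO_STRINGS, while get returns null.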
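The behaviour of the two string getters, including the null and empty-array cases, can be reproduced with a small model. MiniField and MiniDocument below are hypothetical stand-ins that copy only the logic shown above; they are not the Lucene classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for IndexableField: a name plus a possibly null string value.
final class MiniField {
    final String name;
    final String value;

    MiniField(String name, String value) {
        this.name = name;
        this.value = value;
    }
}

// Hypothetical stand-in for Document, keeping only add() and the two string getters.
final class MiniDocument {
    private static final String[] NO_STRINGS = new String[0];
    private final List<MiniField> fields = new ArrayList<>();

    void add(MiniField field) {
        fields.add(field);
    }

    // First non-null value among fields with this name, or null if there is none.
    String get(String name) {
        for (MiniField f : fields) {
            if (f.name.equals(name) && f.value != null) {
                return f.value;
            }
        }
        return null;
    }

    // All non-null values among fields with this name; an empty array if there are none.
    String[] getValues(String name) {
        List<String> result = new ArrayList<>();
        for (MiniField f : fields) {
            if (f.name.equals(name) && f.value != null) {
                result.add(f.value);
            }
        }
        return result.isEmpty() ? NO_STRINGS : result.toArray(new String[0]);
    }
}

class StringValueDemo {
    public static void main(String[] args) {
        MiniDocument doc = new MiniDocument();
        doc.add(new MiniField("tag", null)); // e.g. a field that carries no string value
        doc.add(new MiniField("tag", "a"));
        doc.add(new MiniField("tag", "b"));

        System.out.println(doc.get("tag"));                  // prints a
        System.out.println(doc.getValues("tag").length);     // prints 2
        System.out.println(doc.get("missing"));              // prints null
        System.out.println(doc.getValues("missing").length); // prints 0
    }
}
```

Note that "first match" really means the first matching field with a non-null string value, so a same-named field without a string value is passed over.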

7. The clear method

public void clear() {
  fields.clear();
}

This just calls the List's own clear method.

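6. Summary

1. A Lucene Document is really just a collection of IndexableField objects, and names in that collection may repeat (so a lookup by name can return every field that shares the name).

2. An IndexableField can carry BytesRef data and String data, each read through its own method.

3. Every getter on Document looks fields up by IndexableField name (in practice, by iterating over the list).

4. Removal in Document also works by matching on the name.

The above is purely my personal understanding. If you spot a problem, please contact me!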
