FileSwitchDirectory study notes and its use in Solr

Original

源远流长

2020-02-22 09:19

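How FileSwitchDirectory works and where to use it

FileSwitchDirectory is another Directory implementation in Lucene. As the name suggests, it switches between directories by file: different kinds of index files are served by different Directory implementations, so one index can combine the strengths of several of them.

Take MMapDirectory. It uses memory-mapped files for speed, but when the index is too large the mapped regions keep getting swapped in and out, and the advantage can turn into a drawback. NIOFSDirectory, covered earlier, uses Java NIO to improve performance under high concurrency, but heavy concurrency also means a lot of I/O. FileSwitchDirectory lets each of the two do what it is best at.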

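How MMapDirectory and NIOFSDirectory differ

NIOFSDirectory: wraps its read buffer in a java.nio.ByteBuffer and reads the file through NIO. Note that ByteBuffer.wrap produces a heap buffer backed by the byte array, not direct memory.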

@Override
protected void newBuffer(byte[] newBuffer) {
super.newBuffer(newBuffer);
byteBuf = ByteBuffer.wrap(newBuffer);
}

MMapDirectory: maps the file with mmap. By default one mapped chunk is at most 1 GB on a 64-bit JVM and 256 MB on a 32-bit one.

MMapDirectory maps the file into memory chunk by chunk:
this.buffers[bufNr] = rafc.map(MapMode.READ_ONLY, bufferStart, bufSize);

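So the plan is: put the index files that make up a small share of the directory on MMapDirectory, where they can be mapped into memory almost entirely, and leave the stored-document files, which take up most of the space, to NIOFSDirectory.

The two fit together well.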
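To make the contrast between the two read paths concrete, here is a minimal sketch that uses only the JDK, not Lucene. The class ReadPathsSketch and its method names are made up for illustration; the first method copies bytes into a heap array through a wrapped ByteBuffer (the NIOFSDirectory idea), the second maps the file read-only (the MMapDirectory idea).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

// Hypothetical demo class, not part of Lucene.
class ReadPathsSketch {

    // NIO style: positional reads into a heap array wrapped by a ByteBuffer.
    static byte[] readViaChannel(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            byte[] heap = new byte[(int) ch.size()];
            ByteBuffer buf = ByteBuffer.wrap(heap); // heap buffer, not direct memory
            long pos = 0;
            while (buf.hasRemaining()) {
                int n = ch.read(buf, pos); // does not move the channel's own position
                if (n < 0) {
                    break;
                }
                pos += n;
            }
            return heap;
        }
    }

    // mmap style: the OS maps the file, reads become memory accesses.
    static byte[] readViaMmap(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer mapped = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            byte[] out = new byte[mapped.remaining()];
            mapped.get(out);
            return out;
        }
    }

    // Writes a small temp file and checks that both paths return the same bytes.
    static boolean bothReadsAgree() {
        try {
            Path tmp = Files.createTempFile("read-paths", ".bin");
            tmp.toFile().deleteOnExit();
            byte[] data = "stored fields go here".getBytes(StandardCharsets.UTF_8);
            Files.write(tmp, data);
            return Arrays.equals(readViaChannel(tmp), data)
                && Arrays.equals(readViaMmap(tmp), data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("both reads agree: " + bothReadsAgree());
    }
}
```

Both paths return the same bytes; what differs is who holds the data. The channel read copies into the JVM heap on every call, while the mapping relies on the operating system's page cache, which is why mapping works best for files small enough to stay resident.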

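A walk through the FileSwitchDirectory code

The code is short. FileSwitchDirectory is essentially a front door, like a DAO entry point or a controller: it dispatches to other directories and contains no file handling of its own.

Start with the constructor: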

  public FileSwitchDirectory(Set<String> primaryExtensions, Directory primaryDir, Directory secondaryDir, boolean doClose) {
    this.primaryExtensions = primaryExtensions;
    this.primaryDir = primaryDir;
    this.secondaryDir = secondaryDir;
    this.doClose = doClose;
    this.lockFactory = primaryDir.getLockFactory();
  }

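The arguments are, in order:

the set of file extensions that go to the primary directory

the primary Directory

the secondary Directory

doClose, whether closing the FileSwitchDirectory also closes the two wrapped directories

Every operation is forwarded to the matching Directory, which supplies the IndexInput or IndexOutput: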

@Override
  public IndexInput openInput(String name) throws IOException {
    return getDirectory(name).openInput(name);
  }

 @Override
  public IndexOutput createOutput(String name) throws IOException {
    return getDirectory(name).createOutput(name);
  }

The Directory is picked from the file name:

  private Directory getDirectory(String name) {
    String ext = getExtension(name);
    if (primaryExtensions.contains(ext)) {
      return primaryDir;
    } else {
      return secondaryDir;
    }
  }
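The routing rule can be tried out without Lucene. The sketch below is a stand-in, not the library class: ExtensionRouterSketch, extensionOf and route are hypothetical names, and extensionOf is only assumed to behave like the getExtension helper called above (the text after the last dot, or an empty string when there is no dot).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the routing logic; strings replace real directories.
class ExtensionRouterSketch {
    private final Set<String> primaryExtensions;

    ExtensionRouterSketch(Set<String> primaryExtensions) {
        this.primaryExtensions = primaryExtensions;
    }

    // Assumed behavior: text after the last '.', or "" when the name has no dot.
    static String extensionOf(String name) {
        int dot = name.lastIndexOf('.');
        return dot == -1 ? "" : name.substring(dot + 1);
    }

    // Listed extensions go to the primary side, everything else to the secondary.
    String route(String name) {
        return primaryExtensions.contains(extensionOf(name)) ? "primary" : "secondary";
    }

    public static void main(String[] args) {
        ExtensionRouterSketch router =
            new ExtensionRouterSketch(new HashSet<>(Arrays.asList("fdt", "tii")));
        for (String name : new String[] {"_y8b.fdt", "_y8b.frq", "_y8b.tii", "segments_1p"}) {
            System.out.println(name + " -> " + router.route(name));
        }
    }
}
```

The point to take away is the direction of the rule: the extension set names the files that go to the primary directory, and files without a listed extension, including the segments files, fall through to the secondary one.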

A DirectoryFactory implementation for Solr

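import java.io.File;
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

import org.apache.lucene.index.IndexFileNames;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FileSwitchDirectory;
import org.apache.lucene.store.MMapDirectory;
import org.apache.lucene.store.NIOFSDirectory;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.core.DirectoryFactory;

/**
 * Lets selected file extensions skip memory mapping, for example fdt and fdx.
 *
 * <directoryFactory class="solr.MMapDirectoryFactoryExt">
 *   <str name="unmap">true</str>
 *   <lst name="filetypes">
 *     <bool name="fdt">false</bool>
 *     <bool name="fdx">false</bool>
 *   </lst>
 * </directoryFactory>
 */
public class MMapDirectoryFactoryExt extends DirectoryFactory {
	// Extensions listed under filetypes with the value false are not memory-mapped
	private Set<String> nonMappedFiles = new HashSet<String>();
	// Whether MMapDirectory should use its unmap hack
	private Boolean useUnmapHack = false;

	public Directory open(String path) throws IOException {
		MMapDirectory mmapDir = new MMapDirectory(new File(path));
		mmapDir.setUseUnmap(useUnmapHack);
		// FileSwitchDirectory sends the listed extensions to its primary directory,
		// so the non-mapped files need NIOFSDirectory as primary and mmap as secondary.
		return new FileSwitchDirectory(nonMappedFiles, new NIOFSDirectory(new File(path)), mmapDir, true);
	}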

	public void init(NamedList args) {
		Object unmap, namedlist;
		nonMappedFiles = new HashSet<String>();
		if ((unmap = args.get("unmap")) instanceof Boolean)
			useUnmapHack = (Boolean) unmap;
		if ((namedlist = args.get("filetypes")) instanceof NamedList) {
			NamedList filetypes = (NamedList) namedlist;
			for (String type : IndexFileNames.INDEX_EXTENSIONS) {
				Object mapped = filetypes.get(type);
				if (Boolean.FALSE.equals(mapped))
					nonMappedFiles.add(type);
			}
		}
	}
}

The configuration in solrconfig.xml that selects the new DirectoryFactory:

<directoryFactory class="solr.MMapDirectoryFactoryExt">
<str name="unmap">true</str>
<lst name="filetypes">
<bool name="fdt">false</bool>
<bool name="tii">false</bool>
</lst>
</directoryFactory>
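How the filetypes list turns into a set of non-mapped extensions can be sketched with plain collections. This is an illustration only: a Map stands in for Solr's NamedList, FiletypesSketch and KNOWN_EXTENSIONS are hypothetical names, and the extension list is just the set of suffixes seen in this index rather than Lucene's real IndexFileNames.INDEX_EXTENSIONS.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for the init(...) logic of the factory above.
class FiletypesSketch {
    // Assumption: a hand-picked subset of index extensions, for illustration only.
    static final String[] KNOWN_EXTENSIONS =
        {"fdt", "fdx", "fnm", "frq", "nrm", "prx", "tii", "tis", "gen"};

    // Only known extensions explicitly set to false end up in the result.
    static Set<String> nonMapped(Map<String, Boolean> filetypes) {
        Set<String> result = new TreeSet<>();
        for (String ext : KNOWN_EXTENSIONS) {
            if (Boolean.FALSE.equals(filetypes.get(ext))) {
                result.add(ext);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Boolean> filetypes = new LinkedHashMap<>();
        filetypes.put("fdt", false);
        filetypes.put("tii", false);
        System.out.println("not memory-mapped: " + nonMapped(filetypes));
    }
}
```

Two details follow from looping over the known extensions rather than over the configuration: an extension that is missing or set to true stays memory-mapped, and a misspelled extension in the configuration is silently ignored.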

Index file sizes in production:

7.3G ./_y8b.fdt

201M ./_y8b.fdx

4.0K ./_y8b.fnm

1.8G ./_y8b.frq

76M ./_y8b.nrm

537M ./_y8b.prx

7.1M ./_y8b.tii

571M ./_y8b.tis

4.0K ./segments.gen

4.0K ./segments_1p

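The tii file is loaded into memory anyway, so it does not need to be mapped. The fdt file is very large and holds mostly stored (forward) field data, so it can be read through NIOFSDirectory.

The frq file is also large and needs to be considered too.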

  public final void setMaxChunkSize(final int maxChunkSize) {
    if (maxChunkSize <= 0)
      throw new IllegalArgumentException("Maximum chunk size for mmap must be >0");
    //System.out.println("Requested chunk size: "+maxChunkSize);
    this.chunkSizePower = 31 - Integer.numberOfLeadingZeros(maxChunkSize);
    assert this.chunkSizePower >= 0 && this.chunkSizePower <= 30;
    //System.out.println("Got chunk size: "+getMaxChunkSize());
  }

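As the code above shows, a single mapped chunk can be at most 1 GB (2^30 bytes), so a file like the 7.3G fdt can never be covered by one mapping. A pity.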
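The arithmetic in setMaxChunkSize is easy to check in isolation. The sketch below copies only the power-of-two computation into a hypothetical class, ChunkSizeSketch. The chunksNeeded helper is an assumption of mine: it supposes a file is covered by consecutive fixed-size chunks, as the buffers array in the mapping code suggests.

```java
// Hypothetical helper class reproducing the chunk-size arithmetic.
class ChunkSizeSketch {

    // Same formula as setMaxChunkSize: index of the highest set bit.
    static int chunkSizePower(int maxChunkSize) {
        if (maxChunkSize <= 0) {
            throw new IllegalArgumentException("Maximum chunk size for mmap must be >0");
        }
        return 31 - Integer.numberOfLeadingZeros(maxChunkSize);
    }

    // The requested size is rounded down to a power of two.
    static long effectiveChunkSize(int maxChunkSize) {
        return 1L << chunkSizePower(maxChunkSize);
    }

    // Assumption: a file is covered by consecutive chunks of the effective size.
    static long chunksNeeded(long fileLength, int maxChunkSize) {
        long chunk = effectiveChunkSize(maxChunkSize);
        return (fileLength + chunk - 1) / chunk; // ceiling division
    }

    public static void main(String[] args) {
        System.out.println("power for Integer.MAX_VALUE: " + chunkSizePower(Integer.MAX_VALUE));
        System.out.println("effective size for 1000: " + effectiveChunkSize(1000));
        // 7_838_315_315 bytes is roughly the 7.3G fdt file from the listing.
        System.out.println("chunks for ~7.3G: " + chunksNeeded(7_838_315_315L, 1 << 30));
    }
}
```

Because the argument is an int, the largest value that can be passed is Integer.MAX_VALUE, whose highest set bit is bit 30; that is where the 1 GB ceiling comes from. Any request that is not a power of two is rounded down, so asking for 1000 bytes yields 512.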
