[Solr Source Code Analysis] How LRUCache and FastLRUCache Are Implemented

Posted in cache, solr on August 9th, 2010 by kafka0102

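In my earlier post "[Solr in Practice] An Introduction to and Analysis of Solr Cache Usage" I gave a brief introduction to Solr's LRUCache and FastLRUCache. This post builds on that one and adds some notes on how the two are implemented.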

1. Implementation of LRUCache

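Before looking at LRUCache, a few words about LinkedHashMap. LinkedHashMap extends HashMap and uses a doubly linked list to record the order of the Entries in the map. Two orders are available, LRU (access) order and insertion order, and the choice is made through the constructor public LinkedHashMap(int initialCapacity, float loadFactor, boolean accessOrder). So for operations such as get, put and remove, LinkedHashMap does everything HashMap does and additionally adjusts the Entry order list.
Take get as an example. In LRU order (accessOrder is true), the Entry's recordAccess method moves the Entry that was just read to the most-recently-used end of the list, which is the end farthest from the eldest Entry at header.after: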

    public V get(Object key) {
        Entry<K,V> e = (Entry<K,V>)getEntry(key);
        if (e == null)
            return null;
        e.recordAccess(this);
        return e.value;
    }
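The effect of accessOrder can also be observed from the outside, using only the public java.util.LinkedHashMap API. The following is a minimal sketch; the class name AccessOrderDemo and the helper keysAfterGet are made up for illustration. Iteration runs from the least recently used Entry to the most recently used one.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative demo class, not part of Solr or the JDK.
final class AccessOrderDemo {
    /** Returns the keys in iteration order after reading "a" once. */
    static List<String> keysAfterGet(boolean accessOrder) {
        Map<String, Integer> map = new LinkedHashMap<>(16, 0.75f, accessOrder);
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        map.get("a"); // in access order this moves "a" to the most-recent end
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        System.out.println(keysAfterGet(true));  // prints [b, c, a]
        System.out.println(keysAfterGet(false)); // prints [a, b, c]
    }
}
```

With accessOrder set to false the get call leaves the order untouched, which is why the same map can serve as a FIFO structure.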

For put, LinkedHashMap overrides the addEntry method:

    void addEntry(int hash, K key, V value, int bucketIndex) {
        createEntry(hash, key, value, bucketIndex);
        // Remove eldest entry if instructed, else grow capacity if appropriate
        Entry<K,V> eldest = header.after;
        if (removeEldestEntry(eldest)) {
            removeEntryForKey(eldest.key);
        } else {
            if (size >= threshold)
                resize(2 * table.length);
        }
    }

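addEntry calls boolean removeEldestEntry(Map.Entry eldest). The default implementation always returns false, so by default the map has no capacity limit. A subclass of LinkedHashMap can override this method to return true when the current size exceeds a threshold, and LinkedHashMap then removes the eldest Entry from the order list. This gives LinkedHashMap the features of a cache: it can hold a bounded number of elements and offers two eviction policies (LRU and FIFO), of which LRU is the more commonly used.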
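As a small illustration of this hook, here is a sketch of a size-bounded LRU map. The class name BoundedLruMap and its limit field are assumptions made for this example and do not come from Solr.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** A size-bounded LRU map; the class name and the limit field are illustrative. */
final class BoundedLruMap<K, V> extends LinkedHashMap<K, V> {
    private final int limit;

    BoundedLruMap(int limit) {
        super(16, 0.75f, true); // true = access order, which gives LRU eviction
        this.limit = limit;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Invoked by put() after the insertion; returning true drops the eldest entry.
        return size() > limit;
    }

    public static void main(String[] args) {
        BoundedLruMap<String, Integer> cache = new BoundedLruMap<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");    // "a" becomes the most recently used entry
        cache.put("c", 3); // size exceeds the limit, so "b" is evicted
        System.out.println(cache.keySet()); // prints [a, c]
    }
}
```

Passing false as the third constructor argument would turn the same class into a FIFO cache. Like LinkedHashMap itself, this sketch is not thread-safe.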
Solr's LRUCache is built on LinkedHashMap, so its implementation is really simple. The core code fragments are listed here:

  public Object init(final Map args, Object persistence, final CacheRegenerator regenerator) {
    // ... argument parsing and initialization code omitted ...
    map = new LinkedHashMap(initialSize, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(final Map.Entry eldest) {
        if (size() > limit) {
          // increment evictions regardless of state.
          // this doesn't need to be synchronized because it will
          // only be called in the context of a higher level synchronized block.
          evictions++;
          stats.evictions.incrementAndGet();
          return true;
        }
        return false;
      }
    };
    if (persistence==null) {
      // must be the first time a cache of this type is being created
      persistence = new CumulativeStats();
    }
    stats = (CumulativeStats)persistence;
    return persistence;
  }
 
  public Object put(final Object key, final Object value) {
    synchronized (map) {
      if (state == State.LIVE) {
        stats.inserts.incrementAndGet();
      }
      // increment local inserts regardless of state???
      // it does make it more consistent with the current size...
      inserts++;
      return map.put(key,value);
    }
  }
 
  public Object get(final Object key) {
    synchronized (map) {
      final Object val = map.get(key);
      if (state == State.LIVE) {
        // only increment lookups and hits if we are live.
        lookups++;
        stats.lookups.incrementAndGet();
        if (val!=null) {
          hits++;
          stats.hits.incrementAndGet();
        }
      }
      return val;
    }
  }

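As you can see, LRUCache takes a mutex around both reads and writes, so concurrent reads and writes from multiple threads contend for the lock. A cache typically serves far more reads than writes, so not being able to read concurrently is somewhat unfriendly. That said, compared with the other expensive operations in Solr, LRUCache's serialized reads are usually not the system bottleneck. The advantage of LRUCache is that it reuses LinkedHashMap directly and is simple to implement. The drawback is that LinkedHashMap's get has to modify the Entry order list, so the whole operation must be locked.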

2. Implementation of FastLRUCache

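Solr 1.4 introduced FastLRUCache as an alternative implementation. FastLRUCache drops LinkedHashMap in favor of ConcurrentHashMap, which many Java cache implementations use nowadays. ConcurrentHashMap only provides high-performance concurrent access and has no support for evicting data, so eviction is the main thing FastLRUCache has to add. FastLRUCache's get and put operations are implemented in ConcurrentLRUCache, so we move straight on to the implementation of ConcurrentLRUCache.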
The get, remove and put code of ConcurrentLRUCache is as follows:

  public V get(final K key) {
    final CacheEntry<K,V> e = map.get(key);
    if (e == null) {
      if (islive) {
        stats.missCounter.incrementAndGet();
      }
      return null;
    }
    if (islive) {
      e.lastAccessed = stats.accessCounter.incrementAndGet();
    }
    return e.value;
  }
 
  public V remove(final K key) {
    final CacheEntry<K,V> cacheEntry = map.remove(key);
    if (cacheEntry != null) {
      stats.size.decrementAndGet();
      return cacheEntry.value;
    }
    return null;
  }
 
  public Object put(final K key, final V val) {
    if (val == null) {
      return null;
    }
    final CacheEntry e = new CacheEntry(key, val, stats.accessCounter.incrementAndGet());
    final CacheEntry oldCacheEntry = map.put(key, e);
    int currentSize;
    if (oldCacheEntry == null) {
      currentSize = stats.size.incrementAndGet();
    } else {
      currentSize = stats.size.get();
    }
    if (islive) {
      stats.putCounter.incrementAndGet();
    } else {
      stats.nonLivePutCounter.incrementAndGet();
    }
 
    // Check if we need to clear out old entries from the cache.
    // isCleaning variable is checked instead of markAndSweepLock.isLocked()
    // for performance because every put invokation will check until
    // the size is back to an acceptable level.
    // There is a race between the check and the call to markAndSweep, but
    // it's unimportant because markAndSweep actually aquires the lock or returns if it can't.
    // Thread safety note: isCleaning read is piggybacked (comes after) other volatile reads
    // in this method.
    if (currentSize > upperWaterMark && !isCleaning) {
      if (newThreadForCleanup) {
        new Thread() {
          @Override
          public void run() {
            markAndSweep();
          }
        }.start();
      } else if (cleanupThread != null){
        cleanupThread.wakeThread();
      } else {
        markAndSweep();
      }
    }
    return oldCacheEntry == null ? null : oldCacheEntry.value;
  }

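All of these operations call straight into map (a ConcurrentHashMap). Look at the code in put: when the map size exceeds the upper limit and no other thread is cleaning up (currentSize > upperWaterMark && !isCleaning), markAndSweep is called to evict data. The cleanup can run in three ways: 1) synchronously in the calling thread, 2) asynchronously in a new thread started on the spot, 3) asynchronously in a dedicated cleanup thread that is woken up on the spot.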

The markAndSweep method is quite lengthy, so it is not listed here. Its approach is described below.

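Every element (CacheEntry) in ConcurrentLRUCache has a lastAccessed attribute, a number recording when it was last accessed. stats.accessCounter in ConcurrentLRUCache is a global auto-incrementing integer: whenever an Entry is put or read with get, its lastAccessed is set to the newly incremented accessCounter value. Eviction in ConcurrentLRUCache means evicting the Entries with the smaller lastAccessed values. Because ConcurrentLRUCache does not maintain a list of Entries ordered by lastAccessed (otherwise it would be LRUCache), eviction has to traverse all elements in the map to find suitable Entries. Does that mean sorting? In fact nothing that heavy is needed.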
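The stamping scheme can be reduced to a few lines. The sketch below is a simplification under stated assumptions: StampedCache, Entry and stampOf are hypothetical names, and the islive check and all statistics from the real ConcurrentLRUCache are left out.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Simplified sketch; the names are illustrative and not Solr's actual classes. */
final class StampedCache<K, V> {
    static final class Entry<V> {
        final V value;
        volatile long lastAccessed; // a larger value means a more recent access

        Entry(V value, long stamp) {
            this.value = value;
            this.lastAccessed = stamp;
        }
    }

    private final ConcurrentHashMap<K, Entry<V>> map = new ConcurrentHashMap<>();
    private final AtomicLong accessCounter = new AtomicLong();

    void put(K key, V value) {
        map.put(key, new Entry<>(value, accessCounter.incrementAndGet()));
    }

    V get(K key) {
        Entry<V> e = map.get(key);
        if (e == null) {
            return null;
        }
        e.lastAccessed = accessCounter.incrementAndGet(); // no lock, only a new stamp
        return e.value;
    }

    /** Exposed only so that the stamp can be inspected in this sketch. */
    long stampOf(K key) {
        Entry<V> e = map.get(key);
        return e == null ? -1 : e.lastAccessed;
    }

    public static void main(String[] args) {
        StampedCache<String, String> cache = new StampedCache<>();
        cache.put("x", "1"); // stamp 1
        cache.put("y", "2"); // stamp 2
        cache.get("x");      // stamp 3, so "x" is now newer than "y"
        System.out.println(cache.stampOf("x") > cache.stampOf("y")); // prints true
    }
}
```

A read costs one atomic increment and one volatile write, so readers never wait for each other. The price is that no ordered structure exists when it is time to evict.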

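Let us define a few variables. wantToKeep is the number of Entries the map should retain, wantToRemove is the number to delete (wantToRemove = map.size - wantToKeep), and newestEntry is the largest lastAccessed value (initially stats.accessCounter). These three are known up front. oldestEntry is the smallest lastAccessed value. It is not known exactly, but it can be narrowed down by comparison while traversing the Entries.

The Entries in the map fall into three classes: (a) those that can immediately be identified as evictable, that is lastAccessed < (oldestEntry + wantToRemove); (b) those that can immediately be identified as safe to keep, that is lastAccessed > (newestEntry - wantToKeep); (c) everything else, for which it cannot yet be decided whether it should be evicted.

For one traversal of the map, the best case is that evicting the class (a) Entries already brings the map down to wantToKeep. The typical example is a cache that only sees get and put, so that the lastAccessed values in the map stay contiguous. The worst case is that too few Entries, or none at all, fall into class (a). Even then a traversal has at least one effect: it narrows the Entries that still need work down to class (c). Repeating this is bound to bring the map size down to wantToKeep. One practical point matters here: wantToKeep is usually close to map.size (for example 0.9 * map.size), so if there are not many remove operations, the cleanup finishes within a few traversals.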

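That is the basic idea behind eviction in ConcurrentLRUCache. Its execution can be divided into three phases. Phase one traverses every Entry in the map: an Entry in class (a) is removed, one in class (b) is skipped, and one in class (c) is set aside in a separate collection. If map.size is still greater than wantToKeep after this pass, phase two repeats the process on the Entries that were set aside (in the implementation, Solr uses a variable numPasses that seems intended as a switch for the number of passes; at present it is fixed at one). If map.size is still greater than wantToKeep after that, phase three traverses once more, this time using a PriorityQueue to extract the N oldest Entries that still have to be evicted, which finishes the job in one go. One addition: the wantToKeep mentioned above corresponds to acceptableWaterMark and lowerWaterMark in the code. The cleanup counts as done once the size reaches acceptableWaterMark, but the operations aim for lowerWaterMark.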
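The classification idea can be sketched in a single-threaded form. This is not Solr's markAndSweep: the class SweepSketch and its sweep method are hypothetical, the second narrowing pass is omitted, the sketch takes newestEntry - wantToKeep as the "safe to keep" bound, and it assumes that all stamps are distinct, that oldestEntry is not larger than the smallest stamp and that newestEntry is not smaller than the largest.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.TreeSet;

/** Single-threaded sketch of the eviction idea; not Solr's markAndSweep. */
final class SweepSketch {
    /** Shrinks stamps (key -> lastAccessed, all values distinct) to wantToKeep entries. */
    static void sweep(Map<String, Long> stamps, int wantToKeep,
                      long oldestEntry, long newestEntry) {
        int wantToRemove = stamps.size() - wantToKeep;
        if (wantToRemove <= 0) {
            return;
        }
        List<String> evict = new ArrayList<>();     // class (a)
        List<String> undecided = new ArrayList<>(); // class (c)
        for (Map.Entry<String, Long> e : stamps.entrySet()) {
            long stamp = e.getValue();
            if (stamp > newestEntry - wantToKeep) {
                continue; // class (b): certainly among the newest wantToKeep
            }
            if (stamp < oldestEntry + wantToRemove) {
                evict.add(e.getKey()); // certainly among the oldest wantToRemove
            } else {
                undecided.add(e.getKey());
            }
        }
        stamps.keySet().removeAll(evict);

        int stillToRemove = stamps.size() - wantToKeep;
        if (stillToRemove <= 0) {
            return;
        }
        // Fallback: order only the undecided entries, oldest first.
        Comparator<String> byStamp = Comparator.comparingLong(k -> stamps.get(k));
        PriorityQueue<String> oldestFirst = new PriorityQueue<>(byStamp);
        oldestFirst.addAll(undecided);
        for (int i = 0; i < stillToRemove && !oldestFirst.isEmpty(); i++) {
            stamps.remove(oldestFirst.poll());
        }
    }

    public static void main(String[] args) {
        Map<String, Long> stamps = new HashMap<>();
        for (long v : new long[] {1, 5, 6, 20, 21, 30}) {
            stamps.put("k" + v, v);
        }
        sweep(stamps, 3, 1, 30);
        System.out.println(new TreeSet<>(stamps.values())); // prints [20, 21, 30]
    }
}
```

In the example only stamp 30 is kept outright and only stamp 1 is evicted outright. The four undecided stamps go through the queue, which hands back 5 and 6 as the two oldest.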

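The time complexity of this algorithm is roughly 2n + k ln(k), where k is very small in most practical cases, so it is usually faster than a plain heap sort.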

3. Summary

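LRUCache and FastLRUCache take two very different approaches. What they share is that both rely on an off-the-shelf Map to hold the data. Where they differ is in how they evict. LRUCache (that is, LinkedHashMap) maintains an extra structure and updates it on every access. The advantage is that eviction is O(1); the drawback is that every access has to be protected by a mutex. FastLRUCache is the opposite: it maintains no extra structure, so ConcurrentHashMap can support concurrent reads, but when a put needs to evict data the eviction is O(n). Because the sweep does not block other get and put calls, it only affects the performance of that one put, and FastLRUCache can optionally run it asynchronously in a separate thread to reduce the impact. Another cache implementation, Ehcache, evicts synchronously, but it limits how many items are evicted each time (usually fewer than 5), so performance does not suffer much even in the synchronous case.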
