In high-concurrency scenarios, hitting the database directly for every request puts enormous pressure on it; in severe cases the database can become unavailable, and the latency is unacceptable, especially for data that is expensive to fetch or compute. The usual remedy is a cache: load frequently accessed hot data into memory ahead of time, which greatly reduces the load on the database.
OSCache is an open-source caching framework. Although it is no longer maintained, its implementation is still worth studying and borrowing from. Below we walk through parts of the OSCache source to examine its design.
Cache data structure
A cache is usually backed by a <K, V> structure, but caches are used in multithreaded scenarios, so the structure must be thread-safe. In Java the candidates include Hashtable, ConcurrentHashMap, and Collections.synchronizedMap. OSCache is essentially built on a Hashtable-style implementation; the code lives in com.opensymphony.oscache.base.algorithm.AbstractConcurrentReadCache. Because Hashtable achieves thread safety by locking the entire table, it suits workloads with mostly-concurrent reading but exclusive writing (OSCache optimizes the original Hashtable, as discussed below). For write-heavy workloads, ConcurrentHashMap is the better choice.
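As a rough illustration of this trade-off (illustrative code, not taken from OSCache), the standard library offers both coarse-grained and fine-grained thread-safe maps:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class CacheMapDemo {

    // Coarse-grained locking: every call synchronizes on the whole map,
    // which is how Hashtable-style structures behave. Concurrent reads
    // serialize against each other.
    static Map<String, String> coarseLocked() {
        return Collections.synchronizedMap(new HashMap<String, String>());
    }

    // Fine-grained: ConcurrentHashMap reads without locking and writes
    // contend only per bin, which suits write-heavy caches better.
    static Map<String, String> fineGrained() {
        return new ConcurrentHashMap<String, String>();
    }

    public static void main(String[] args) {
        Map<String, String> cache = fineGrained();
        cache.put("hotKey", "hotValue");
        System.out.println(cache.get("hotKey")); // prints "hotValue"
    }
}
```

Both variants expose the same Map interface, so the choice can be deferred behind the cache's own API.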
Retrieving cached content
```java
// Retrieve the content for the given key from the cache
public Object getFromCache(String key, int refreshPeriod, String cronExpiry)
        throws NeedsRefreshException {
    // First try to fetch the entry; if none exists, a new CacheEntry is created
    CacheEntry cacheEntry = this.getCacheEntry(key, null, null);
    Object content = cacheEntry.getContent();
    CacheMapAccessEventType accessEventType = CacheMapAccessEventType.HIT;
    boolean reload = false;

    // Check whether the entry is stale; if so, handle the following cases
    if (this.isStale(cacheEntry, refreshPeriod, cronExpiry)) {
        // Get the update state; if there is none, create one with a reference count of 1
        EntryUpdateState updateState = getUpdateState(key);
        try {
            synchronized (updateState) {
                if (updateState.isAwaitingUpdate() || updateState.isCancelled()) {
                    // The state is "awaiting update" or "cancelled", meaning no other
                    // thread is currently refreshing this entry, so start the refresh
                    // here (this sets the state to "updating" and increments the
                    // reference count)
                    updateState.startUpdate();

                    if (cacheEntry.isNew()) {
                        // A newly created CacheEntry means this key was never cached
                        accessEventType = CacheMapAccessEventType.MISS;
                    } else {
                        // The entry was found but needs to be refreshed
                        accessEventType = CacheMapAccessEventType.STALE_HIT;
                    }
                } else if (updateState.isUpdating()) {
                    // Another thread is already refreshing this entry. If the entry
                    // is new or blocking mode is enabled, this thread blocks; a call
                    // to putInCache or cancelUpdate lets it continue. Otherwise the
                    // content returned could well be stale.
                    if (cacheEntry.isNew() || blocking) {
                        do {
                            try {
                                updateState.wait();
                            } catch (InterruptedException e) {
                            }
                        } while (updateState.isUpdating());

                        if (updateState.isCancelled()) {
                            // Another thread cancelled the refresh, so let this thread
                            // attempt it: set the state to "updating" and increment
                            // the reference count
                            updateState.startUpdate();

                            if (cacheEntry.isNew()) {
                                accessEventType = CacheMapAccessEventType.MISS;
                            } else {
                                accessEventType = CacheMapAccessEventType.STALE_HIT;
                            }
                        } else if (updateState.isComplete()) {
                            reload = true;
                        } else {
                            log.error("Invalid update state for cache entry " + key);
                        }
                    }
                } else {
                    reload = true;
                }
            }
        } finally {
            // Decrement the reference count; if it drops to 0, remove the updateState
            releaseUpdateState(updateState, key);
        }
    }

    // If this flag is true, the entry has definitely been refreshed
    if (reload) {
        cacheEntry = (CacheEntry) cacheMap.get(key);
        if (cacheEntry != null) {
            content = cacheEntry.getContent();
        } else {
            log.error("Could not reload cache entry after waiting for it to be rebuilt");
        }
    }

    dispatchCacheMapAccessEvent(accessEventType, cacheEntry, null);

    // If the entry was missing or stale, throw NeedsRefreshException
    if (accessEventType != CacheMapAccessEventType.HIT) {
        throw new NeedsRefreshException(content);
    }

    return content;
}
```
As the code above shows, EntryUpdateState is central. It tracks the update state of a key's cache entry together with a thread reference count (think of it as a counter), and each key has its own EntryUpdateState. If an entry exists and is not stale, no EntryUpdateState is present for it. Unlike the original Hashtable, the map OSCache uses has no synchronized keyword on its get operation, and EntryUpdateState was introduced to guard against the resulting concurrency problems. The goal is to avoid overusing synchronized so that performance does not suffer.
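To make the idea concrete, here is a heavily simplified sketch of per-key update state with reference counting. The names (UpdateStateRegistry, acquire, release) are invented for illustration and do not match OSCache's actual code; the point is that only threads touching a stale key synchronize on its state object, so unrelated reads never contend:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: a registry that hands out one state object per key,
// reference-counted so the object is removed once no thread is using it.
public class UpdateStateRegistry {

    public static class UpdateState {
        int refCount = 1; // number of threads currently involved with this key
    }

    private final Map<String, UpdateState> states = new HashMap<String, UpdateState>();

    // Get (or lazily create) the state object for a key and bump its count.
    public synchronized UpdateState acquire(String key) {
        UpdateState s = states.get(key);
        if (s == null) {
            s = new UpdateState(); // refCount starts at 1, as in OSCache
            states.put(key, s);
        } else {
            s.refCount++;
        }
        return s;
    }

    // Drop one reference; remove the state once nobody is using it,
    // so the registry does not grow without bound.
    public synchronized void release(String key, UpdateState s) {
        if (--s.refCount == 0) {
            states.remove(key);
        }
    }

    public synchronized int size() {
        return states.size();
    }
}
```

Threads would then wait/notify on the per-key UpdateState object rather than on the whole cache map.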
Its definition looks like this:
```java
// Default state
public static final int NOT_YET_UPDATING = -1;
public static final int UPDATE_IN_PROGRESS = 0;
public static final int UPDATE_COMPLETE = 1;
public static final int UPDATE_CANCELLED = 2;

int state = NOT_YET_UPDATING;

// Reference count
private int nbConcurrentUses = 1;
```
The reference count here represents how many threads are currently accessing the entry while it is being updated or stored.
The getFromCache code above reveals a pitfall: whenever the entry is missing or stale, a NeedsRefreshException is thrown. In that situation, if blocking is set to true (as it usually is), other accessing threads block until the update completes; handled carelessly, this leads to deadlock.
So when this exception is raised, the cache must be refreshed. The official documentation gives two patterns:
```java
// Option 1: with fail over
String myKey = "myKey";
String myValue;
int myRefreshPeriod = 1000;
try {
    // Get from the cache
    myValue = (String) admin.getFromCache(myKey, myRefreshPeriod);
} catch (NeedsRefreshException nre) {
    try {
        // Get the value (probably by calling an EJB)
        myValue = "This is the content retrieved.";
        // Store in the cache
        admin.putInCache(myKey, myValue);
    } catch (Exception ex) {
        // We have the current content if we want fail-over.
        myValue = (String) nre.getCacheContent();
        // It is essential that cancelUpdate is called if the
        // cached content is not rebuilt
        admin.cancelUpdate(myKey);
    }
}
```
```java
// Option 2: without fail over
String myKey = "myKey";
String myValue;
int myRefreshPeriod = 1000;
boolean updated = false;
try {
    // Get from the cache
    myValue = (String) admin.getFromCache(myKey, myRefreshPeriod);
} catch (NeedsRefreshException nre) {
    try {
        // Get the value (probably by calling an EJB)
        myValue = "This is the content retrieved.";
        // Store in the cache
        admin.putInCache(myKey, myValue);
        updated = true;
    } finally {
        if (!updated) {
            // It is essential that cancelUpdate is called if the
            // cached content could not be rebuilt
            admin.cancelUpdate(myKey);
        }
    }
}
```
As the earlier code showed, if neither putInCache nor cancelUpdate is called here, other threads accessing this entry will block forever waiting for the resource, producing a deadlock. This is a crucial point: only a call to putInCache or cancelUpdate lets the blocked threads proceed.
Let's look at what putInCache and cancelUpdate actually do:
```java
public void putInCache(String key, Object content, String[] groups,
        EntryRefreshPolicy policy, String origin) {
    CacheEntry cacheEntry = this.getCacheEntry(key, policy, origin);
    boolean isNewEntry = cacheEntry.isNew();

    // If an entry already exists in the cache, build a fresh CacheEntry for it
    if (!isNewEntry) {
        cacheEntry = new CacheEntry(key, policy);
    }

    cacheEntry.setContent(content);
    cacheEntry.setGroups(groups);
    cacheMap.put(key, cacheEntry);

    // Update the state and reference count, and notify blocked threads
    // that the cache entry is now available
    completeUpdate(key);
    // ......
}

protected void completeUpdate(String key) {
    EntryUpdateState state;
    synchronized (updateStates) {
        state = (EntryUpdateState) updateStates.get(key);
        if (state != null) {
            synchronized (state) {
                // Set the state to UPDATE_COMPLETE and decrement the reference count
                int usageCounter = state.completeUpdate();
                // Wake up the other threads waiting on this cache entry
                state.notifyAll();
                checkEntryStateUpdateUsage(key, state, usageCounter);
            }
        } else {
            // If putInCache was called directly (i.e. not in response to a
            // NeedsRefreshException), EntryUpdateState is null and nothing is done
        }
    }
}
```
cancelUpdate follows essentially the same logic as putInCache:
```java
public void cancelUpdate(String key) {
    EntryUpdateState state;
    if (key != null) {
        synchronized (updateStates) {
            state = (EntryUpdateState) updateStates.get(key);
            if (state != null) {
                synchronized (state) {
                    // Set the state to UPDATE_CANCELLED and decrement the reference count
                    int usageCounter = state.cancelUpdate();
                    state.notify();
                    checkEntryStateUpdateUsage(key, state, usageCounter);
                }
            } else {
                if (log.isErrorEnabled()) {
                    log.error("internal error: expected to get a state from key ["
                            + key + "]");
                }
            }
        }
    }
}
```
As the code shows, when a NeedsRefreshException signals that the cache needs refreshing, you must either update the entry via putInCache() or abandon the refresh via cancelUpdate(), releasing the resource and waking the other blocked threads.
Cache eviction (replacement) policies
Memory is finite, so a cache cannot grow without bound. When the cache is full, the "less important" entries must be evicted to make room for new content. How to measure that "importance" is exactly what a cache eviction (replacement) policy decides.
Common policies include:
Least Frequently Used (LFU): track how often each cached object is used and evict the one used least frequently.
Least Recently Used (LRU): keep the most recently accessed content at the top and the content accessed longest ago (or never) at the bottom; when something must be replaced, simply evict the bottom entry. This keeps the most frequently accessed content in the cache. LRU is widely used and is OSCache's default. Thanks to this behavior, LRU is easy to implement in Java with LinkedHashMap, as described below.
First In First Out (FIFO): the simplest to implement, but a poor fit for caching.
Random Cache: evict a randomly chosen entry.
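As a side note, a FIFO cache falls out of LinkedHashMap almost for free in its default insertion-order mode. The following sketch is illustrative and not part of OSCache:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A minimal FIFO cache: LinkedHashMap in its default insertion-order mode,
// evicting the oldest-inserted entry once capacity is exceeded.
public class FifoCache<K, V> extends LinkedHashMap<K, V> {

    private final int maxEntries;

    public FifoCache(int maxEntries) {
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // With accessOrder = false (the default), the eldest entry is simply
        // the first one inserted, so this yields first-in-first-out eviction.
        return size() > maxEntries;
    }
}
```

Note that under FIFO, reading an entry does not protect it from eviction; that is precisely the difference from LRU discussed next.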
There are many more replacement algorithms, which we will not enumerate here; let us focus on the most common one, LRU.
Java's LinkedHashMap can maintain either insertion order or access order. The latter closely matches LRU's behavior, so LinkedHashMap makes implementing LRU straightforward.
Looking at LinkedHashMap's definition, there is this field:
```java
final boolean accessOrder;
```
And the constructors:
```java
public LinkedHashMap(int initialCapacity, float loadFactor) {
    super(initialCapacity, loadFactor);
    accessOrder = false;
}

public LinkedHashMap(int initialCapacity) {
    super(initialCapacity);
    accessOrder = false;
}

public LinkedHashMap() {
    super();
    accessOrder = false;
}

public LinkedHashMap(Map<? extends K, ? extends V> m) {
    super();
    accessOrder = false;
    putMapEntries(m, false);
}

public LinkedHashMap(int initialCapacity, float loadFactor, boolean accessOrder) {
    super(initialCapacity, loadFactor);
    this.accessOrder = accessOrder;
}
```
Except for the last constructor, accessOrder defaults to false. With accessOrder set to false, LinkedHashMap maintains insertion order; with true, it maintains access order, and that is exactly the key point. For how the ordering is maintained internally, see the LinkedHashMap source; it is not complicated.
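The difference between the two modes is easy to observe directly from iteration order. This small demo (not OSCache code) touches one key in an access-ordered map and watches it move to the tail:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class AccessOrderDemo {

    // Builds a LinkedHashMap in access-order mode, reads one key, and
    // returns the resulting key iteration order as a string.
    static String keyOrderAfterAccess() {
        // accessOrder = true: each get() moves the entry to the tail
        // (the most recently used position)
        Map<String, Integer> map = new LinkedHashMap<String, Integer>(16, 0.75f, true);
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        map.get("a"); // "a" moves to the tail

        // Iteration now runs from least to most recently used
        return map.keySet().toString();
    }

    public static void main(String[] args) {
        System.out.println(keyOrderAfterAccess()); // prints [b, c, a]
    }
}
```

With accessOrder left at false, the same sequence of operations would print [a, b, c], the insertion order.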
Maintaining access order alone is not enough; we also need to evict the least recently used object. LinkedHashMap overrides the afterNodeInsertion hook of its parent class HashMap:
```java
void afterNodeInsertion(boolean evict) { // possibly remove eldest
    LinkedHashMap.Entry<K,V> first;
    if (evict && (first = head) != null && removeEldestEntry(first)) {
        K key = first.key;
        removeNode(hash(key), key, null, false, true);
    }
}

/**
 * Returns <tt>true</tt> if this map should remove its eldest entry.
 * This method is invoked by <tt>put</tt> and <tt>putAll</tt> after
 * inserting a new entry into the map. It provides the implementor
 * with the opportunity to remove the eldest entry each time a new one
 * is added. This is useful if the map represents a cache: it allows
 * the map to reduce memory consumption by deleting stale entries.
 *
 * <p>Sample use: this override will allow the map to grow up to 100
 * entries and then delete the eldest entry each time a new entry is
 * added, maintaining a steady state of 100 entries.
 * <pre>
 *     private static final int MAX_ENTRIES = 100;
 *
 *     protected boolean removeEldestEntry(Map.Entry eldest) {
 *        return size() > MAX_ENTRIES;
 *     }
 * </pre>
 *
 * <p>This method typically does not modify the map in any way,
 * instead allowing the map to modify itself as directed by its
 * return value. It <i>is</i> permitted for this method to modify
 * the map directly, but if it does so, it <i>must</i> return
 * <tt>false</tt> (indicating that the map should not attempt any
 * further modification). The effects of returning <tt>true</tt>
 * after modifying the map from within this method are unspecified.
 *
 * <p>This implementation merely returns <tt>false</tt> (so that this
 * map acts like a normal map - the eldest element is never removed).
 *
 * @param eldest The least recently inserted entry in the map, or if
 *           this is an access-ordered map, the least recently accessed
 *           entry. This is the entry that will be removed if this
 *           method returns <tt>true</tt>. If the map was empty prior
 *           to the <tt>put</tt> or <tt>putAll</tt> invocation resulting
 *           in this invocation, this will be the entry that was just
 *           inserted; in other words, if the map contains a single
 *           entry, the eldest entry is also the newest.
 * @return <tt>true</tt> if the eldest entry should be removed
 *           from the map; <tt>false</tt> if it should be retained.
 */
protected boolean removeEldestEntry(Map.Entry<K,V> eldest) {
    return false;
}
```
removeEldestEntry returns false by default, i.e. nothing is ever removed. So all we need is a check here: return true when the cache is full, and the least recently used object gets evicted. With LinkedHashMap, a very small modification is thus enough to implement LRU.
Here is an LRU implementation:
```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * An LRU cache, based on <code>LinkedHashMap</code>.
 *
 * <p>
 * This cache has a fixed maximum number of elements (<code>cacheSize</code>).
 * If the cache is full and another entry is added, the LRU (least recently
 * used) entry is dropped.
 *
 * <p>
 * This class is thread-safe. All methods of this class are synchronized.
 *
 * <p>
 * Author: Christian d'Heureuse, Inventec Informatik AG, Zurich, Switzerland<br>
 * Multi-licensed: EPL / LGPL / GPL / AL / BSD.
 */
public class LRUCache<K, V> {

    private static final float hashTableLoadFactor = 0.75f;

    private LinkedHashMap<K, V> map;
    private int cacheSize;

    /**
     * Creates a new LRU cache. Passing <code>true</code> as the third argument
     * of new LinkedHashMap(hashTableCapacity, hashTableLoadFactor, true)
     * selects access order.
     *
     * @param cacheSize
     *            the maximum number of entries that will be kept in this cache.
     */
    public LRUCache(int cacheSize) {
        this.cacheSize = cacheSize;
        int hashTableCapacity = (int) Math.ceil(cacheSize / hashTableLoadFactor) + 1;
        map = new LinkedHashMap<K, V>(hashTableCapacity, hashTableLoadFactor, true) {
            // (an anonymous inner class)
            private static final long serialVersionUID = 1;

            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > LRUCache.this.cacheSize;
            }
        };
    }

    /**
     * Retrieves an entry from the cache.<br>
     * The retrieved entry becomes the MRU (most recently used) entry.
     *
     * @param key
     *            the key whose associated value is to be returned.
     * @return the value associated to this key, or null if no value with this
     *         key exists in the cache.
     */
    public synchronized V get(K key) {
        return map.get(key);
    }

    /**
     * Adds an entry to this cache. The new entry becomes the MRU (most recently
     * used) entry. If an entry with the specified key already exists in the
     * cache, it is replaced by the new entry. If the cache is full, the LRU
     * (least recently used) entry is removed from the cache.
     *
     * @param key
     *            the key with which the specified value is to be associated.
     * @param value
     *            a value to be associated with the specified key.
     */
    public synchronized void put(K key, V value) {
        map.put(key, value);
    }

    /**
     * Clears the cache.
     */
    public synchronized void clear() {
        map.clear();
    }

    /**
     * Returns the number of used entries in the cache.
     *
     * @return the number of entries currently in the cache.
     */
    public synchronized int usedEntries() {
        return map.size();
    }

    /**
     * Returns a <code>Collection</code> that contains a copy of all cache
     * entries.
     *
     * @return a <code>Collection</code> with a copy of the cache content.
     */
    public synchronized Collection<Map.Entry<K, V>> getAll() {
        return new ArrayList<Map.Entry<K, V>>(map.entrySet());
    }
}
```
Finally, a cache must be used with consistency in mind; refreshing the cache exists precisely to maintain consistency with the underlying data. How to refresh depends on the specific usage scenario and needs to be designed accordingly.
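One common approach is the cache-aside pattern with invalidate-on-write. The sketch below is illustrative only, not something OSCache prescribes, and the in-memory "database" map is a stand-in for real storage:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of cache-aside with invalidate-on-write: the database is the
// source of truth, and the cached copy is dropped on every write so
// the next read repopulates it with fresh data.
public class CacheAside {

    private final Map<String, String> cache = new ConcurrentHashMap<String, String>();
    private final Map<String, String> database = new ConcurrentHashMap<String, String>();

    // Read path: try the cache first, fall back to the database and populate.
    public String read(String key) {
        String value = cache.get(key);
        if (value == null) {
            value = database.get(key);
            if (value != null) {
                cache.put(key, value);
            }
        }
        return value;
    }

    // Write path: update the source of truth first, then invalidate the
    // cached copy rather than updating it in place.
    public void write(String key, String value) {
        database.put(key, value);
        cache.remove(key);
    }
}
```

Invalidating instead of updating the cached copy keeps the write path simple at the cost of one extra cache miss per write; which trade-off is right depends on the workload.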