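4. A major pitfall: if JedisShardInfo is instantiated without a node name (the name attribute), keys are rehashed whenever the order of the Redis node list changes
BTrace was used to trace the live state of redis.clients.util.Sharded and to verify how Jedis implements consistent hashing for sharding.
The trace exposed a critical flaw: when a JedisShardInfo has no node name, changing the order of the Redis node list causes keys to be rehashed. The cause is visible in Sharded's initialize(...) method, which uses strategy (I) or (II) below; (III) and (IV) are alternatives for comparison: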
(I) this.algo.hash("SHARD-" + i + "-NODE-" + n)
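[Drawback] Major pitfall: the node's position index i is part of the hashed string. If the node order is changed inadvertently, the virtual nodes are reassigned to different servers and keys are rehashed, with painful results (the "rehash caused by node reordering" problem).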
(II) this.algo.hash(shardInfo.getName() + "*" + shardInfo.getWeight() + n)
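[Advantage] This design avoids the "rehash caused by node reordering" problem above.
[Drawback] Pitfall: "node name + weight" must be unique, otherwise the virtual nodes of different servers overlap and overwrite each other. "Node name + weight" must also never be changed while the cluster is in use.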
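To make the difference between (I) and (II) concrete, here is a minimal sketch that builds both rings for the same three nodes in two different orders and counts how many sample keys change server. It does not use Jedis: the class name, the node names, the "user:N" sample keys and the FNV-1a hash are all assumptions for illustration (Jedis itself defaults to Hashing.MURMUR_HASH), and the weight is fixed at 1.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

final class ReorderRehashDemo {
    static final int VIRTUAL_NODES = 160; // same multiplier as Sharded.initialize(...)

    // Stand-in 64-bit FNV-1a hash (assumption); Jedis defaults to Hashing.MURMUR_HASH.
    static long hash(String s) {
        long h = 0xcbf29ce484222325L;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        return h;
    }

    // byName=false mimics strategy (I); byName=true mimics strategy (II) with weight 1.
    static TreeMap<Long, String> buildRing(List<String> nodeNames, boolean byName) {
        TreeMap<Long, String> ring = new TreeMap<>();
        for (int i = 0; i < nodeNames.size(); i++) {
            String name = nodeNames.get(i);
            for (int n = 0; n < VIRTUAL_NODES; n++) {
                String virtualKey = byName
                        ? name + "*" + 1 + n
                        : "SHARD-" + i + "-NODE-" + n;
                ring.put(hash(virtualKey), name);
            }
        }
        return ring;
    }

    // First virtual node at or after the key's hash, wrapping around to the start of the ring.
    static String locate(TreeMap<Long, String> ring, String key) {
        SortedMap<Long, String> tail = ring.tailMap(hash(key));
        return tail.isEmpty() ? ring.firstEntry().getValue() : tail.get(tail.firstKey());
    }

    // Counts the sample keys that land on a different server after the node list is reordered.
    static int movedKeys(List<String> before, List<String> after, boolean byName, int keys) {
        TreeMap<Long, String> ringBefore = buildRing(before, byName);
        TreeMap<Long, String> ringAfter = buildRing(after, byName);
        int moved = 0;
        for (int k = 0; k < keys; k++) {
            String key = "user:" + k;
            if (!locate(ringBefore, key).equals(locate(ringAfter, key))) {
                moved++;
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        List<String> before = Arrays.asList("Shard-01", "Shard-02", "Shard-03");
        List<String> after = Arrays.asList("Shard-03", "Shard-01", "Shard-02"); // same nodes, rotated
        System.out.println("index-based keys moved: " + movedKeys(before, after, false, 1000));
        System.out.println("name-based keys moved:  " + movedKeys(before, after, true, 1000));
    }
}
```

With the rotated list, the index-based ring moves every sample key, because a different server now sits at each index. The name-based ring should move none, barring a collision in the stand-in hash.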
(III) node IP:port + sequence number
The Memcached Java Client uses this strategy.
[Drawback] A node's IP can change, for example after a data-center migration.
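(IV) unique node name + sequence number
A better consistent-hashing strategy is unique node name + sequence number, leaving the weight out of the hashed string.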
long hash = algo.hash(shardInfo.getName() + "*" + n)
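The practical difference between (II) and (IV) shows up when a node's weight is changed. The sketch below only compares the virtual-node key strings, so no hash function is involved. The class and method names are made up for illustration, and it assumes that under (IV) the weight still decides how many virtual nodes are created (160 * weight), which the text above does not spell out.

```java
import java.util.LinkedHashSet;
import java.util.Set;

final class WeightChangeDemo {
    static final int VIRTUAL_NODES = 160;

    // Virtual-node keys as built by strategy (II): name + "*" + weight + n.
    static Set<String> keysWithWeight(String name, int weight) {
        Set<String> keys = new LinkedHashSet<>();
        for (int n = 0; n < VIRTUAL_NODES * weight; n++) {
            keys.add(name + "*" + weight + n);
        }
        return keys;
    }

    // Virtual-node keys as built by strategy (IV): name + "*" + n.
    // Assumption: the weight still controls the number of virtual nodes.
    static Set<String> keysWithoutWeight(String name, int weight) {
        Set<String> keys = new LinkedHashSet<>();
        for (int n = 0; n < VIRTUAL_NODES * weight; n++) {
            keys.add(name + "*" + n);
        }
        return keys;
    }

    // Number of virtual-node keys that exist both before and after the change.
    static int retained(Set<String> before, Set<String> after) {
        Set<String> common = new LinkedHashSet<>(before);
        common.retainAll(after);
        return common.size();
    }

    public static void main(String[] args) {
        String name = "Shard-01";
        System.out.println("(II) keys kept when weight goes 1 -> 2: "
                + retained(keysWithWeight(name, 1), keysWithWeight(name, 2)));
        System.out.println("(IV) keys kept when weight goes 1 -> 2: "
                + retained(keysWithoutWeight(name, 1), keysWithoutWeight(name, 2)));
    }
}
```

Under (II) none of the 160 original virtual-node keys survives the weight change, so the node's whole share of the ring moves. Under (IV) all 160 survive and 160 more are added.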
Therefore, every entry in the Redis server list must be configured with a logical node name (the name attribute).
redis.server.list=192.168.6.35:6379:Shard-01,192.168.6.36:6379:Shard-02,192.168.6.37:6379:Shard-03,192.168.6.38:6379:Shard-04
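One way to enforce the "name is mandatory" rule is to validate it when the list is parsed. The sketch below is an assumption about how such a host:port:name list could be read; ShardListParser and ShardSpec are hypothetical names, not part of Jedis or of the original project. With Jedis on the classpath, each parsed entry would presumably be turned into a JedisShardInfo that carries the name.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class ShardListParser {
    /** Hypothetical value holder for one "host:port:name" entry. */
    static final class ShardSpec {
        final String host;
        final int port;
        final String name;

        ShardSpec(String host, int port, String name) {
            this.host = host;
            this.port = port;
            this.name = name;
        }
    }

    // Parses "host:port:name,host:port:name,..." and rejects missing or duplicate names.
    static List<ShardSpec> parse(String serverList) {
        List<ShardSpec> shards = new ArrayList<>();
        Set<String> seenNames = new HashSet<>();
        for (String entry : serverList.split(",")) {
            String[] parts = entry.trim().split(":");
            if (parts.length != 3 || parts[2].isEmpty()) {
                throw new IllegalArgumentException("expected host:port:name but got '" + entry + "'");
            }
            if (!seenNames.add(parts[2])) {
                throw new IllegalArgumentException("duplicate node name '" + parts[2] + "'");
            }
            shards.add(new ShardSpec(parts[0], Integer.parseInt(parts[1]), parts[2]));
        }
        return shards;
    }

    public static void main(String[] args) {
        String list = "192.168.6.35:6379:Shard-01,192.168.6.36:6379:Shard-02";
        for (ShardSpec shard : parse(list)) {
            System.out.println(shard.name + " -> " + shard.host + ":" + shard.port);
        }
    }
}
```

Failing fast on a missing or repeated name keeps a misconfigured list from silently falling back to index-based hashing.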
The relevant code is as follows:
- public class Sharded<R, S extends ShardInfo<R>> {
- public static final int DEFAULT_WEIGHT = 1;
- private TreeMap<Long, S> nodes;
- private final Hashing algo;
- private final Map<ShardInfo<R>, R> resources = new LinkedHashMap<ShardInfo<R>, R>();
- public Sharded(List<S> shards) {
- this(shards, Hashing.MURMUR_HASH); // MD5 is really not good as we works with 64-bits not 128
- }
- public Sharded(List<S> shards, Hashing algo) {
- this.algo = algo;
- initialize(shards);
- }
- private void initialize(List<S> shards) {
- nodes = new TreeMap<Long, S>();
- for (int i = 0; i != shards.size(); ++i) {
- final S shardInfo = shards.get(i);
- if (shardInfo.getName() == null) for (int n = 0; n < 160 * shardInfo.getWeight(); n++) {
- nodes.put(this.algo.hash("SHARD-" + i + "-NODE-" + n), shardInfo);
- }
- else for (int n = 0; n < 160 * shardInfo.getWeight(); n++) {
- nodes.put(this.algo.hash(shardInfo.getName() + "*" + shardInfo.getWeight() + n), shardInfo);
- }
- resources.put(shardInfo, shardInfo.createResource());
- }
- }
- ...
- }
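3. The "Redis client connection count never drops" problem
This problem has two causes:
- The object pool's idle-queue behavior (LIFO, i.e. last-in-first-out stack order) was not used correctly
- The "connection leak caused by an exception while closing cluster connections" problem (problem 1 in this article)
The full analysis is in "[Production issue] Solving the 'Redis client connection count never drops' problem".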
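The linked write-up has the specifics. As a rough illustration of why the idle-queue order matters for the connection count, the toy simulation below runs sequential borrow/return cycles against a five-slot idle queue and counts how many distinct connections get touched. It is a model only: the class and method names are invented, and no commons-pool2 code is involved (in commons-pool2 the order is chosen through the pool configuration's setLifo(boolean)).

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

final class IdleQueueOrderDemo {
    // Returns how many distinct pooled objects are touched by sequential borrow/return cycles.
    static int distinctObjectsUsed(int poolSize, int cycles, boolean lifo) {
        Deque<Integer> idle = new ArrayDeque<>();
        for (int id = 0; id < poolSize; id++) {
            idle.addLast(id);
        }
        Set<Integer> used = new HashSet<>();
        for (int c = 0; c < cycles; c++) {
            // Borrow: LIFO takes the most recently returned object, FIFO the oldest one.
            int connection = lifo ? idle.pollLast() : idle.pollFirst();
            used.add(connection);
            idle.addLast(connection); // return to the idle queue
        }
        return used.size();
    }

    public static void main(String[] args) {
        System.out.println("LIFO touches " + distinctObjectsUsed(5, 20, true) + " connection(s)");
        System.out.println("FIFO touches " + distinctObjectsUsed(5, 20, false) + " connection(s)");
    }
}
```

Under light load, LIFO keeps reusing one connection, so the other four sit idle long enough for an idle evictor to close them. FIFO rotates through all five, so none of them ever looks idle and the connection count stays up.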
2. A Jedis "socket read timeout" leads to a "wrong return type" error
The exception output is as follows:
- [2015-02-07 09:17:47] WARN c.f.f.b.s.r.i.CustomShardedJedisFactory -quit jedis connection for server fail: xxx.xxx.xxx.xxx:xxx
- java.lang.ClassCastException: java.lang.Long cannot be cast to [B (class cast exception)
- at redis.clients.jedis.Connection.getStatusCodeReply(Connection.java:181) ~[jedis-2.6.2.jar:na]
- at redis.clients.jedis.BinaryJedis.quit(BinaryJedis.java:136) ~[jedis-2.6.2.jar:na]
- at cn.fraudmetrix.forseti.biz.service.redis.impl.CustomShardedJedisFactory.destroyObject(CustomShardedJedisFactory.java:116) ~[forseti-biz-service-1.0-SNAPSHOT.jar:na]
- at org.apache.commons.pool2.impl.GenericObjectPool.destroy(GenericObjectPool.java:848) [commons-pool2-2.0.jar:2.0]
- at org.apache.commons.pool2.impl.GenericObjectPool.invalidateObject(GenericObjectPool.java:626) [commons-pool2-2.0.jar:2.0]
- at redis.clients.util.Pool.returnBrokenResourceObject(Pool.java:83) [jedis-2.6.2.jar:na]
- at cn.fraudmetrix.forseti.biz.service.redis.impl.CustomShardedJedisPool.returnBrokenResource(CustomShardedJedisPool.java:121) [forseti-biz-service-1.0-SNAPSHOT.jar:na]
- at cn.fraudmetrix.forseti.biz.service.redis.impl.RedisServiceImpl.zadd(RedisServiceImpl.java:337) [forseti-biz-service-1.0-SNAPSHOT.jar:na]
- at cn.fraudmetrix.forseti.biz.service.redis.impl.RedisServiceImpl.zadd(RedisServiceImpl.java:319) [forseti-biz-service-1.0-SNAPSHOT.jar:na]
- ...
- [2015-02-07 09:17:47] ERROR c.f.f.b.s.r.i.RedisServiceImpl -'zadd' key fail, key: xxx, score: xxx, member: xxx
- [2015-02-07 09:17:47] ERROR c.f.f.b.s.r.i.RedisServiceImpl -java.net.SocketTimeoutException: Read timed out
- redis.clients.jedis.exceptions.JedisConnectionException: java.net.SocketTimeoutException: Read timed out (socket read timeout exception)
- at redis.clients.util.RedisInputStream.ensureFill(RedisInputStream.java:201) ~[jedis-2.6.2.jar:na] ('limit = in.read(buf);' at java.io.InputStream.read(InputStream.java:100): the read blocks here, which causes the socket read timeout)
- at redis.clients.util.RedisInputStream.readByte(RedisInputStream.java:40) ~[jedis-2.6.2.jar:na]
- at redis.clients.jedis.Protocol.process(Protocol.java:128) ~[jedis-2.6.2.jar:na]
- at redis.clients.jedis.Protocol.read(Protocol.java:192) ~[jedis-2.6.2.jar:na]
- at redis.clients.jedis.Connection.readProtocolWithCheckingBroken(Connection.java:282) ~[jedis-2.6.2.jar:na]
- at redis.clients.jedis.Connection.getIntegerReply(Connection.java:207) ~[jedis-2.6.2.jar:na]
- at redis.clients.jedis.Jedis.zadd(Jedis.java:1293) ~[jedis-2.6.2.jar:na]
- at redis.clients.jedis.ShardedJedis.zadd(ShardedJedis.java:364) ~[jedis-2.6.2.jar:na]
- at cn.fraudmetrix.forseti.biz.service.redis.impl.RedisServiceImpl.zadd(RedisServiceImpl.java:328) [forseti-biz-service-1.0-SNAPSHOT.jar:na]
- at cn.fraudmetrix.forseti.biz.service.redis.impl.RedisServiceImpl.zadd(RedisServiceImpl.java:319) [forseti-biz-service-1.0-SNAPSHOT.jar:na]
- ...
- Caused by: java.net.SocketTimeoutException: Read timed out
- at java.net.SocketInputStream.socketRead0(Native Method) ~[na:1.7.0_51]
- at java.net.SocketInputStream.read(SocketInputStream.java:152) ~[na:1.7.0_51]
- at java.net.SocketInputStream.read(SocketInputStream.java:122) ~[na:1.7.0_51]
- at java.net.SocketInputStream.read(SocketInputStream.java:108) ~[na:1.7.0_51]
- at redis.clients.util.RedisInputStream.ensureFill(RedisInputStream.java:195) ~[jedis-2.6.2.jar:na]
- ... 38 common frames omitted
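The log shows that the 'zadd' operation first fails with a socket read timeout: "JedisConnectionException: java.net.SocketTimeoutException: Read timed out".
After that exception, the blocked pooled Jedis connection object is destroyed (CustomShardedJedisPool.returnBrokenResource(CustomShardedJedisPool.java:121)), but asking the Redis server to close the connection then fails with a class cast exception: "ClassCastException: java.lang.Long cannot be cast to [B".
Others have run into this problem before and explained it as follows:
A look at the Jedis source shows that its Connection wraps the network streams in buffered classes of its own (for example RedisInputStream). When an exception occurs, the buffer may still hold a command from the previous call that was not sent or was only partly sent. If nothing is done about it and the connection goes straight back to the pool, the next command executed on that connection is sent together with the leftover one, which is why the "wrong return type" error above appears.
So the correct approach is: when an exception occurs, destroy the connection and never reuse it.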
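The effect can be reproduced with a toy model. The sketch below is not Jedis code: FakeConnection is a hypothetical class whose replies queue up in order, the way bytes do on a socket. It models the variant the stack trace points to, where the integer reply to the timed-out zadd is still unread when quit expects a status reply.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Deque;

final class StaleReplyDemo {
    /** Toy connection (assumption): replies are consumed strictly in arrival order. */
    static final class FakeConnection {
        private final Deque<Object> unreadReplies = new ArrayDeque<>();

        long zadd(boolean simulateReadTimeout) {
            unreadReplies.addLast(Long.valueOf(1L)); // the server answers with an integer reply
            if (simulateReadTimeout) {
                // The client gives up before consuming the reply, so it stays queued.
                throw new IllegalStateException("Read timed out");
            }
            return (Long) unreadReplies.pollFirst();
        }

        String quit() {
            unreadReplies.addLast("OK".getBytes(StandardCharsets.UTF_8)); // status reply
            byte[] raw = (byte[]) unreadReplies.pollFirst(); // fails if a stale Long comes first
            return new String(raw, StandardCharsets.UTF_8);
        }
    }

    // Reuses the connection after a timeout and reports which exception quit() raises.
    static String reuseAfterTimeout() {
        FakeConnection connection = new FakeConnection();
        try {
            connection.zadd(true);
        } catch (IllegalStateException timeout) {
            // the connection is now out of sync with the server
        }
        try {
            connection.quit();
            return "none";
        } catch (ClassCastException e) {
            return e.getClass().getSimpleName();
        }
    }

    // Discards the broken connection and runs the next command on a fresh one.
    static String replaceAfterTimeout() {
        FakeConnection connection = new FakeConnection();
        try {
            connection.zadd(true);
        } catch (IllegalStateException timeout) {
            connection = new FakeConnection();
        }
        return connection.quit();
    }

    public static void main(String[] args) {
        System.out.println("reuse after timeout   -> " + reuseAfterTimeout());
        System.out.println("replace after timeout -> " + replaceAfterTimeout());
    }
}
```

Reusing the connection makes quit() read the stale Long and fail with a ClassCastException, while a fresh connection returns the expected status reply.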
1. CustomShardedJedisFactory.destroyObject(PooledObject<ShardedJedis> pooledShardedJedis) has a "client connection leak" problem
The exception output is as follows:
- [2015-01-28 15:33:51] ERROR c.f.f.b.s.r.i.RedisServiceImpl -ShardedJedis close fail
- redis.clients.jedis.exceptions.JedisException: Could not return the resource to the pool
- at redis.clients.util.Pool.returnBrokenResourceObject(Pool.java:85) ~[jedis-2.6.2.jar:na]
- at cn.fraudmetrix.forseti.biz.service.redis.impl.CustomShardedJedisPool.returnBrokenResource(CustomShardedJedisPool.java:120) ~[forseti-biz-service-1.0-SNAPSHOT.jar:na]
- at cn.fraudmetrix.forseti.biz.service.redis.impl.CustomShardedJedisPool.returnBrokenResource(CustomShardedJedisPool.java:26) ~[forseti-biz-service-1.0-SNAPSHOT.jar:na]
- at redis.clients.jedis.ShardedJedis.close(ShardedJedis.java:638) ~[jedis-2.6.2.jar:na]
- at cn.fraudmetrix.forseti.biz.service.redis.impl.RedisServiceImpl.close(RedisServiceImpl.java:90) [forseti-biz-service-1.0-SNAPSHOT.jar:na]
- at cn.fraudmetrix.forseti.biz.service.redis.impl.RedisServiceImpl.zadd(RedisServiceImpl.java:380) [forseti-biz-service-1.0-SNAPSHOT.jar:na]
- at cn.fraudmetrix.forseti.biz.service.redis.impl.RedisServiceImpl.zadd(RedisServiceImpl.java:346) [forseti-biz-service-1.0-SNAPSHOT.jar:na]
- ...
- at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_51]
- at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_51]
- at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_51]
- at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_51]
- at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]
- Caused by: java.lang.ClassCastException: java.lang.Long cannot be cast to [B
- at redis.clients.jedis.Connection.getStatusCodeReply(Connection.java:181) ~[jedis-2.6.2.jar:na]
- at redis.clients.jedis.BinaryJedis.quit(BinaryJedis.java:136) ~[jedis-2.6.2.jar:na]
- at redis.clients.jedis.BinaryShardedJedis.disconnect(BinaryShardedJedis.java:35) ~[jedis-2.6.2.jar:na]
- at cn.fraudmetrix.forseti.biz.service.redis.impl.CustomShardedJedisFactory.destroyObject(CustomShardedJedisFactory.java:106) ~[forseti-biz-service-1.0-SNAPSHOT.jar:na]
- at org.apache.commons.pool2.impl.GenericObjectPool.destroy(GenericObjectPool.java:848) ~[commons-pool2-2.0.jar:2.0]
- at org.apache.commons.pool2.impl.GenericObjectPool.invalidateObject(GenericObjectPool.java:626) ~[commons-pool2-2.0.jar:2.0]
- at redis.clients.util.Pool.returnBrokenResourceObject(Pool.java:83) ~[jedis-2.6.2.jar:na]
- ... 37 common frames omitted
The log shows that the application does not catch the runtime class cast exception ("java.lang.ClassCastException: java.lang.Long cannot be cast to [B"), so the close operation aborts partway. The root of the problem is at "BinaryShardedJedis.disconnect(BinaryShardedJedis.java:35)" and "CustomShardedJedisFactory.destroyObject(CustomShardedJedisFactory.java:106)".
The original implementation catches only JedisConnectionException, as shown below:
- public void destroyObject(PooledObject<ShardedJedis> pooledShardedJedis) throws Exception {
- final ShardedJedis shardedJedis = pooledShardedJedis.getObject();
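- shardedJedis.disconnect(); // the connection resources cannot be released here, so they leak
- }
- // BinaryShardedJedis.disconnect() in jedis-2.6.2, which is what the call above runs
- public void disconnect() {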
- for (Jedis jedis : getAllShards()) {
- try {
- jedis.quit();
- } catch (JedisConnectionException e) {
- // ignore the exception node, so that all other normal nodes can release all connections.
- }
- try {
- jedis.disconnect();
- } catch (JedisConnectionException e) {
- // ignore the exception node, so that all other normal nodes can release all connections.
- }
- }
- }
The fixed code catches every Exception, so an uncaught exception can no longer interrupt the release of the connections. It is shown below:
- public void destroyObject(PooledObject<ShardedJedis> pooledShardedJedis) throws Exception {
- final ShardedJedis shardedJedis = pooledShardedJedis.getObject();
- // shardedJedis.disconnect(); // the connection resources cannot be released here, so they leak
- for (Jedis jedis : shardedJedis.getAllShards()) {
- try {
- // 1. Ask the server to close the connection
- jedis.quit();
- } catch (Exception e) {
- // ignore the exception node, so that all other normal nodes can release all connections.
- // java.lang.ClassCastException: java.lang.Long cannot be cast to [B
- // (zadd/zcard return a long while quit returns a string, which suggests the reply to the previous request was never read)
- logger.warn("quit jedis connection for server fail: " + toServerString(jedis), e);
- }
- try {
- // 2. Close the connection from the client side
- jedis.disconnect();
- } catch (Exception e) {
- // ignore the exception node, so that all other normal nodes can release all connections.
- logger.warn("disconnect jedis connection fail: " + toServerString(jedis), e);
- }
- }
- }
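The same idea can be checked in isolation, without Jedis. In the sketch below FakeShard is a hypothetical stand-in for one shard connection, and IllegalStateException plays the role of JedisConnectionException. One shard throws a ClassCastException from quit(), and the two close loops are compared by counting the sockets left open.

```java
import java.util.Arrays;
import java.util.List;

final class CloseAllShardsDemo {
    /** Hypothetical stand-in for one shard connection; not a Jedis class. */
    static final class FakeShard {
        final String name;
        final RuntimeException quitFailure; // thrown by quit(), or null for a healthy shard
        boolean socketClosed = false;

        FakeShard(String name, RuntimeException quitFailure) {
            this.name = name;
            this.quitFailure = quitFailure;
        }

        void quit() {
            if (quitFailure != null) {
                throw quitFailure;
            }
        }

        void disconnect() {
            socketClosed = true;
        }
    }

    // Narrow catch: only one exception type is tolerated, anything else aborts the loop.
    static void closeNarrow(List<FakeShard> shards) {
        for (FakeShard shard : shards) {
            try {
                shard.quit();
            } catch (IllegalStateException ignored) {
                // stands in for JedisConnectionException
            }
            shard.disconnect();
        }
    }

    // Broad catch: a failure on one shard is logged and the loop carries on.
    static void closeBroad(List<FakeShard> shards) {
        for (FakeShard shard : shards) {
            try {
                shard.quit();
            } catch (Exception e) {
                System.err.println("quit failed for " + shard.name + ": " + e);
            }
            try {
                shard.disconnect();
            } catch (Exception e) {
                System.err.println("disconnect failed for " + shard.name + ": " + e);
            }
        }
    }

    static int openSockets(List<FakeShard> shards) {
        int open = 0;
        for (FakeShard shard : shards) {
            if (!shard.socketClosed) {
                open++;
            }
        }
        return open;
    }

    static List<FakeShard> sampleShards() {
        return Arrays.asList(
                new FakeShard("Shard-01", new ClassCastException("java.lang.Long cannot be cast to [B")),
                new FakeShard("Shard-02", null));
    }

    public static void main(String[] args) {
        List<FakeShard> first = sampleShards();
        try {
            closeNarrow(first);
        } catch (ClassCastException e) {
            System.out.println("narrow catch aborted, sockets still open: " + openSockets(first));
        }
        List<FakeShard> second = sampleShards();
        closeBroad(second);
        System.out.println("broad catch, sockets still open: " + openSockets(second));
    }
}
```

With the narrow catch the loop aborts on the first shard and both sockets stay open. With the broad catch both are closed.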
Reposted from: http://bert82503.iteye.com/blog/2184225