Messages Piling Up Like Crazy! Is RocketMQ Buggy?

{"type":"doc","content":[{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Preface"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"If you have worked with a message queue, you have probably run into "},{"type":"text","marks":[{"type":"strong"}],"text":"message backlog"},{"type":"text","text":" at some point. Fei Hao recently fell into this pit too, and the root cause turned out to be something quite unexpected."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Main Story"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"It was a dark and windy night. Fei Hao was just about to head home when a barrage of alert text messages came in: "},{"type":"text","marks":[{"type":"strong"}],"text":"“MQ message backlog alert [TOPIC: XXX]”"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Ten thousand alpacas stampeded through Fei Hao's mind, and the first reaction was: “What is going on? I only just clocked off and trouble shows up already???” "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/da/daa01212d96a1eea6109dc3ae9beb7a0.gif","alt":null,"title":"","style":[{"key":"width","value":"50%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"So I rushed back to the office, opened my laptop, and logged in to the RocketMQ console to take a look (we run our own deployment of open-source RocketMQ)."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/e6/e6bd4046822a50b8851944ebbc5dc781.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Holy cow (キ`゚Д゚´)!!! More than 300 million messages had piled up???"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Message backlog always boils down to one thing:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"producer send rate >> consumer processing rate"}]}]},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1. The production rate spikes, for example because producer traffic suddenly surges"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2. 
The consumption rate drops, for example because consumer instances are badly blocked on I/O or have crashed"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Wiping the cold sweat off my forehead 😓... I hurried onto the consumer servers to take a look."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The application was running normally! Disk I/O was normal! The network was normal!"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Then I checked the producer servers. Huh... traffic was perfectly normal too!"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"What??? I give up 😨 ... Both the producers and the consumers looked healthy, so why had so many messages piled up? Watching the backlog keep growing (if only my hair count grew like that), I got more and more anxious."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"RocketMQ is said to handle backlogs on the order of a billion messages without a noticeable drop in performance, 😰 but a backlog of this size was clearly abnormal."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"RocketMQ has a bug. Yes, this must be RocketMQ's fault!"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The end..."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/dc/dc26ad921f8318ff6b3644dcba1ae493.jpeg","alt":null,"title":"","style":[{"key":"width","value":"25%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Haha, back to business. Fei Hao may not have a powerful dad to lean on, but he can at least avoid letting everyone down 😂 "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"I went into the consumer project and checked the logs. Emmm... no exceptions, no error logs... everything looked normal."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Hmm... but wasn't consumption a bit slow??? That made no sense: the consumer was a cluster of 3 nodes, which should be more than enough for our business volume. So I opened the consumer details for this topic."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/9b/9b53092ac7be4904a3dded3e8efecf0e.png","alt":null,"title":"","style":[{"key":"width","value":"100%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Huh, why do these three consumers all have the same ClientId?"},{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Years of falling into pits gave me a hunch: “Could identical ClientIds be confusing the broker when it distributes messages, so that messages are not pushed to the consumers properly?” Since the producers and consumers both looked fine, I guessed the problem was somewhere around the Broker."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Based on that guess, there are two questions to answer:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":"1","normalizeStart":1},"content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"Why do consumers deployed on different servers end up with the same ClientId?"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"Does an identical ClientId cause the broker to distribute messages incorrectly?"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Problem Analysis"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Why were the ClientIds identical? My guess was Docker. The company had recently started containerizing its services, and the consumer project happened to be in the first batch."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Anyone who has used Docker knows that when the Docker daemon starts, it creates a virtual bridge named "},{"type":"codeinline","content":[{"type":"text","text":"docker0"}]},{"type":"text","text":" on the host. Docker containers on the host connect to this virtual bridge. A virtual bridge works much like a physical switch, so all containers on the host are joined to one layer-2 network through it. Docker generally offers four network modes:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Host mode"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Container mode"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"None mode"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Bridge 
mode"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"If these modes are unfamiliar, please look them up 🤔 "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"All of our containers use Host mode, so a container's network is exactly the same as its host's."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/3c/3c476a3c2636f0d149941e3b93f72d2b.png","alt":null,"title":"","style":[{"key":"width","value":"100%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As you can see, the first interface listed is "},{"type":"codeinline","content":[{"type":"text","text":"docker0"}]},{"type":"text","text":", "},{"type":"text","marks":[{"type":"strong"}],"text":"whose default IP is always 172.17.0.1"},{"type":"text","text":". So the ClientId was evidently being built from the IP of the "},{"type":"codeinline","content":[{"type":"text","text":"docker0"}]},{"type":"text","text":" interface, which explains why several consumers ended up with the same ClientId."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"So where exactly is the clientId set? I searched the GitHub Issues for the keyword “Docker”, and sure enough, plenty of fellow travelers had fallen into the same pit. After some filtering I found a fairly credible "},{"type":"link","attrs":{"href":"https://github.com/apache/rocketmq/issues/667","title":""},"content":[{"type":"text","text":"open issue"}]},{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/82/829c87b68bc3f9f6ef15b2a12ba12bfe.png","alt":null,"title":"","style":[{"key":"width","value":"100%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As you can see, this fellow ran into exactly the same situation, and his conclusion largely matches my guess above (cue a moment of smugness). He also points out that the clientId is built in the "},{"type":"codeinline","content":[{"type":"text","text":"buildMQClientId"}]},{"type":"text","text":" method of the ClientConfig class."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Exploring the Source Code"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Open the ClientConfig class and find the buildMQClientId method:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"java"},"content":[{"type":"text","text":"public String buildMQClientId() {\n    StringBuilder sb = new StringBuilder();\n    sb.append(this.getClientIP());\n\n    sb.append(\"@\");\n    sb.append(this.getInstanceName());\n    if (!UtilAll.isBlank(this.unitName)) {\n        sb.append(\"@\");\n        sb.append(this.unitName);\n    }\n\n    return 
sb.toString();\n}"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"From this, the rule for generating the clientId should be clear: it is "},{"type":"codeinline","content":[{"type":"text","text":"consumer client IP + \"@\" + instance name"}]},{"type":"text","text":", so the problem clearly lies in how the client IP is obtained."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Let's look at how exactly it obtains the client IP:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"java"},"content":[{"type":"text","text":"public class ClientConfig {\n    ...\n    private String clientIP = RemotingUtil.getLocalAddress();\n    ...\n}\n\npublic static String getLocalAddress() {\n    try {\n        // Traversal Network interface to get the first non-loopback and non-private address\n        Enumeration<NetworkInterface> enumeration = NetworkInterface.getNetworkInterfaces();\n        ArrayList<String> ipv4Result = new ArrayList<String>();\n        ArrayList<String> ipv6Result = new ArrayList<String>();\n        while (enumeration.hasMoreElements()) {\n            final NetworkInterface networkInterface = enumeration.nextElement();\n            final Enumeration<InetAddress> en = networkInterface.getInetAddresses();\n            while (en.hasMoreElements()) {\n                final InetAddress address = en.nextElement();\n                if (!address.isLoopbackAddress()) {\n                    if (address instanceof Inet6Address) {\n                        ipv6Result.add(normalizeHostAddress(address));\n                    } else {\n                        ipv4Result.add(normalizeHostAddress(address));\n                    }\n                }\n            }\n        }\n\n        // prefer ipv4\n        if (!ipv4Result.isEmpty()) {\n            for (String ip : ipv4Result) {\n                if (ip.startsWith(\"127.0\") || ip.startsWith(\"192.168\")) {\n                    continue;\n                }\n\n                return ip;\n            }\n\n            return ipv4Result.get(ipv4Result.size() - 1);\n        } else if (!ipv6Result.isEmpty()) {\n            return ipv6Result.get(0);\n        }\n        // If failed to find, fall back to localhost\n        final InetAddress localHost = 
InetAddress.getLocalHost();\n        return normalizeHostAddress(localHost);\n    } catch (Exception e) {\n        log.error(\"Failed to obtain local address\", e);\n    }\n\n    return null;\n}"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"If you have ever written code to get the local machine's IP, the utility method "},{"type":"codeinline","content":[{"type":"text","text":"RemotingUtil.getLocalAddress()"}]},{"type":"text","text":" should look familiar~"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In short, it returns an IP from the machine's network interfaces. Because our containers run in host mode, every container shares the host's network, so the "},{"type":"codeinline","content":[{"type":"text","text":"docker0"}]},{"type":"text","text":" interface created by the Docker daemon is visible inside the container as well. Since that interface is listed first and its address is neither loopback nor one starting with 192.168, the method returns the default IP of "},{"type":"codeinline","content":[{"type":"text","text":"docker0"}]},{"type":"text","text":": 172.17.0.1."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(I checked with the ops team: since this is the first phase of containerization, they are deploying in the simplest mode for now and will gradually move to k8s, where every pod has its own IP and its network is isolated from the host and from other pods. Emmm... k8s! Sounds impressive, and I happen to be reading a book on it lately.)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"At this point the sharp-eyed reader may ask: “There is also the instance name parameter. How could that be identical too?” "},{"type":"text","text":" Don't worry, let's keep reading 👇"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"java"},"content":[{"type":"text","text":"private String instanceName = System.getProperty(\"rocketmq.client.name\", \"DEFAULT\");\n\npublic String getInstanceName() {\n    return instanceName;\n}\n\npublic 
void setInstanceName(String instanceName) {\n    this.instanceName = instanceName;\n}\n\npublic void changeInstanceNameToPID() {\n    if (this.instanceName.equals(\"DEFAULT\")) {\n        this.instanceName = String.valueOf(UtilAll.getPid());\n    }\n}"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"getInstanceName()"}]},{"type":"text","text":" simply returns the "},{"type":"codeinline","content":[{"type":"text","text":"instanceName"}]},{"type":"text","text":" field. So when does that field get its final value? That's right: in "},{"type":"codeinline","content":[{"type":"text","text":"changeInstanceNameToPID()"}]},{"type":"text","text":", which is called when the consumer starts."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The logic is simple. On initialization it reads the JVM system property "},{"type":"codeinline","content":[{"type":"text","text":"rocketmq.client.name"}]},{"type":"text","text":"; if the property is not set, it falls back to the default value "},{"type":"codeinline","content":[{"type":"text","text":"DEFAULT"}]},{"type":"text","text":"."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Then, when the consumer starts, it checks whether the value is still "},{"type":"codeinline","content":[{"type":"text","text":"DEFAULT"}]},{"type":"text","text":", and if so replaces it with the result of "},{"type":"codeinline","content":[{"type":"text","text":"UtilAll.getPid()"}]},{"type":"text","text":"."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"java"},"content":[{"type":"text","text":"public static int getPid() {\n    RuntimeMXBean runtime = ManagementFactory.getRuntimeMXBean();\n    String name = runtime.getName(); // format: \"pid@hostname\"\n    try {\n        
return Integer.parseInt(name.substring(0, name.indexOf('@')));\n    } catch (Exception e) {\n        return -1;\n    }\n}"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As the name suggests, this method returns the process ID. So... why would the process IDs all be the same?"}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/e3/e356c3e941e22c83cec5c1d4d1bca566.gif","alt":null,"title":"","style":[{"key":"width","value":"50%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"You may already have guessed the answer 🤨! This is where we have to mention "},{"type":"text","marks":[{"type":"strong"}],"text":"the three core technologies behind Docker"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"cgroup"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"namespace"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"unionFS"},{"type":"text","text":" "}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"That's right, the one at work here is namespaces."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Linux namespaces are a Linux kernel feature that isolates system resources such as PIDs, user IDs, and the network."}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Because all the containers use the same base image and run the same Java project as their outermost process, if we go into the containers we can see that the process ID is 9 in every one of them."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"After a series of ingenious deductions by Fei Hao, the conclusion is: "},{"type":"text","marks":[{"type":"strong"}],"text":"under Docker's host network mode, containers generate identical clientIds!"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"With that, the first question from our earlier guess is answered!"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Following in the footsteps of Detective Fei Hao, let's move on to the second question: "},{"type":"text","marks":[{"type":"strong"}],"text":"does an identical clientId cause the Broker to distribute messages incorrectly?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Presumably, consumer load balancing uses the clientId as each consumer's unique identifier, so when messages are handed out, identical clientIds cause the load to be distributed incorrectly."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"So next we need to find out how consumer load balancing is actually implemented. At first I assumed it was all handled on the Broker, which would assign different queues to different consumers based on the consumers registered with it. But after reading the docs that ship with the source code and digging through the source itself, I realized how little I knew (hahaha, I bet some of you started with the same assumption)."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"First, some background on "},{"type":"text","marks":[{"type":"strong"}],"text":"RocketMQ's overall architecture"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/76/76b6d12769235116b2d857dd4a2be9d6.png","alt":null,"title":"","style":[{"key":"width","value":"100%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To keep this short, I will only cover the relationship between the Broker and the consumer here. If the other roles are unclear, see my earlier introductory article on RocketMQ."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":"1","normalizeStart":1},"content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"The consumer "},{"type":"text","marks":[{"type":"strong"}],"text":"establishes a long-lived connection"},{"type":"text","text":" with one randomly chosen node in the NameServer cluster and periodically fetches topic route information from the NameServer."}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"Based on that route information, it establishes a long-lived connection with the Broker and "},{"type":"text","marks":[{"type":"strong"}],"text":"sends heartbeats to the Broker at regular intervals"},{"type":"text","text":"."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/c6/c676e453cc9dd99b54dd14602ef2a9be.png","alt":null,"title":"","style":[{"key":"width","value":"100%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"When the Broker receives a heartbeat, it saves the consumer's information in a local cache variable, "},{"type":"text","marks":[{"type":"strong"}],"text":"consumerTable"},{"type":"text","text":". The diagram above sketches the structure and contents of consumerTable; the key point is that it caches the clientId of every consumer."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As for the consumer's consumption modes, I will quote the explanation from the docs in the source repository directly:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In RocketMQ, both consumption modes on the consumer side (Push/Pull) obtain messages by pulling. Push mode is just a wrapper around Pull mode: the message-pulling thread pulls a batch of messages from the server, submits them to the message consumption thread pool, and then immediately tries to pull from the server again. If no messages are pulled, it waits briefly and then pulls again."}]}]},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In both pull-based consumption modes (Push/Pull), the consumer needs to know which message queue on the Broker to fetch messages from. Load balancing therefore has to happen on the consumer side, that is, deciding which consumers in the same ConsumerGroup the Broker's multiple MessageQueues are assigned to."}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"In short, whether in Push or Pull mode, the consumer controls message consumption, so consumer load balancing is implemented on the consumer's client side"},{"type":"text","text":"."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Reading the source shows that the RebalanceService thread carries out load balancing (once every 20s). Its run() method ultimately calls the "},{"type":"codeinline","content":[{"type":"text","text":"rebalanceByTopic()"}]},{"type":"text","text":" method of the RebalanceImpl class, which is the core of consumer-side load balancing. "},{"type":"codeinline","content":[{"type":"text","text":"rebalanceByTopic()"}]},{"type":"text","text":" handles the “broadcasting” and “clustering” message models differently. Here we mainly look at the flow in clustering mode:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"java"},"content":[{"type":"text","text":"private void rebalanceByTopic(final String topic, final boolean isOrder) {\n    switch (messageModel) {\n        case BROADCASTING: {\n            ..... 
// omitted\n        }\n        case CLUSTERING: {\n            // Get the set of message queues under this topic\n            Set<MessageQueue> mqSet = this.topicSubscribeInfoTable.get(topic);\n            // Ask the broker for the clientIds of the consumers in this group\n            List<String> cidAll = this.mQClientFactory.findConsumerIdList(topic, consumerGroup);\n            if (null == mqSet) {\n                if (!topic.startsWith(MixAll.RETRY_GROUP_TOPIC_PREFIX)) {\n                    log.warn(\"doRebalance, {}, but the topic[{}] not exist.\", consumerGroup, topic);\n                }\n            }\n\n            if (null == cidAll) {\n                log.warn(\"doRebalance, {} {}, get consumer id list failed\", consumerGroup, topic);\n            }\n\n            if (mqSet != null && cidAll != null) {\n                List<MessageQueue> mqAll = new ArrayList<MessageQueue>();\n                mqAll.addAll(mqSet);\n\n                Collections.sort(mqAll);\n                Collections.sort(cidAll);\n                // Average allocation strategy by default\n                AllocateMessageQueueStrategy strategy = this.allocateMessageQueueStrategy;\n\n                List<MessageQueue> allocateResult = null;\n                try {\n                    allocateResult = strategy.allocate(\n                        this.consumerGroup,\n                        this.mQClientFactory.getClientId(),\n                        mqAll,\n                        cidAll);\n                } catch (Throwable e) {\n                    log.error(\"AllocateMessageQueueStrategy.allocate Exception. allocateMessageQueueStrategyName={}\", strategy.getName(),\n                        e);\n                    return;\n                }\n\n                Set<MessageQueue> allocateResultSet = new HashSet<MessageQueue>();\n                if (allocateResult != null) {\n                    allocateResultSet.addAll(allocateResult);\n                }\n\n                boolean changed = this.updateProcessQueueTableInRebalance(topic, allocateResultSet, isOrder);\n                if (changed) {\n                    log.info(\n                        \"rebalanced result changed. 
allocateMessageQueueStrategyName={}, group={}, topic={}, clientId={}, mqAllSize={}, cidAllSize={}, rebalanceResultSize={}, rebalanceResultSet={}\",\n strategy.getName(), consumerGroup, topic, this.mQClientFactory.getClientId(), mqSet.size(), cidAll.size(),\n allocateResultSet.size(), allocateResultSet);\n this.messageQueueChanged(topic, mqSet, allocateResultSet);\n }\n }\n break;\n }\n default:\n break;\n }\n}"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(1) 從本地緩存變量 "},{"type":"codeinline","content":[{"type":"text","text":"topicSubscribeInfoTable"}]},{"type":"text","text":" 中,獲取該Topic主題下的消息消費隊列集合(mqSet);"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(2) 根據 topic 和 consumerGroup 爲參數調用"},{"type":"codeinline","content":[{"type":"text","text":"findConsumerIdList()"}]},{"type":"text","text":"方法向 Broker 端發送獲取該消費組下 "},{"type":"text","marks":[{"type":"strong"}],"text":"clientId 列表"},{"type":"text","text":";"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(3) 先對 Topic 下的消息消費隊列、消費者Id排序,然後用"},{"type":"text","marks":[{"type":"strong"}],"text":"消息隊列分配策略算法"},{"type":"text","text":"(默認爲:消息隊列的平均分配算法),計算出待拉取的消息隊列。這裏的平均分配算法,類似於分頁的算法,將所有 MessageQueue 排好序類似於記錄,將所有消費端 Consumer 排好序類似頁數,並求出每一頁需要包含的平均 size 和每個頁面記錄的範圍 range,最後遍歷整個range 而計算出當前 Consumer 
端應該分配到的記錄(這裏即爲:MessageQueue)。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/c8/c826b28746245dce135afa7aa677ccad.png","alt":null,"title":"","style":[{"key":"width","value":"100%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(4) 然後,調用updateProcessQueueTableInRebalance()方法,具體的做法是,先將分配到的消息隊列集合(mqSet)與processQueueTable做一個過濾比對。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/c1/c114040f6c687a6ab12f2c99b0f53b98.png","alt":null,"title":"","style":[{"key":"width","value":"100%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"上圖中 processQueueTable 標註的紅色部分,表示與分配到的消息隊列集合 mqSet 互不包含。將這些隊列設置Dropped 屬性爲 true,然後查看這些隊列是否可以移除出 processQueueTable 緩存變量,這裏具體執行"},{"type":"codeinline","content":[{"type":"text","text":"removeUnnecessaryMessageQueue()"}]},{"type":"text","text":"方法,即每隔1s 查看是否可以獲取當前消費處理隊列的鎖,拿到的話返回true。如果等待1s後,仍然拿不到當前消費處理隊列的鎖則返回false。如果返回true,則從 processQueueTable 緩存變量中移除對應的 Entry;"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"上圖中 processQueueTable 的綠色部分,表示與分配到的消息隊列集合 mqSet 的交集。判斷該 ProcessQueue 是否已經過期了,在Pull模式的不用管,如果是 Push 模式的,設置 Dropped 屬性爲 
true and call "},{"type":"codeinline","content":[{"type":"text","text":"removeUnnecessaryMessageQueue()"}]},{"type":"text","text":" to try to remove the Entry as described above;"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Load balancing of message queues among the consumers of one consumer group: "},{"type":"text","marks":[{"type":"strong"}],"text":"the core design idea is that a message queue may be consumed by only one consumer of a given consumer group at any one time, while a single consumer may consume several message queues at once"},{"type":"text","text":"."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The section above is taken from the docs in the RocketMQ source tree. I don't know whether it made sense to you; I had to read it several times before it clicked 🤔🤔🤔"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In fact, the figure in step 3 makes the load-balancing implementation clear at a glance: "},{"type":"text","marks":[{"type":"strong"}],"text":"put simply, it gives each consumer the same number of queues"},{"type":"text","text":". Every consumer generates a clientId as its unique identifier, but as deduced earlier, consumers running in containers with the Host network mode end up with identical clientIds."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Emmmm... by now you can probably guess where things went wrong."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"That's right! The problem lies in step 3, in how the average allocation is computed."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"java"},"content":[{"type":"text","text":"@Override\npublic List<MessageQueue> allocate(String consumerGroup, String currentCID, List<MessageQueue> mqAll, List<String> cidAll) {\n if (currentCID == null || currentCID.length() < 1) {\n throw new IllegalArgumentException(\"currentCID is empty\");\n }\n if (mqAll == null || mqAll.isEmpty()) {\n throw new IllegalArgumentException(\"mqAll is null or mqAll empty\");\n }\n if (cidAll == null || cidAll.isEmpty()) {\n throw new IllegalArgumentException(\"cidAll is null or cidAll empty\");\n }\n\n List<MessageQueue> result = new ArrayList<MessageQueue>();\n if (!cidAll.contains(currentCID)) {\n log.info(\"[BUG] ConsumerGroup: {} The consumerId: {} not in cidAll: {}\",\n consumerGroup,\n currentCID,\n cidAll);\n return result;\n }\n // Index of the current clientId within cidAll\n int index = cidAll.indexOf(currentCID);\n int mod = mqAll.size() % cidAll.size();\n int averageSize =\n mqAll.size() <= cidAll.size() ? 1 : (mod > 0 && index < mod ? mqAll.size() / cidAll.size()\n + 1 : mqAll.size() / cidAll.size());\n int startIndex = (mod > 0 && index < mod) ? index * averageSize : index * averageSize + mod;\n int range = Math.min(averageSize, mqAll.size() - startIndex);\n for (int i = 0; i < range; i++) {\n result.add(mqAll.get((startIndex + i) % mqAll.size()));\n }\n return result;\n}"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The calculation above may look a bit convoluted, but once you follow it, it simply works out which message queues the current Consumer is assigned, just like the figure in step 3."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Suppose there is only one consumer. Consumption is then perfectly normal, because every queue under the Topic is assigned to that single consumer
, so load balancing does not even come into play."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/84/84e6e0288306d86d8c47ae7a4beb94e7.png","alt":null,"title":"","style":[{"key":"width","value":"100%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Suppose there are now two consumers. Computed normally, the result should be as shown in the figure above. But because "},{"type":"codeinline","content":[{"type":"text","text":"cidAll"}]},{"type":"text","text":" holds two identical clientIds, both consumers get an index of 0, "},{"type":"text","marks":[{"type":"strong"}],"text":"so naturally they are both assigned the same MessageQueues"},{"type":"text","text":", while the queues meant for the other index are consumed by nobody. That explains what we saw at the start: the logs showed messages being consumed, yet consumption was very slow."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Solution"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":"1","normalizeStart":1},"content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"Fix the load-balancing error"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"The culprit: clientId"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"After this fine piece of deduction, it should be clear that the root cause of the Consumer 
load-balancing error is that the Consumer clients generate identical clientIds, so the key to fixing it is to change how the clientId is generated. Having walked through the clientId generation rule in the source above, we can manually set the "},{"type":"codeinline","content":[{"type":"text","text":"rocketmq.client.name"}]},{"type":"text","text":" system property to produce a custom, unique clientId."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Here Feihao appends a timestamp to the original pid:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"java"},"content":[{"type":"text","text":"@PostConstruct\npublic void init() {\n // Append a timestamp to the pid so every instance gets its own instanceName\n System.setProperty(\"rocketmq.client.name\", String.valueOf(UtilAll.getPid()) + \"@\" + System.currentTimeMillis());\n}"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":"2","normalizeStart":"2"},"content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"Clear the message backlog"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The root problem is finally fixed! Right, everything is ready except the release, and the 300-million-plus messages piled up in the queues are still waiting to be consumed."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(Pile up for a while and it feels great for a while; keep piling up and it keeps feeling great 😭)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Not long after the release, emmm... the effect was obvious and the backlog gradually shrank. But then a new problem showed up: MongoDB 
started alerting!"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Damn... I had almost forgotten that the consumer writes to MongoDB after processing each message. With consumption throughput suddenly surging, MongoDB was now the one that could not keep up. Luckily the historical messages were not important and could be discarded, so Feihao promptly reset the consumer offset in the console. Done: consumption was back to normal, and so was MongoDB. Phew, a close call that nearly turned into a second incident."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Summary"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":"1","normalizeStart":1},"content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"Every RocketMQ consumer client generates a clientId as its unique identifier. By default the clientId is built from "},{"type":"codeinline","content":[{"type":"text","text":"client IP + client process ID"}]}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"When Docker containers are deployed with the Host network mode, the applications inside them all pick up the default IP of the Docker bridge"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","text":"RocketMQ's consumer-side load balancing is implemented in the client. The consumer client caches the message queues of its Topic and by default uses the average allocation algorithm. If the clientIds 
are identical, every client is assigned the same queues and consumption goes wrong."}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":4,"align":null,"origin":null},"content":[{"type":"text","text":"When working off a message backlog, check everything that is affected. A sudden flood of consumption traffic must not be allowed to hurt other services, or you will cause a second incident just as Feihao did (if you have a better way to handle a backlog, please leave a comment)"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This is the first time Feihao has written about a production incident, so many parts and details may be rough. Please bear with me and send suggestions~"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"I have actually been through quite a few production incidents, but the write-ups were always a formality. From now on I hope to record them as articles, which helps my own retrospectives and also gives everyone more lessons learned from the pitfalls."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Feihao's personal blog: "},{"type":"link","attrs":{"href":"http://edisonz.cn/","title":""},"content":[{"type":"text","text":"http://edisonz.cn/"}]},{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Later articles will be posted there as well~"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}}]}