Highly Available Multi-Instance Redis Monitoring with Prometheus in Practice

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Today's talk covers three main topics:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"An introduction to Prometheus;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Multi-instance Redis monitoring in practice;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Integrating Zabbix\/Prometheus with Grafana in practice."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This talk works from concrete cases, and I hope the detailed walkthrough gives you some ideas for using Prometheus. If anything is unclear, feel free to get in touch with me."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"I. Introduction to Prometheus"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1. Architecture"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/96\/96f81cf3aee03b0884d1c8af07e6d21c.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"The most commonly used Prometheus architecture diagram"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Using this diagram, here is the core Prometheus workflow:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"First, the Prometheus server periodically scrapes metrics from its targets; a target only needs to expose its metrics over an HTTP endpoint through an exporter to be scraped on schedule;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Prometheus discovers targets dynamically from its configuration file, text files, Consul, or, as in the diagram, Kubernetes. It monitors mainly by pulling: the server either pulls data directly from targets or obtains it indirectly through the Pushgateway;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Prometheus stores data on local disk, cleans up and consolidates it according to rules, and writes the results into its time series database;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Prometheus serves data through PromQL or its other APIs, and the data can be visualized in many ways, such as Grafana, Prometheus's built-in web UI, or custom charts;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The Pushgateway in the diagram lets clients actively push metrics toward Prometheus. It is flexible: even if you are not familiar with exporters, you can collect data with a shell or Python script and push it to the Pushgateway, from which Prometheus then scrapes it;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Alertmanager is a component separate from Prometheus that sends out alerts. Alerts are driven by rules written in Prometheus's query language, which makes alerting very flexible;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Prometheus is well suited to recording purely numeric time series. It fits both machine-centric monitoring and monitoring of highly dynamic service-oriented architectures. For microservices, its support for multi-dimensional data collection and querying is a particular strength;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Prometheus emphasizes reliability, so you can still view system statistics even during a failure. The trade-off is that it accepts losing a small amount of data to keep the system as a whole available, which means it is not suitable when 100% data accuracy is required."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2. High availability with promeHA"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Next, some of the work we have done on Prometheus. Stock Prometheus is almost always deployed as a single node, which is not enough to guarantee data reliability, so we built high availability for Prometheus on top of service registration."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/0c\/0cdeab0b4c6728c7f0dad2044459e3e7.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The configuration file at the bottom right of the diagram shows most of what the HA program does:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The etcd address:"}]}]}]},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"etcdEndpoints=[\"127.0.0.1:2379\"]"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyl
e":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The network interface to use:"}]}]}]},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"netCard=\"enp0s8\""}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The VIP:"}]}]}]},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"vip=\"192.168.56.105\""}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The index number to use on the specified network interface:"}]}]}]},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"num=\"2\""}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The etcd path under which this service registers:"}]}]}]},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"lock=\"\/dev\/prometheus\""}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The command that starts Prometheus, which can be customized:"}]}]}]},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"cmds=\"\/usr\/local\/prometheus\/prometheus --config.file=\/usr\/local\/prometheus\/prometheus.yml\""}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The program mainly guards against three failure scenarios:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Prometheus process failure: if the process fails, the daemon started by promeHA restarts the Prometheus process directly;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"promeHA process failure: if promeHA itself fails, the VIP and service on the failed node are taken offline automatically. The standby node then promptly acquires the lock, becomes the primary, takes over the service, and brings up the VIP;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Node outage: the standby node likewise acquires the lock, becomes the primary, and takes over the service."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"If you do not want a home-grown HA setup, you can use the officially recommended approach: run two identical Prometheus servers that monitor the same targets."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"3. Backend storage"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"1) Prometheus remote storage"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/2f\/2fa82b398e74e53cb610d573d7eea959.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePas
s":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Because of how Prometheus's own storage is structured, the project does not recommend keeping large amounts of data locally, and the default retention is 15 days. When local storage cannot meet your requirements, you can bring in remote storage."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Prometheus defines read and write interfaces for remote storage. A storage system that wants to support Prometheus has to implement the adapter layer shown in the diagram, which converts Prometheus's read and write requests into its own internal format."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"We use InfluxDB as the remote store, mainly for the following reasons:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"InfluxDB is an open-source database with no special dependencies, so it works essentially out of the box. It also ships with an HTTP interface for administration, so no extra plugins need to be configured;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"It has built-in data expiry, which matters for a monitoring system. For example, we can configure retention directly on the database so that data is kept for only 180 days;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Its query language closely resembles SQL, so writing queries feels familiar;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"It includes built-in permission management, with permissions that can be set down to the table level."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"te
xt","text":"The diagram shows two examples:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In the first, the remote write and remote read addresses are set directly in the Prometheus configuration file. InfluxDB's write endpoint is \/api\/v1\/prom\/write; once the target database is specified, data can be written to and read from the remote store directly;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The second is the same kind of configuration with authentication, using a username and password."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Our requirements for the backend store are not that demanding, and Prometheus itself is already highly available, so taking backups of InfluxDB is enough for our needs and lets us restore data quickly after a failure."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"My view of monitoring is this: if the monitored target's availability is, say, four nines, the monitoring backend's level should match it or be lower, but never higher. It is enough to expose the state of the monitored targets reliably; the backend store holds fairly auxiliary data, so we avoid over-engineering it."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2) Tuning Prometheus"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/cf\/cff571331959d2fbeda28d8a1ecc317f.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The queue_config parameters control the performance of writes to InfluxDB. To write efficiently, Prometheus buffers scraped samples in an in-memory queue before sending them to remote storage in batches."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The two most important settings are:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"min_shards"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"min_shards sets the minimum number of shards Prometheus uses, which is also the number of shards in use when remote write starts. If remote write falls behind, Prometheus scales up the number of shards automatically, so most users never need to adjust this parameter. Raising min_shards, however, lets Prometheus avoid falling behind at startup while it is still calculating how many shards it needs."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"max_samples_per_send"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The maximum number of samples per send can be adjusted for the backend in use. Many systems work very well with larger batches and see no significant increase in latency, while other backends run into problems if each request carries a large number of samples. The default is small enough to work for most systems."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"II. Redis monitoring"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Redis is currently the most popular cache server and an important part of the backend stack; almost every backend uses it for caching, which is why it has to be monitored in real time. Once you know how to monitor Redis, the same approach carries over to other services."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1. redis_exporter"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In Prometheus, the programs that report data are all called exporters. Different exporters cover different services, and they follow a common naming convention."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/32\/328fcaa3ac8f1261526be2fbe8049386.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"I recommend using the officially listed exporters first and modifying them only if they do not meet your needs."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/86\/86d89b9b1d5b183a33fa06eacbbccd1e.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The left side of the figure is the database section of the official exporter list, where Redis appears; the list covers most commonly used databases. We also use the community redis_exporter for our own monitoring."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2. Multi-instance Redis monitoring"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null
,"origin":null},"content":[{"type":"text","text":"Redis works as a single process with I\/O multiplexing, so deploying only one node on a physical server cannot take advantage of a multi-core CPU. We therefore usually run several Redis instances on one machine to improve utilization."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"With that setup, however, deploying one exporter per instance becomes tedious, so we chose to have a single exporter monitor multiple Redis instances."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/25\/2585302d9f916f3416eed9d1aa8c9a02.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The example in the figure uses two redis exporters, redisStandalone and redisCluster, split by the type of Redis deployment they monitor."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"If a single exporter becomes a performance bottleneck, you can split the targets and add more exporters to improve performance."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"3. Multi-instance monitoring with static configuration"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/80\/8064908c4c61c974ae2064445ee51726.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The configuration in this figure is the officially documented static approach to multi-instance monitoring. All monitoring targets are written out under static_configs; if I have four Redis instances, for example, I list all four there."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The static configuration combines two addresses: the address of redis_exporter and the actual address of each Redis instance to monitor. Together they let one exporter cover multiple Redis instances. When starting the exporter, pass redis.addr= with an empty value so that it does not also scrape a local Redis instance. The drawback of this approach is that every new monitoring target requires an edit to the Prometheus configuration file."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/8a\/8a4376078076de07351ad99eb848d6b4.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The static approach is inconvenient in practice: every time you add a target you have to restart the Prometheus server or reload its configuration, which is not very elegant."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Another approach is file_sd_configs, which watches JSON files for changes and derives the Redis targets from them. The figure above shows an example: the targets JSON file lists the Redis instances as \"targets\": [\"redis:\/\/redis-host-01:6379\", \"redis:\/\/redis-host-02:6379\"], \"labels\": {}, all inside a list. Prometheus discovers the targets in the JSON file, so monitoring is driven by a file. The file path can also use a glob pattern such as *.json, so that targets in every *.json file in a directory are discovered through file_sd_configs and added automatically, with no need to restart Prometheus."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/81\/812557e35e372cf1d05f12fd23018f4d.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Both approaches have drawbacks for large-scale use and maintenance, though: whether the file is static or dynamic, you still have to edit files, which is tedious. Next is an implementation that uses Consul."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Consul is an open-source tool written in Go that provides service registration, service discovery, and configuration management for distributed, service-oriented systems. Its features include service registration\/discovery, health checks, key\/value storage, multi-datacenter support, and distributed consistency guarantees. As mentioned earlier, adding a new target to Prometheus otherwise means changing a configuration file on the server, which puts a heavy burden on operations staff."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Compared with static configuration, Consul service discovery lets Prometheus detect services being added, removed, or updated and adjust its set of targets automatically. This makes Prometheus a better fit than traditional monitoring solutions for environments that change often. It also simplifies integration with surrounding automation: writing an API to maintain a file is awkward, whereas calling the Consul API is very convenient."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Here is how dynamic monitoring with Consul works."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/e0\/e03f9fd605116a91ea54b22bfe412f65.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"para
graph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"First, a custom program talks to our database automation platform and fetches the monitoring targets automatically. It then registers or deregisters services in Consul by sending the target data to Consul in PUT requests. Prometheus watches Consul continuously, and whenever the matching services in Consul change, it updates its own monitoring targets."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In the flow shown above, a collector periodically gathers Redis metadata and syncs it to Consul. Prometheus keeps watching the Redis services in Consul and updates its targets accordingly. This gives us automatic target discovery and also ties monitoring to our database operations platform."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"If your environment has no such operations platform, the collector can instead query a Redis master directly, or query one node of a Redis Cluster to discover the other nodes."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/7b\/7b60bbddf87ec63cac0e36b6d013ed58.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The figure above shows how to register a monitoring target in Consul. The JSON we PUT contains the ID, name, address, tags, and checks. On the right is the Consul web UI. After the data in the red box is added, Consul groups entries by service, and the service corresponds to the name field in the submitted data. Clicking a service group lists every monitored service under it. The UI fields map back to what we submitted: the ID matches ID, the service name matches name, and Tags shows the tags we passed in."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/9f\/9fed7980edda67a86d2b3bacd0d67b9e.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The figure above shows the Consul-related part of the Prometheus configuration."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"First comes scrape_configs, under which the data sources are configured. Here we define a job_name called redis_exporter_targets that is scraped every 5 seconds. Below it, consul_sd_configs holds the Consul service discovery settings. The most important part is relabel_configs, the relabeling feature, which rewrites some of the labels attached to the data."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/5e\/5e298ec0abfcbd176136c2b811afb760.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"With that configuration in place, you can see the metadata retrieved from Consul in the black box in the figure above. Here we use __meta_consul_tags with the regular expression ,(.*),(.*), to match values of the form ,{IP:PORT},{mastername},. The first group yields the final address, which becomes the target information for the redis_exporter_targets job. The second group, the Sentinel mastername we passed in, is written into a label, so that later, when building dashboards, instances with the same mastername can be grouped by that label."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/6c\/6c387b72eec415b62b721f53cc4651b6.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Once this is configured, you can check the targets in the Prometheus web UI. If a target's state is UP, the configuration is correct and the data shown in the figure is being collected by Prometheus. These are a few examples."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/90\/903492a89d29341574b186138190ebdd.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Finally, configure the data source in Grafana and load the dashboard template to get the graphs shown above. The mastername label distinguishes instances, so you can select the graphs for a particular one."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/1b\/1b366126300da7287f269f8a822e2734.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"conte
nt":[{"type":"text","text":"The figure above shows some of the key Redis metrics, such as the cache hit rate, the number of expired and evicted keys, and network traffic."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/62\/622351da6d7c4469916a823ed46054b4.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This approach works for Redis Cluster as well as Sentinel, in much the same way: you select by the Redis Cluster name. Note that this cluster name is simply our own label for the cluster, stored in our database metadata platform. It makes Redis Cluster much easier to manage, because all instances of a cluster are visible on one screen, instead of logging in to check each node's monitoring separately as we used to do with Zabbix."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The next example shows how Grafana combines monitoring graphs from Zabbix and Prometheus. We run both in production, and tracking down a problem sometimes meant logging in to two systems, which is slow, and switching between their graphs is not very intuitive. So we used Grafana to bring Zabbix and Prometheus together."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/f6\/f634e56bdc72424e25d603aeeb4f9ca1.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The figure above is the database dashboard we use during promotional campaigns. It is meant for developers and shows database performance from the application's point of view. Developers and operations staff look at things differently, so we organized this dashboard by application. Before a campaign starts, we usually identify the applications involved and list their Redis instances and databases, so that metrics such as QPS and connection counts for each application's databases are visible on one page. Developers can then quickly pin down which stage a problem is in. Feedback from developers has been positive: they locate problems much faster."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/73\/73572880096e21a57a7c215c95c37393.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To display Zabbix data in Grafana you need the Zabbix plugin. Run grafana-cli plugins list-remote to list all installable plugins, install the Zabbix plugin, and then enable it, since installation alone does not load it."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/51\/515602629a8c5c6953b7895adaba084a.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"With the plugin in place you can add data sources. The left side of the figure above shows how we add the Zabbix data source, where the main settings are the Zabbix URL and the API address; the right side shows the Prometheus data source, which only needs the Prometheus address. Zabbix can be connected in two ways: through its API or by connecting directly to its database. I recommend the API rather than a direct database connection."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/4d\/4d04d500b9523960480b1002af58ee16.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0
,"align":null,"origin":null},"content":[{"type":"text","text":"This is an example of combining Zabbix and Prometheus on the dashboard shown earlier. Take an application that uses both Redis and MySQL, with the Redis data in Zabbix and the MySQL data in Prometheus. We can put both on a single dashboard, so the monitoring data from Zabbix and Prometheus can be viewed side by side."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/4a\/4ac4b1acc949df4049333e2ee4257c50.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The two screenshots above show how Grafana query fields map to Zabbix monitoring items. The group we enter corresponds to a Zabbix host group, the application to a Zabbix application, the host to a Zabbix host, and the final monitoring target to a Zabbix item. Once configured, the graph looks almost identical to the one in Zabbix."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/93\/938b681f07d51f780aa409e9c502eb2c.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Adding a Prometheus graph is simpler: you draw it in Grafana from a query expression. The one thing to watch is variables: the Grafana variables must match the variables used in the dashboard template."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For this talk I picked the parts of my own practice that were most meaningful or most difficult, in the hope that they give you ideas you can apply elsewhere."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":
{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"About the speaker:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Liu Yu (劉宇) is an infrastructure technology architect at 甜橙金融 with extensive experience in database operations and development. Liu led and completed the cloud migration of more than a hundred MySQL and Redis deployments at 甜橙金融, as well as the overall architecture design and build-out for MySQL and Redis, and has extensive experience optimizing for large-scale events."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Reposted from: dbaplus社羣 (ID: dbaplus)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Original article: "},{"type":"link","attrs":{"href":"https:\/\/mp.weixin.qq.com\/s\/-tQ9y_OdlBkq1I5FGN4DiA","title":"xxx","type":null},"content":[{"type":"text","text":"Highly Available Multi-Instance Redis Monitoring with Prometheus in Practice"}]}]}]}