云原生在京东丨云原生时代下的监控:如何基于云原生进行指标采集?

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/af/af9f6637b50b09be60b00a42f3812d5e.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"云妹导读:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"从 IDC 到云,从弹性计算到容器技术,整个软件运行的环境发生了天翻地覆的变化,监控对象以及指标也发生了微妙的变化。从原本的主机为主体,变为了容器和服务为主体。而人们对于监控的要求,也逐渐从“看到指标 ”向被监控对象的“可观测性”发生转变。这一转变在以 Kubernetes 为代表的容器管理领域尤为明显。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"从 Kubernetes 成为容器管理领域的事实标准开始,基于云原生也就是基于 Kubernetes 原生。在云的体系下,基础硬件基本上都被抽象化、模糊化,硬故障需要人为干预的频次在逐渐降低,健康检查、失败自愈、负载均衡等功能的提供,也使得简单的、毁灭性的故障变少。而随着服务的拆分和模块的堆叠,不可描述的、模糊的、莫名其妙的故障却比以前更加的频繁。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"“看到指标”只是对于数据简单的呈现,在目前云的环境下,并不能高效地帮助我们找到问题。而“可观测性”体现的是对数据的再加工,旨在挖掘出数据背后隐藏的信息,不仅仅停留在展现数据层面,更是经过对数据的解析和再组织,体现出数据的上下文信息。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"为了达成“可观测性”的目标,就需要更加"},{"type":"text","marks":[{"type":"italic"},{"type":"strong"}],"text":"标准化、简洁化的指标数据,以及更便捷的收集方式,更强更丰富的语义表达能力,更快更高效的存储能力。"},{"type":"text","text":"本篇文章将主要探讨时序指标的采集结构和采集方式,数据也是指时序数据,存储结构以及 tracing、log、event 等监控形式不在本次讨论范围之内。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/df/df4ab1867a6762b3bde6736a8bee6bc7.webp","alt":null,"title":"","style":[{"key":"width","value":"50%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"提到时序数据,让我们先看看几个目前监控系统比较常用的时序数据库:"},{"type":"text","marks":[{"type":"italic"},{"type":"strong"}],"text":"opentsdb,influxdb,prometheus "},{"type":"text","text":"等。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"经典的时序数据基本结构大家都是有统一认知的:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"numberedlist","attrs":{"start":null,"normalizeStart":1},"content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"唯一序列名标识,即指标名称;"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"指标的标签集,详细描述指标的维度;"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","text":"时间戳与数值对,详细描述指标在某个时间点的值。 "}]}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"时序数据基本结构为指标名称 + 多个 kv 对的标签集 + 时间戳 + 值,但是在细节上各家又各有不同。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/09/0965fa5182d869fd10199f45620c49cc.webp","alt":null,"title":"","style":[{"key":"width","value":"100%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/7b/7bd45d70a5f9f6dadac0cf0fa957050a.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/f3/f39292cc71b30f925e8f190c8fba9252.webp","alt":null,"title":"","style":[{"key":"width","value":"50%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"opentsdb 使用大家耳熟能详的 json 格式,可能是用户第一反应中结构化的时序数据结构。只要了解基本时序数据结构的人一眼就能知道各个字段的含义。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/95/95f2f16ef91c11ef73c98965a33b8407.webp","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"1[,=[,= ]] =[,=] []\n2例如:\n3cpu_load_short,host=server01,region=us-west value=0.64 1434055562000000000"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/56/560a7cf68bc8e22cfacfa9ab38c6b60f.webp","alt":null,"title":"","style":[{"key":"width","value":"50%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/e1/e1c4b7945fcbeeed787078d5b53d2f58.webp","alt":null,"title":"","style":[{"key":"width","value":"100%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"1metric_name [\n2\"{\" label_name \"=\" `\"` label_value `\"` { \",\" label_name \"=\" `\"` label_value `\"` } [ \",\" ] \"}\"\n3] value [ timestamp ]\n4例如:\n5http_requests_total{method=\"post\",code=\"200\"} 1027 1395066363000"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/3e/3e15ca0a07fb2bad88b40406f5199a0c.webp","alt":null,"title":"","style":[{"key":"width","value":"50%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"influxdb 和 prometheus 都使用了自定义文本格式的时序数据描述,通过固定的语法格式将 json 的树状层级结构打平,并且没有语义的丢失,行级的表述形式更便于阅读。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/0c/0c0bc855985610704060b9ab62f53720.webp","alt":null,"title":"","style":[{"key":"width","value":"100%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"文本格式优势"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"○ 更符合人类阅读习惯"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"○ 行级的表述结构对文件读取的内存优化更友好"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"○ 减少了层级的嵌套"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"文本格式劣势"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"○ 解析成本更高"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"○ 校验相对更麻烦"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/a2/a2204ee98ef2cf112c47e68632fe7fbb.webp","alt":null,"title":"","style":[{"key":"width","value":"100%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"使用过 Prometheus 的同学可能会注意到其实 Prometheus 的采集结构不是单行的,每类指标往往还伴随着几行注释内容 ,其中主要是两类HELP 和 TYPE ,分别表示指标的简介说明和类型。格式大概是:"}]},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"1# Anything you want to say\n2# HELP http_requests_total The total number of HTTP requests.\n3# TYPE http_requests_total counter\n4http_requests_total{method=\"post\",code=\"200\"} 1027 1395066363000\n5http_requests_total{method=\"post\",code=\"400\"} 3 1395066363000"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Prometheus 主要支持4类指标类型:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"italic"},{"type":"strong"}],"text":"Counter:"},{"type":"text","text":"只增不减的计数器。"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"italic"},{"type":"strong"}],"text":"Gauge:"},{"type":"text","text":"可增可减的数值。"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"italic"},{"type":"strong"}],"text":"Histogram:"},{"type":"text","text":"直方图,分桶采样。"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"italic"},{"type":"strong"}],"text":"Summary:"},{"type":"text","text":"数据汇总,分位数采样。"}]}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"其中 Counter 和 Gauge 很好理解,Histogram 和 Summary 可能一时间会让人迷惑。其实 Histogram 和 Summary 都是为了从不同维度解决和过滤长尾问题。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"例如,我和首富的平均身价并不能真实反映出我自己的身价。因此分桶或者分位数才能更准确的描述数据真实的分布状态。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"而 Histogram 和 Summary 主要区别就在于对分位数的计算上,Histogram  在客户端只进行分桶计算,因此可以在服务端进行整体的分位数计算。Summary 则是在客户端环境下计算了分位数,因此失去了在整体视图上进行分位数计算的可能性。官方也给出了 Histogram 和 Summary 的区别:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/bf/bfbfa5df1a31fa654d90ff1caa08dc03.webp","alt":null,"title":"","style":[{"key":"width","value":"50%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"需要说明的是,截止到目前为止的 Prometheus 版本 2.20.1,这些 metric types 仅仅使用在客户端库(client libraries)和传输协议(wire protocol)中,服务端暂时没有记录这些信息。所以如果你对一个 Gauge 类型的指标使用 histogram_quantile(0.9,xxx) 也是可以的,只不过因为没有 xxx_bucket 的存在,计算不出来值而已。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/70/70a82030ce14883774372f86591936b8.webp","alt":null,"title":"","style":[{"key":"width","value":"50%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"时序监控数据的采集,从监控端来看,数据获取的形式只有两种,pull 和 push,不同的采集方式也决定了部署方式的不同。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"还是通过 opentsdb,prometheus 来举例,因为 influxdb 集群版本方案为商业版,暂不做讨论。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/0e/0ea1743135ff9a6fe3990bd12bdb7fab.webp","alt":null,"title":"","style":[{"key":"width","value":"100%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/3c/3c2bedfe2f8ffe7216973ed56dc0cbd2.webp","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"上图为 opentsdb 架构图 ,其中:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"italic"},{"type":"strong"}],"text":"Servers:"},{"type":"text","text":"表示被采集数据的服务,C则是表示采集指标的工具,可以理解为 opensdb 的 agent,servers 通过C将数据推送到下游的 TSD。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"italic"},{"type":"strong"}],"text":"TSD:"},{"type":"text","text":"对应实际进程名 TSDMain 是 opentsdb 组件,理解为接收层,每个TSD都是独立的,没有 master 和 slave 的区分,也没有共享状态。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"italic"},{"type":"strong"}],"text":"HBase:"},{"type":"text","text":"opentsdb实际的最终数据存储在 hbase 中。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"从架构图可以看出,如果推送形式的数据量大量增长的情况下,可以通过多级组件,或者扩容无状态的接收层等方式较为简单的进行吞吐量的升级。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/9d/9d02bc2d9963d198fe538b8099df9249.webp","alt":null,"title":"","style":[{"key":"width","value":"100%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/bd/bd2c9477f3f69e004cbf93e6a48a89e1.webp","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"上图为 prometheus 架构图,主要看下面几个部分:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"blockquote","content":[{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"italic"},{"type":"strong"}],"text":"Prometheus Server:"},{"type":"text","text":"用于抓取和存储时间序列化数据。"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"italic"},{"type":"strong"}],"text":"Exporters:"},{"type":"text","text":"主动拉取数据的插件。"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"italic"},{"type":"strong"}],"text":"Pushgateway:"},{"type":"text","text":"被动拉取数据的插件。"}]}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"拉取的方式,通常是监控端定时从配置的各个被监控端的 Exporter 处拉取指标。这样的设计方式可以降低监控端与被监控端的耦合程度,被监控端不需要知道监控端的存在,这样将指标发送的压力从被监控端转义到监控端。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"对比一下 pull 和 push 方式各自的优劣势:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"pull 的优势"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"pull 的劣势"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"简单对比了 pull 和 push 各自的特点,在云原生环境中,prometheus 是目前的时序监控标准,为什么会选择pull的形式,这里有官方的解释(https://prometheus.io/docs/introduction/faq/#why-do-you-pull-rather-than-push)。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"上面简单介绍了一下从监控端视角看待数据采集方式的 pull 和 push 形式,而从被监控端来看,数据获取的方式就多种多样了,通常可以分为以下几种类型:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"blockquote","content":[{"type":"numberedlist","attrs":{"start":null,"normalizeStart":1},"content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"默认采集"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"探测采集"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","text":"组件采集"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":4,"align":null,"origin":null},"content":[{"type":"text","text":"埋点采集"}]}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"下面一一举例说明。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/12/12260d25ebbc3c427863045a383281c9.webp","alt":null,"title":"","style":[{"key":"width","value":"100%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"默认采集通常是通俗意义上的所有人都会需要观察的基础指标,往往与业务没有强关联,例如 cpu、memory、disk、net 等等硬件或者系统指标。通常监控系统都会有特定的 agent 来固定采集这些指标,而在云原生中非常方便的使用 "},{"type":"text","marks":[{"type":"italic"},{"type":"strong"}],"text":"node_exporter、CAdvisor、process-exporter"},{"type":"text","text":",分别进行节点机器、容器以及进程的基础监控。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/7c/7c546fde54072688f14dbb4709b6ff29.webp","alt":null,"title":"","style":[{"key":"width","value":"100%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"探测采集主要是指"},{"type":"text","marks":[{"type":"italic"},{"type":"strong"}],"text":"从外部采集数据的方式。"},{"type":"text","text":"例如域名监控、站点监控、端口监控等都属于这一类。采集的方式对系统没有侵入,因为对网络的依赖比较强,所以通常会部署多个探测点,减少因为网络问题造成的误报,但是需要特别小心的是,一定要评估探测采集的频次,否则很容易对被探测方造成请求压力。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/49/499499e97459bac8333140872065acbe.webp","alt":null,"title":"","style":[{"key":"width","value":"100%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"通常是指"},{"type":"text","marks":[{"type":"italic"},{"type":"strong"}],"text":"已经有现成的采集方案,只需要简单的操作或者配置就可以进行详细的指标采集,"},{"type":"text","text":"例如 mysql 的监控,redis 的监控等。在云原生环境中,这种采集方式比较常见,得益于 prometheus 的发展壮大,常见的组件采集 exporter 层出不穷,prometheus 官方认证的各种 exporter。对于以下比较特殊或者定制化的需求,也完全可以按照 /metrics 接口标准自己完成自定义 exporter 的编写。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/0c/0c238e9d98a5bcf590aaae129eb4dce5.webp","alt":null,"title":"","style":[{"key":"width","value":"100%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"对于一个系统的关键性指标,本身的研发同学是最有发言权的,通过埋点的方式可以"},{"type":"text","marks":[{"type":"italic"},{"type":"strong"}],"text":"精准的获取相关指标。"},{"type":"text","text":"在 prometheus 体系中可以非常方便的使用 github.com/prometheus/client_* 的工具包来实现埋点采集。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/6e/6e7b6b2b03e07ecb77943afec1b63afd.webp","alt":null,"title":"","style":[{"key":"width","value":"50%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"本文对监控系统的第一个阶段“采集”,从"},{"type":"text","marks":[{"type":"italic"},{"type":"strong"}],"text":"“采集结构”"},{"type":"text","text":"和"},{"type":"text","marks":[{"type":"italic"},{"type":"strong"}],"text":"“采集方式”"},{"type":"text","text":"两方面做了简单的介绍和梳理。相比于以往,在云原生的环境中,服务颗粒度拆分的更细致,迭代效率更高,从开发到上线形成了更快节奏的反馈循环,这也要求监控系统能够更快速的反映出系统的异常,“采集结构”和“采集方式”虽然不是监控系统最核心的部分,但是简洁的采集结构和便捷的采集方式也为后续实现“可观测性”提供了基础。目前在云原生环境中,使用 prometheus 可以非常方便快捷的实现监控,虽然仍有许多工作需要做,例如集群化、持久化存储等,但是随着 Thanos 等方案的出现,prometheus 也在渐渐丰满中。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"欢迎点击【"},{"type":"link","attrs":{"href":"https://developer.jdcloud.com/technical/cloud-native?utm_source=PMM_infoQ&utm_medium=Readmoreutm_campaign=ReadMoreutm_term=NA","title":""},"content":[{"type":"text","text":"京东智联云"}]},{"type":"text","text":"】了解开发者社区"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":""},{"type":"text","marks":[{"type":"strong"}],"text":"更多精彩技术实践与独家干货解析"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":""},{"type":"text","marks":[{"type":"strong"}],"text":"欢迎关注【京东智联云开发者】公众号"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":""}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/0e/0e9d235592cebe258ab4197e98206044.jpeg?x-oss-process=image/resize,p_80/auto-orient,1","alt":null,"title":"","style":[{"key":"width","value":"50%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":""}]}]}
發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章