AWS Publishes Best Practices Guide for Operational Dashboards

{"type":"doc","content":[
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Recently, "},{"type":"link","attrs":{"href":"https:\/\/www.infoq.com\/aws\/","title":null,"type":null},"content":[{"type":"text","text":"AWS"}],"marks":[{"type":"underline"}]},{"type":"text","text":" added to the "},{"type":"link","attrs":{"href":"https:\/\/aws.amazon.com\/builders-library\/","title":null,"type":null},"content":[{"type":"text","text":"Amazon"}],"marks":[{"type":"underline"}]},{"type":"link","attrs":{"href":"https:\/\/aws.amazon.com\/builders-library\/","title":null,"type":null},"content":[{"type":"text","text":" Builders' Library"}],"marks":[{"type":"underline"}]},{"type":"text","text":" a "},{"type":"link","attrs":{"href":"https:\/\/aws.amazon.com\/builders-library\/building-dashboards-for-operational-visibility\/","title":null,"type":null},"content":[{"type":"text","text":"best practices guide"}],"marks":[{"type":"underline"}]},{"type":"text","text":" for building dashboards, which are used to provide operational visibility. The guide describes in detail the types of dashboards in use at Amazon and discusses best design practices for creating them."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"AWS principal engineer "},{"type":"link","attrs":{"href":"https:\/\/www.linkedin.com\/in\/joshea","title":null,"type":null},"content":[{"type":"text","text":"John O'Shea"}],"marks":[{"type":"underline"}]},{"type":"text","text":" authored this new addition to the Builders' Library. O'Shea notes that at AWS, dashboards are the mechanism for staying informed about the state of services, providing people with a view into how the system is running. However, O'Shea also makes clear that “we have found that any operational process that requires manual review of dashboards will fail because of human error, no matter how frequently the dashboards are reviewed.” To address this, they focus on creating automated alarms that evaluate the most important data the system emits. In some cases, alarms trigger automated remediation workflows."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Amazon also uses dashboards during on-call events, where operators use them to locate and isolate problems. A major use case O'Shea describes is the weekly operations review meeting, whose attendees include senior leaders, senior managers, and senior engineers. In the meeting, a tool known as the "},{"type":"link","attrs":{"href":"https:\/\/aws.amazon.com\/blogs\/opensource\/the-wheel\/","title":null,"type":null},"content":[{"type":"text","text":"wheel of fortune"}],"marks":[{"type":"underline"}]},{"type":"text","text":" randomly selects a team's dashboard, which then anchors a discussion of customer experience and SLOs."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To design consistent and useful dashboards, Amazon created a set of common design principles to follow, along with ways of measuring their effectiveness in order to refine and promote them. One such measure is whether new operators can quickly understand and use a dashboard. This metric-driven approach aligns with the techniques and strategies "},{"type":"link","attrs":{"href":"https:\/\/www.linkedin.com\/in\/camille-fournier-9011812\/","title":null,"type":null},"content":[{"type":"text","text":"Camille Fournier"}],"marks":[{"type":"underline"}]},{"type":"text","text":" shared in a recent "},{"type":"link","attrs":{"href":"https:\/\/www.infoq.com\/news\/2020\/08\/fournier-internal-platform","title":null,"type":null},"content":[{"type":"text","text":"interview with InfoQ"}],"marks":[{"type":"underline"}]},{"type":"text","text":", in which she described how internal platform teams can deliver more effective products."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"One principle is to work backward from what end users expect, ensuring that the dashboard meets their needs. O'Shea notes that “it is very easy for a dashboard's creator to build one that makes complete sense to them, but such a dashboard may offer no value to its end users.” They found that users tend to treat the graphs rendered first as the most important, so the general design approach is to place the most important graphs at the top of the dashboard. For web services, the most important are usually aggregate or summary availability graphs and end-to-end latency percentile graphs."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Other design principles include:"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Use a consistent time zone and display it on the dashboard."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Lay out graphs to suit the smallest expected display resolution."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Provide the ability to adjust the metric period and the time interval."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Annotate graphs with alarm thresholds and target values."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Use alarm status, simple number, and time series graph widgets where appropriate."}]}]}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"O'Shea also describes the types of dashboards in use at Amazon, the most important and widely used of which is the customer experience dashboard. It is designed to serve a wide range of stakeholders, from service operators to management. The dashboard presents the overall health of the service along with metrics tracking progress toward goals. The data shown helps answer questions such as “How many customers are impacted?” and “Which customers are most impacted?”"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/72\/dc\/7237fd7b07cd106a3c54272eb59927dc.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"How the different types of dashboards provide views into different levels of the system (image source: "},{"type":"link","attrs":{"href":"https:\/\/aws.amazon.com\/builders-library\/building-dashboards-for-operational-visibility\/","title":null,"type":null},"content":[{"type":"text","text":"Amazon website"}],"marks":[{"type":"underline"}]},{"type":"text","text":")"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Dashboards should also be created at the system and service levels, providing multiple views of how systems and services are running and supporting audits of services across regions. A system-level dashboard should contain enough information to show how any endpoint of the system is behaving, while service-level dashboards should drill down to individual service instances, giving a view for pinpointing deeper problems."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The guide closes with a discussion of dashboard maintenance. O'Shea writes:"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"italic"}],"text":"Maintaining and updating dashboards is built into our development process. During code review, before a change is completed, our developers ask, “Are there any dashboards we need to update?” Developers are therefore empowered to update dashboards before the change is deployed."}]}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The guide aims to make dashboard creation and maintenance part of the culture. As Tyler Treat shared in a recent interview with InfoQ, “Culture is where much of this work starts. We have to promote a culture of observability. If teams do not treat instrumentation as a first-class concern of their systems, there is little point in building other tooling.”"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The guide also encourages teams to discuss in post-mortems whether improvements to dashboards and automated alarms could have prevented the problem or surfaced it sooner. Changes to dashboards should use the same tools as service deployments, with version control and infrastructure as code (IaC) as core practices."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https:\/\/aws.amazon.com\/builders-library\/building-dashboards-for-operational-visibility\/","title":null,"type":null},"content":[{"type":"text","text":"The full best practices guide"}],"marks":[{"type":"underline"}]},{"type":"text","text":" is available in the "},{"type":"link","attrs":{"href":"https:\/\/aws.amazon.com\/builders-library\/","title":null,"type":null},"content":[{"type":"text","text":"Amazon"}],"marks":[{"type":"underline"}]},{"type":"link","attrs":{"href":"https:\/\/aws.amazon.com\/builders-library\/","title":null,"type":null},"content":[{"type":"text","text":" Builders' Library"}],"marks":[{"type":"underline"}]},{"type":"text","text":". The library contains a collection of articles that describe and discuss how Amazon builds, maintains, and operates software."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Original article: "},{"type":"link","attrs":{"href":"https:\/\/www.infoq.com\/news\/2020\/10\/aws-dashboards\/","title":null,"type":null},"content":[{"type":"text","text":"AWS Publishes Best Practices Guide for Operational Dashboards"}],"marks":[{"type":"underline"}]}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}}
]}