Netflix如何通过GraphQL Federation扩展API?

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"在"},{"type":"link","attrs":{"href":"https:\/\/netflixtechblog.com\/how-netflix-scales-its-api-with-graphql-federation-part-1-ae3557c187e2","title":null,"type":null},"content":[{"type":"text","text":"之前的文章"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"和"},{"type":"link","attrs":{"href":"https:\/\/www.infoq.com\/presentations\/netflix-api-graphql-federation\/","title":null,"type":null},"content":[{"type":"text","text":"QConPlus讨论"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"中,我们讨论了GraphQL Federation作为一种解决方案是如何分发我们的GraphQL模式及其实现的。 在这篇文章中,我们会将注意力转移到成功运行联邦GraphQL平台所需的内容上——从它实现的过程到我们汲取的经验教训等方面。"}]},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/fc\/73\/fc3508b87a37e6624c69fb6b7da3bc73.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"我们迄今为止的旅程"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"在过去的一年中,我们已经实现了联邦GraphQL架构所需的核心基础架构,正如我们前一篇文章所描述的那样:"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/90\/90608eb3c84349eeceb723d0a268b0b3.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Studio Edge架构"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"该平台上的第一个Graph域服务(Domain Graph Service,DGS)是我们在第一篇文章(Studio API)中讨论过的GraphQL单体应用。接下来,我们与其他几个应用程序团队合作一起制作了DGS,使其API能与之前的单体应用一起公开。到2019年底,我们有了第一个使用联邦Graph的Studio应用程序,性能没有任何下降。我们知道这个架构是可行的之后,就专注于为它更广泛的使用做准备了。我们的目标是在2020年4月开放Studio Edge自助服务平台。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"2020年4月是一个动荡的时期,疫情大流行,人们一夜之间就过渡到了远程办公。尽管如此,各小组还是开始了成群结队地投入到Graph中。很快,我们每天都有数百名工程师直接为API贡献力量。而那个曾经是瓶颈的Studio API单体应用呢?我们将Studio API公开的字段迁移到个人拥有的DGS中,而不破坏用户的API。原单体应用预计将在2020年底完全弃用。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"这段旅程并非没有挑战。最大的挑战在于需在整个组织内协调这一战略。最初,有很多人持怀疑态度并有异议;这个概念相当新,需要整个组织高度一致才能成功。我们的团队花了很多时间来解决不同意见,并根据开发人员的反馈对架构进行了调整。通过我们的原型开发以及与一些关键批评声音的积极合作,我们能够逐渐增加信心并弥合关键差距。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"一旦我们在这个想法上达成了广泛的一致,我们就需要确保其采用是无缝的。 这就需要构建鲁棒性强的核心基础设施,以确保良好的开发人员体验,并能解决关键的跨域问题。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"核心基础设施"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"我们的GraphQL网关(GraphQL Gateway)基于Apollo的参考实现,并且用"},{"type":"link","attrs":{"href":"https:\/\/kotlinlang.org\/","title":null,"type":null},"content":[{"type":"text","text":"Kotlin"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"编写。这使得我们能够访问Netflix的Java生态系统,同时也为我们提供了鲁棒性强的语言特性,比如用于高效并行抓取的"},{"type":"link","attrs":{"href":"https:\/\/kotlinlang.org\/docs\/reference\/coroutines-overview.html","title":null,"type":null},"content":[{"type":"text","text":"协程(coroutines)"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":",以及具有null安全的表述性类型系统(expressive type system)。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"模式注册(Schema Registry)是内部开发的,也是用的Kotlin。为了存储模式变更,我们使用了一个"},{"type":"link","attrs":{"href":"https:\/\/netflixtechblog.com\/scaling-event-sourcing-for-netflix-downloads-episode-2-ce1b54d46eec","title":null,"type":null},"content":[{"type":"text","text":"内部库"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":",该库是在"},{"type":"link","attrs":{"href":"https:\/\/cassandra.apache.org\/","title":null,"type":null},"content":[{"type":"text","text":"Cassandra"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"数据库之上实现的"},{"type":"link","attrs":{"href":"https:\/\/microservices.io\/patterns\/data\/event-sourcing.html","title":null,"type":null},"content":[{"type":"text","text":"事件源模式"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"。使用事件源允许我们实现新的开发者体验特性,比如Schema History视图。模式注册还集成了我们的CI\/CD系统,如"},{"type":"link","attrs":{"href":"https:\/\/spinnaker.io\/","title":null,"type":null},"content":[{"type":"text","text":"Spinnaker"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":",可以为DGS自动设置云网络。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"开发人员教育与经验"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"在之前的架构中,只有单体应用Studio API的团队需要学习GraphQL。在Studio Edge中,每个DGS团队都需要在GraphQL上积累专业知识。GraphQL有自己的学习曲线,对于像"},{"type":"link","attrs":{"href":"http:\/\/batching","title":null,"type":null},"content":[{"type":"text","text":"批处理(Batching)"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"和"},{"type":"link","attrs":{"href":"https:\/\/www.graphql-java.com\/blog\/deep-dive-data-fetcher-results\/","title":null,"type":null},"content":[{"type":"text","text":"先行断言(Lookahead)"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"这样的复杂情况,它会变得特别棘手。另外,正如前一篇文章所讨论的那样,理解GraphQL Federation和实现实体解析器(Entity Resolver)也不是一件容易的事。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"我们与Netflix的开发者体验(Netflix’s Developer Experience,DevEx)团队合作,为开发人员提供文档、培训材料和教程。对于一般的GraphQL问题,我们依赖于开源社区,并建立了一个内部GraphQL社区来讨论诸如分页、错误处理、可空性和命名约定之类的热门话题。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"DGS框架和开发者工具"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"为了方便后端工程师构建GraphQL DGS,DevEx团队在GraphQL Java和Spring Boot的基础之上构建了一个“DGS框架”(“DGS Framework”)。该框架解决了运行在生产环境中的GraphQL服务的所有跨域问题,同时也使开发人员能更容易地编写GraphQL解析器。此外,DevEx还构建了强大的工具,可用于将模式推送到模式注册Schema Registry中,并构建了一个自助服务UI(Self Service UI),可用于浏览各种DGS模式。点击查看"},{"type":"link","attrs":{"href":"https:\/\/plus.qconferences.com\/plus2020\/presentation\/using-devex-accelerate-graphql-federation-adoption-netflix","title":null,"type":null},"content":[{"type":"text","text":"他们的会议演讲"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":",期待我们的同事将来能发表一篇相关的博文。DGS框架计划于2021年初开源。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"模式治理"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Netflix的Studio数据极其丰富和复杂。在早期,我们预期活动的模式管理对模式演进和整体的健康是至关重要的。我们组织中已经有一位Studio数据架构师(Studio Data Architect)了,他专注于跨Studio的数据建模和对齐。我们与他们合作,一起确定了最适用于Studio Engineering需求的Graph模式的最佳实践。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"我们的目标是设计一个能够反映域本身而不是数据库模型的GraphQL模式。UI开发人员不必构建面向前端的后端(Backends For Frontends,BFF)来根据他们的需要处理数据,相反,他们应该协助一起来塑造模式,以满足他们的需求。拥抱协作的模式设计方法对于实现这一目标至关重要。"}]},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/bb\/5a\/bbde2400dec8d7d3c0620b11c869015a.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Schema设计工作流"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"协作设计过程涉及跨团队的反馈和审查。为了简化模式设计和审查,我们成立了一个模式工作组并开发了一个用于加入联邦架构的托管技术程序。虽然审查会增加产品开发过程的开销,但我们相信,优先考虑Graph模型的质量能减少未来的变更及其所需的反工量。审查的级别因受影响的实体而异;对于核心联邦类型,则需要更严格的审查(尽管工具有助于简化该流程)。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"我们有一个用于改进模式的deprecation工作流。我们利用了GraphQL的"},{"type":"link","attrs":{"href":"https:\/\/spec.graphql.org\/June2018\/#sec--deprecated","title":null,"type":null},"content":[{"type":"text","text":"deprecation"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"特性,并且还跟踪了模式中每个字段的使用统计信息。一旦统计数据表明已弃用字段不再使用,我们就可以进行向后不兼容的变更,将该字段从模式中删除。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/57\/20\/57ab79a3d755c8312c802cfedd3fae20.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"显示已弃用字段使用情况的客户端"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"我们采用了模式优先的方法,而不是从现有的模型中(例如gRPC API的Protobuf对象)生成模式。虽然"},{"type":"link","attrs":{"href":"https:\/\/developers.google.com\/protocol-buffers","title":null,"type":null},"content":[{"type":"text","text":"Protobu"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"f和"},{"type":"link","attrs":{"href":"https:\/\/grpc.io\/docs\/what-is-grpc\/introduction\/","title":null,"type":null},"content":[{"type":"text","text":"gRPC"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"是构建服务API的优秀解决方案,但我们更喜欢将GraphQL模式与这些层解耦,以实现更清晰的Graph设计和独立的可扩展性。在某些场景中,我们实现了从GraphQL解析器到gRPC调用的通用映射编码,但是额外的样板对GraphQL API的长期灵活性来说是值得的。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"我们使用的方法是基于“上下文高于控制”(“context over control”)的,这是"},{"type":"link","attrs":{"href":"https:\/\/jobs.netflix.com\/culture","title":null,"type":null},"content":[{"type":"text","text":"Netflix文化的一个重要原则"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"。我们没有试图严格控制整个Graph,而是为产品团队提供指导和上下文,这样他们就可以应用自己的领域知识为自己的领域创建灵活的API。随着该架构的成熟,我们将继续监控模式运行情况,并在需要的时候开发新的工具、流程和最佳实践。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"可观测性"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"在我们之前的架构中,可观测性是通过人工分析和API团队的路由来实现的,它的可伸缩性很差。对于我们的联邦架构,我们以一种更具可伸缩性的方式来优先解决可观测性的需求。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"我们优先考虑如下三个方面:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"报警——当发生错误时"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"(when)"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"报告"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"发现——轻松确定哪些"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"(what)"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"是不工作的"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"诊断——调试某些东西不工作的原因"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"(why)"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"我们在这一领域的指导度量指标是平均修复时间(MTTR)以及服务等级目标和服务等级指标(SLO\/SLI)。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"我们与Netflix遥测团队的专家合作。我们将网关和DGS架构组件与"},{"type":"link","attrs":{"href":"https:\/\/zipkin.io\/","title":null,"type":null},"content":[{"type":"text","text":"Zipkin"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"、内部分布式跟踪工具"},{"type":"link","attrs":{"href":"https:\/\/netflixtechblog.com\/edgar-solving-mysteries-faster-with-observability-e1a76302c71f","title":null,"type":null},"content":[{"type":"text","text":"Edgar"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"和应用程序监控工具"},{"type":"link","attrs":{"href":"https:\/\/netflixtechblog.com\/telltale-netflix-application-monitoring-simplified-5c08bfa780ba","title":null,"type":null},"content":[{"type":"text","text":"TellTal"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"e集成在一起。在GraphQL中,几乎每个响应都是200,错误块中包含了自定义的错误。我们从响应中检查这些自定义的错误代码,并将它们发送到我们的度量服务"},{"type":"link","attrs":{"href":"https:\/\/netflixtechblog.com\/introducing-atlas-netflixs-primary-telemetry-platform-bd31f4d8ed9a","title":null,"type":null},"content":[{"type":"text","text":"Atlas"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"中。这些集成为GraphQL API的使用者和开发人员奠定了丰富的可见性和洞察力的良好基础。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/83\/b9\/83a304e4150aceb8a411ec831dbe9cb9.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"监控联邦请求生命周期的Edgar Trace"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/d2\/d29ab922234b9834563a68a6d3397c06.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"联邦请求的时间轴视图"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"分布式日志关联(Distributed Log Correlation)有助于调试更复杂的服务问题。通过显示处理请求所涉及的所有系统的应用级日志详细信息,我们可以更深入地了解堆栈中发生的事情。开发人员可以很容易地看到与给定请求同时发生的事情,以排查可能影响交互的周边因素。"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/5c\/5c6aeec5c8eae500fb8d79930993c3dc.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"一个跨多个服务记录的联邦请求的日志"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"为了解决“我应该问谁("},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"who"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":")……”的路由问题,我们集成了从GraphQL类型和字段到它们所属团队的支持(support)渠道的深度链接。现在,寻找support只需单击跟踪中的链接,这有助于缩短MTTR并能减少网关团队所需的参与次数。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"保护联邦Graph的安全"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" 我们的目标是在整个联邦架构中实现可靠且一致的安全实践。为了实现这一点,我们与Netflix的安全专家合作,将安全性构建到Graph中。让我们来看下我们安全解决方案的两个基本部分:AuthN和AuthZ。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"身份认证"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"我们在Studio空间中的所有产品体验都需要一个经过认证的帐户,因此我们将GraphQL网关的访问权限限制为只允许受信任的经过身份验证的调用者。此外,"},{"type":"link","attrs":{"href":"https:\/\/graphql.org\/learn\/introspection\/","title":null,"type":null},"content":[{"type":"text","text":"Graph Introspection"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"仅限于Netflix的内部开发人员。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"授权"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"在Studio Edge之前,授权逻辑在各个团队之间是分散的。一些团队在他们的BFF中实现了授权,一些在微服务中实现了授权,而另一些团队则在这两处兼而有之。结果往往是,由于用户访问的UI不同,对于给定的数据,授权也不同。UI团队还发现他们需要对每个新的前端实现(以及重新实现)进行授权检查。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"在Studio Edge中,我们将授权责任委托给DGS的所有者。这就实现了在不同应用程序中对同一用户进行一致的授权。此外,产品经理、工程师和安全团队可以很容易地了解谁可以访问哪种数据类型以及如何访问。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"我们在Netflix内部提供了多种授权服务:从基于用户身份授予访问权限的简单系统,到引入角色和功能概念的更细粒度的系统。DGS开发人员可以根据自己的需要选择解决方案。然后,他们只需使用@Secured注解对解析器进行注解,并将其配置为使用一个可用的系统即可。如果需要,可以在解析器或下游系统中实现更复杂的授权。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"授权的未来"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"我们目前正在构建一个支持GraphQL授权解决方案的原型。注册模式时,Schema Registry会自动为每个字段及其相应类型生成访问控制组(Access Control Groups,ACGs)。产品经理和DGS工程师可以决策这些生成的ACG的成员资格和规则。由于ACG映射到GraphQL中的某个字段,因此DGS框架会在执行期间自动应用与ACG关联的规则。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"面向失败的架构设计"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"GraphQL网关是所有请求的单一入口点;网关上的故障可能会导致严重的中断。根据Netflix工程的最佳实践,我们假设会发生故障,并设计方法来减轻这些故障的影响。以下是我们用来确保网关层具有弹性的设计原则:"}]},{"type":"numberedlist","attrs":{"start":null,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"单一用途"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"无状态服务"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"需求控制"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":4,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"多区域"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":5,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"按功能分片"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"首先,我们将网关层的职责集中在一个用途上:解析客户端查询,然后构建和执行查询计划。通过缩小范围,我们限制了可能发生的问题的范围。我们的目标是即时执行日志记录和度量指标之外的任何其他资源密集型操作。在网关层中添加其他不相关的逻辑可能会增加在这一关键层中出现故障的表面积。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"其次,我们运行网关服务的多个无状态实例。任何网关实例都能够为任何请求生成和执行查询计划。当我们对网关层进行代码变更时,我们会在正式投入生产之前对它们进行严格的测试。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"第三,我们通过应用需求控制来平衡每个请求消耗的资源。我们限制调用方的速率,以避免底层数据库过载,这些数据库是大多数领域元素的来源。我们还对所有传入的查询运行静态查询开销计算,并拒绝昂贵的查询,以避免网关和DGS资源的阻塞。我们的合作伙伴了解这些折衷方案,并与我们一起来满足这些要求,重新处理昂贵的查询并减少大的调用。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"第四,我们在世界各地的多个AWS区域中部署了网关层。这使我们能够限制爆炸半径以应对发生不可避免的问题。当问题发生时,我们可以将故障转移到另一个区域,以确保我们的客户受到的影响最小。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"最后,我们部署多个网关层的功能分片(Shard),每个分片中的代码均相同,传入的请求根据类别进行路由。例如,GraphQL订阅(Subscription)通常会导致长时间的连接,而查询(Query)和突变(Mutation)则是短暂的。我们使用一个单独的实例组来处理订阅,因此“连接耗尽”也不会影响查询和突变的可用性。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"我们还可以做更多的事情来提高弹性。我们计划对网关部署以及最终的模"},{"type":"link","attrs":{"href":"https:\/\/netflixtechblog.com\/automated-canary-analysis-at-netflix-with-kayenta-3260bc7acc69","title":null,"type":null},"content":[{"type":"text","text":"式变更进行金丝雀(Canary)部署和分析"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"。今天,我们的网关通过轮询模式注册来动态更新其模式。我们正在通过将联邦配置存储在一个版本化的S3存储桶中来实现这些功能的解耦,从而使网关能够抵御模式注册的故障。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"结束语"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"GraphQL和Federation已成为Studio应用程序的效率倍增器。基于此,我们最近使用GraphQL Federation为iOS和Android上的Netflix消费者应用程序搜索页面创建了原型。为了做到这一点,我们创建了三个DGS来为消费者Graph的最小部分提供数据。我们正在将一小部分用户切流到这个替代技术栈中,并测量其高层度量指标。我们很高兴看到结果,并进一步探讨其在Netflix消费者领域的适用空间。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"尽管我们拥有积极的经验,但GraphQL Federation尚处于成熟期的早期,可能对每个团队和组织来说并不都是最合适的。学习GraphQL和DGS开发、运行联邦层并进行迁移需要合作团队的高度投入以及无缝的跨功能协作。如果你正在考虑朝这个方向发展,我们建议你查看Apollo为"},{"type":"link","attrs":{"href":"https:\/\/www.apollographql.com\/docs\/federation\/","title":null,"type":null},"content":[{"type":"text","text":"Federation提供的SaaS产品"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":",以及许多学习GraphQL相关的在线资源。对于像我们这样拥有大量的微服务并需要其聚合在一起的生态系统来说,开发速度和可操作性的提高使得这种转变是值得的。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"最后,我们想听听你的意见!如果你已经实现了Federation,或者尝试用另一种方法来解决这个问题,我们很乐意向你了解更多信息。分享知识是我们这个行业快速学习和提高的方式之一。最后,如果你想参与解决诸如Netflix弹性相关的复杂而有趣的问题,请查看我们的"},{"type":"link","attrs":{"href":"https:\/\/jobs.netflix.com\/","title":null,"type":null},"content":[{"type":"text","text":"招聘页面"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"或直接联系我们。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#222222","name":"user"}},{"type":"strong"}],"text":"英文原文链接:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https:\/\/netflixtechblog.com\/how-netflix-scales-its-api-with-graphql-federation-part-2-bbe71aaec44a","title":null,"type":null},"content":[{"type":"text","text":"https:\/\/netflixtechblog.com\/how-netflix-scales-its-api-with-graphql-federation-part-2-bbe71aaec44a"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]}]}]}
發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章