一个空格引发的“救火之旅” - 记一次 SOFA RPC 的排查过程

{"type":"doc","content":[{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"背景"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"italic"}],"text":"说明:即使你对 "},{"type":"link","attrs":{"href":"https://help.aliyun.com/product/144917.html","title":null},"content":[{"type":"text","marks":[{"type":"italic"}],"text":"SOFA RPC"}]},{"type":"text","marks":[{"type":"italic"}],"text":" 的技术不熟悉,也能从这篇文章中体会到排查问题的实用技巧,希望对大家有所启发。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"最近某银行在测试环境发布了一套 SOFA RPC 应用,包括 SOFA RPC Service 和 SOFA RPC Reference。但是他们业务在调用 SOFA RPC 服务时出错了。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"这个问题很诡异。诡异到一线和客户查了一天没有查出来。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"由于客户的项目时间紧张,所以这个问题升级到我这里处理。救火之旅就此开始。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"SOFA RPC 原理"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在进入正题之前,我们简单的介绍一下 SOFA RPC 的原理。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/e2/e22d386122d9474d6a08676688aee27b.png","alt":"image.png","title":"image.png","style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"当 SOFA RPC Service 的应用启动的时候,他们是通过 ACVIP (公有云里叫 AntVIP ) 获取到 SOFA Registry (注册中心) 的地址的。ACVIP 的地址需要填入 RPC Service 和 RPC Reference 的代码配置文件里。RPC Service 会将该应用的 RPC 服务注册到 SOFA Registry 上。"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"当引用这个服务的 SOFA RPC Reference 应用启动时,会从 SOFA Registry 订阅到相应服务的元数据信息 ( SOFA RPC Service 的 IP,端口等信息)。SOFA Registry 收到订阅请求后,会将发布方的元数据实时推送给 SOFA RPC Reference。"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"当 SOFA RPC Reference 拿到元数据后,就可以从中获得服务地址,并发起调用。如图中 Reference 指向 Service。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"问题现象"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"让我们回顾一下问题的现象。SOFA RPC Reference 应用在尝试调用 SOFA RPC Service 的时候,报 “Cannot get service address of service XXXXXXXXX” 的错误。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/d2/d2549e945de1f4d85b08eea42771d37b.png","alt":"image.png","title":"image.png","style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"同时,客户还提到,同一套代码在测试环境中调用失败,但是在开发环境中调用成功。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"代码中,有 application-test.properties 和 application-dev.properties 配置文件,分别对应着测试环境和开发环境。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"italic"}],"text":"注意:SOFA 的应用支持按照环境自动加载对应的配置文件(测试环境加载-test配置文件,开发环境加载-dev配置文件),对于该功能的详情,请点击"},{"type":"link","attrs":{"href":"https://help.aliyun.com/document_detail/133155.html","title":null},"content":[{"type":"text","marks":[{"type":"italic"}],"text":"这里"}]},{"type":"text","marks":[{"type":"italic"}],"text":"。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/eb/ebd3479a9ceb85fe46e0365a27cb10db.png","alt":"image.png","title":"image.png","style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"排查过程"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"招式一、根据产品原理缩小问题范围"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在排查产品问题时,最有效的方式是从产品的原理的角度将问题范围缩小。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"根据错误消息,我们知道 RPC Reference 没有获得服务的地址。那么没有获得服务的地址可能的原因有哪些?从上面的原理图,我们可以看到可能的原因有以下几种:"}]},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"RPC Reference 没有连接上 SOFA Registry"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"RPC Reference 没有从 ACVIP 处获得 SOFA Registry 的地址"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"RPC Reference 跟 SOFA Registry 之间的 9600 端口异常"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"SOFA Registry 异常"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"RPC Service 服务没有发布成功"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"RPC Service 跟 SOFA Registry 之间的 9600 端口异常"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"RPC Service 没有从 ACVIP 处获得 SOFA Registry 的地址"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"RPC Service 应用启动异常"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"根据原理,继续缩小范围的结果如下。"}]},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"RPC Reference 没有连接上 SOFA Registry。排查结果 - 排除❎。"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"RPC Reference 没有从 ACVIP 处获得 SOFA Registry 的地址。排查结果 - 排除❎。"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"使用 \""},{"type":"text","marks":[{"type":"italic"}],"text":"netstat -an | grep 9600"},{"type":"text","text":"\" 发现该 RPC Reference 与 SOFA Registry 之间有 9600 端口的长连接,所以排除该问题。"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"RPC Reference 跟 SOFA Registry 之间的 9600 端口异常。排查结果 - 排除❎。"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"根据 netstat 的输出,发现长连接状态为 "},{"type":"text","marks":[{"type":"italic"}],"text":"Established"},{"type":"text","text":",所以排除该问题。"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"SOFA Registry 异常。排查结果 - 排除❎。"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"测试环境和开发环境是共用一套 SOFA Registry 的,但是开发环境可以正常工作,所以 SOFA Registry 没有问题。"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"RPC Service 服务没有发布成功。"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"RPC Service 跟 SOFA Registry 之间的 9600 端口异常。排查结果 - 异常✅"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"使用 \""},{"type":"text","marks":[{"type":"italic"}],"text":"netstat -an | grep 9600"},{"type":"text","text":"\" 发现该 RPC Service 与 SOFA Registry 之间没有 9600 端口的长连接!"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"RPC Service 没有从 ACVIP 处获得 SOFA Registry 的地址。排查结果 - 异常✅"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#404040","name":"user"}}],"text":"经过排查,ACVIP 的日志文件 (/home/admin/logs/acvip-java-client)没有生成。这说明 ACVIP 框架没有加载。"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"RPC Service 应用启动异常。排查结果 - 排除❎。"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"检查应用启动日志,"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#404040","name":"user"}}],"text":"/home/admin/logs/"},{"type":"text","text":"stderr.log 为空。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"到了这一步,已经定位到 RPC Service 应用的 ACVIP 框架没有被加载,所以导致后续流程无法继续。这时候,我们怀疑是应用代码的问题。可是从 "},{"type":"text","marks":[{"type":"color","attrs":{"color":"#404040","name":"user"}}],"text":"/home/admin/logs/"},{"type":"text","text":"stderr.log 和 "},{"type":"text","marks":[{"type":"color","attrs":{"color":"#404040","name":"user"}}],"text":"/home/admin/logs/"},{"type":"text","text":"stdout.log 中,我们没有发现异常,应用也能够正常启动,只是不能加载 ACVIP 框架。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"招式二、引入第三方缩小问题范围"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"承接上文,排查到这里就卡住了。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"客户觉得是 SOFA RPC 框架的问题,但是我们排查下来猜测是客户代码配置的问题。可是客户再次强调开发环境和测试环境用的是同一套代码,开发环境是好的。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"这时候,为了让客户和我们站在同一个排查方向上(对于这种开发问题,要是客户和我们不在一个排查方向上,会严重影响问题排查效率),相信是代码的问题,需要引入第三方从侧面证明是代码问题。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我们将自己写的一个最简单的 SOFA RPC 的 Demo 发给客户,然后使用测试环境的配置运行。该 Demo 能够正常运行。这说明肯定还是客户代码的问题。这时候,客户和我们站在统一战线。客户也在帮忙检查代码是否还有为检查的遗漏点。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"可惜,客户还是没能看出代码的问题。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"招式三、对比大法找突破口"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"承接上文,排查过程再次卡住。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"于是,我们再重头梳理一遍问题,我们知道开发环境和测试环境用的是同一套代码,只是配置文件不一样。那么我们在同一台 RPC provider 的机器上,部署开发环境的配置是否可以运行成功?这时,我们把 application-dev.properties 文件删除,只保留 application-test.properties,同时只更改以下三个配置属性。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/d8/d8136855d024a4bfd46ace0bcfc1d5b1.png","alt":"image.png","title":"image.png","style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"更改了配置之后,配置文件的内容是开发环境的对应的值,运行成功。重新修改回测试环境的配置,运行失败。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"由于是在同一台机器上运行,可以排除掉很多异构因素。所以可以开始进行日志对比大法。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我们将这两次测试的所有日志都收集上来。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"运行成功的应用因为加载了 ACVIP,所以生成了 "},{"type":"text","marks":[{"type":"color","attrs":{"color":"#404040","name":"user"}}],"text":"/home/admin/logs/acvip-java-client 目录以及文件。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"运行失败的应用因为没有加载 ACVIP,所以没有生成 "},{"type":"text","marks":[{"type":"color","attrs":{"color":"#404040","name":"user"}}],"text":"/home/admin/logs/acvip-java-client 目录。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#404040","name":"user"}}],"text":"通过对比 stdout.log,也没有发现什么不同。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#404040","name":"user"}}],"text":"但是我们从运行成功的应用的 /home/admin/logs/acvip-java-client 目录下面发现 STARTUP.log 的日志,其记录了 ACVIP 初始化过程。从这个日志我们能看到 ACVIP 初始化的时间点。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#404040","name":"user"}}],"text":"招式四、时间回溯,逼近真相"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#404040","name":"user"}}],"text":"承接上文,我们拿到 ACVIP 初始化的时间点:"},{"type":"text","text":"2020-07-01T18:35:12,086。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/8b/8b0bedae20046d5869e0f18f3c1e860e.png","alt":"image.png","title":"image.png","style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#404040","name":"user"}}],"text":"拿到 ACVIP 初始化的时间点,有什么用呢?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#404040","name":"user"}}],"text":"当信息太碎片化的时候,我们往往发现不了这些信息的价值。但是当我们找到一根主线将这些信息串联起来,那我们很可能得到一个价值连城的故事。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#404040","name":"user"}}],"text":"现在,就是我们用主线将碎片化的信息串联起来的时候了。而主线就是时间点。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"因为我们知道了 ACVIP 的初始化的时间点,那么我们只需要关心这个时间点前发生的故事。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"回到 stdout.log,参考这个时间点,我们发现在那个时间点 SOFA 框架正在初始化 DRM (动态配置组件),但是有报错。2020-07-01 18:35:12:at com.alipay.drm.client.remoting.client.ClientManager."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/f1/f14c708925a62d42b674a44f6d70c336.png","alt":"image.png","title":"image.png","style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"这个错误在正常运行和非正常运行的测试场景都有,而且 callstack 一模一样。所以这个错误本身并不重要。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"重要的是,发生这个错误的时候,SOFA 框架正在初始化 DRM 组件。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"接着,我们将重心转移到 DRM 组件的初始化:查看运行正常的 /home/admin/logs/drm-boot.log 日志。我们发现这一条 “Query zdrmdata url pool from antvip”,这说明 DRM 组件从 ACVIP 那边获得了 DRM 服务端的 IP。同时,我们注意到 DRM 初始化的时间是 2020-07-01 18:35:11,947,这比 ACVIP 的初始化时间晚!这说明是 DRM 触发了 ACVIP 初始化。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"italic"},{"type":"bgcolor","attrs":{"color":"#FADB14","name":"user"}}],"text":"2020-07-01 18:35:11,947 INFO  Start building distributed resource ..."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"italic"}],"text":"..."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"italic"}],"text":"2020-07-01 18:35:11,991 INFO  Init access key from system param: middleWarexxxx"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"italic"}],"text":"2020-07-01 18:35:11,991 INFO  Init secretKey key from system param."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"italic"}],"text":"2020-07-01 18:35:11,991 INFO  Init instance id name from system param: xxxInstanceID"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"italic"}],"text":"..."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"italic"},{"type":"bgcolor","attrs":{"color":"#FADB14","name":"user"}}],"text":"2020-07-01 18:35:13,205 INFO  Query zdrmdata url pool from antvip:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"接下来,让我们再来看看非正常运行的应用的 DRM 日志。该日志有报错:"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#F5222D","name":"user"}}],"text":"“ERROR Query confreg http url failed!”"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"italic"},{"type":"bgcolor","attrs":{"color":"#FADB14","name":"user"}}],"text":"2020-07-01 19:38:44,511 INFO  Start building distributed resource ..."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"italic"}],"text":"..."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"italic"}],"text":"2020-07-01 19:38:44,557 INFO  Init access key from system param: middleWarexxxx"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"italic"}],"text":"2020-07-01 19:38:44,557 INFO  Init secretKey key from system param."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"italic"},{"type":"bgcolor","attrs":{"color":"#FADB14","name":"user"}}],"text":"2020-07-01 19:38:44,605 ERROR Query confreg http url failed!"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"假如我们盯着这个错误看,可能被误导到一个错误的方向。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"通过对比两者的 DRM 日志,我们发现正常运行的应用的 DRM 日志会 “"},{"type":"text","marks":[{"type":"italic"}],"text":"Init instance id name from system"},{"type":"text","text":"”,但是非正常运行的应用的 DRM 日志却没有。而 instanceID 是 acvip 寻址所必须的。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"招式五、研究代码逻辑,重现问题"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"从 DRM 日志中,我们发现了 DRM 初始化逻辑轨迹的不同。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"而日志内容是通过代码里的 Logger 设置的。所以我们可以通过日志内容去代码里看相应的逻辑(这是客户端的代码,客户和我们都可以查看)。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我们通过在代码里面搜索 “"},{"type":"text","marks":[{"type":"italic"}],"text":"Init secretKey key from system param"},{"type":"text","text":"”,发现在加载 instance id 之前,代码里有一个判断。这个判断是,假如 antCloud 为 True,才会去加载 instance id。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"这个 antCloud 为不为 True 取决于配置文件中 com.alipay.env 的值是否为 "},{"type":"text","marks":[{"type":"color","attrs":{"color":"#F5222D","name":"user"}}],"text":"shared"},{"type":"text","text":"。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/79/79910dbace792b0d1c36107d9525ad1b.png","alt":"image.png","title":"image.png","style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我们的代码里面使用的是"},{"type":"text","marks":[{"type":"strong"}],"text":" equalsIgnoreCase"},{"type":"text","text":" 在做对比,"},{"type":"text","marks":[{"type":"strong"}],"text":"但是代码里面没有考虑该值有空格的情况(没有trim),所以当配置文件中 com.alipay.env 的值是 "},{"type":"text","marks":[{"type":"color","attrs":{"color":"#F5222D","name":"user"}},{"type":"strong"}],"text":"shared+空格"},{"type":"text","marks":[{"type":"strong"}],"text":",而不是 shared, 那么就无法触发 ACVIP 初始化了。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"为了验证我们的想法,我们重新检查了 stdout.log。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"非正常运行的应用是这样的:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"italic"}],"text":"Not find key com.alipay.env in Java -D argument, put value shared  into System"},{"type":"text","marks":[{"type":"italic"},{"type":"color","attrs":{"color":"#F5222D","name":"user"}}],"text":"(shared后多了一个空格)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"正常运行的应用是这样的:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"italic"}],"text":"Not find key com.alipay.env in Java -D argument, put value shared into System"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"直接从配置文件的截图是看不出来是否 shared 后面有没有空格的。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我们马上在本地做了一个实验,果然可以重现问题。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"解决方案"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"客户将 application-test.properties 里面的 shared 后面的空格去掉后,问题解决。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"总结"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"最后,我将这几个招式总结到下面的流程图中。"}]},{"type":"image","attrs":{"src":"https://static001.infoq.cn/static/write/img/img-copy-disabled.4f2g7h.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}}]}
發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章