# Benefits of Loosely Coupled Deep Learning Serving and a Deployment Example

{"type":"doc","content":[{"type":"heading","attrs":{"align":null,"level":5},"content":[{"type":"text","text":"本文要点"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"随着深度网络变得更加专业化,对资源的需求也越来越大,对于初创公司和规模化扩张的企业而言,在预算紧张的环境中为这些运行在加速硬件上的网络提供服务(serving)也变得越来越困难。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"松散耦合的架构可能是更好的选项,因为它们在为深度网络提供服务时具有高度可控性、易适应性、透明可观察性和自动缩放性(成本效益较高)。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"任何规模的公司都可以利用各种托管云组件(例如函数、消息服务和API网关),使用相同的服务基础架构来处理公共和内部请求。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"托管消息代理带来了轻松的可维护性,无需专门的团队负责维护工作。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"适应无服务器和松散耦合组件后,深度学习解决方案的开发和交付速度也可能会提升。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"随着深度学习在许多行业的应用范围不断扩大,深度网络的规模和特异性也在增加。大型网络需要更多资源,并且由于它们的任务特定性(如定制的函数\/层),将它们从急切执行(eager execution)编译为运行在CPU或FPGA后端上的优化计算图可能是无法做到的。因此,这种模型在运行时可能需要显式GPU加速和自定义配置。然而,深度网络都是在资源有限的约束环境中运行的,环境中云GPU的价格颇为昂贵,且低优先级(即竞价、抢占式)实例相当稀缺。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"由于调用者和API处理者之间的紧密时间耦合,使用常见的机器学习服务框架将这些网络投入生产,可能会给机器学习工程师和架构师带来很多麻烦。这种情况是非常有可能出现的,特别是对于采用深度学习的初创公司和规模化扩张的公司来说更是如此。从GPU内存管理到缩放,他们在服务深度网络时可能会面临多个问题。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在本文中,我将重点介绍一种部署深度网络的替代方法的优势,这种方法会暴露基于消息的中介。这会放松我们在REST\/RPC API服务中看到的紧密时间耦合,并带来异步运行的深度学习架构,在初创和扩张公司工作的工程师会更喜爱这种架构。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我将使用四大指标来对比这个服务架构与REST框架:"},{"type":"text","marks":[{"type":"strong"}],"text":"可控性"},{"type":"text","text":"、"},{"type":"text","marks":[{"type":"strong"}],"text":"适应性"},{"type":"text","text":"、"},{"type":"text","marks":[{"type":"strong"}],"text":"可观察性"},{"type":"text","text":"和"},{"type":"text","marks":[{"type":"strong"}],"text":"自动缩放性"},{"type":"text","text":"。我将进一步展示如何使用一系列现代云组件轻松地将REST端点添加到面向消息的系统中。本文将要讨论的所有想法都是云和语言无关的。它们也可以用在本地服务器上托管的场景中。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"服务深度网络的挑战"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"深度网络是由一些高度级联的非线性函数组成的,这些函数形成应用于数据的计算图。在训练阶段,网络使用选定的输入\/输出对以最小化选定目标的方式来调整这些图的参数。在推理时,输入数据简单地流过这个优化图。正如上述介绍所示,任何深度网络都要面对的一个明显的挑战就是它的计算密集度。知道了这一点后,你可能会惊讶地发现,基于紧密时间耦合的REST\/RPC 
In such cases, the tight coupling introduced by REST APIs is far from ideal. Another way to speed up inference is to compile the graph for specialized hardware that is designed to parallelize the computations performed inside the graph (e.g., FPGAs, ASICs). But again, the specificity problem cannot be handled this way, since custom functions would need to be integrated into the FPGA via a hardware description language (e.g., Verilog, VHDL).

Considering that deep networks will keep scaling up and becoming ever more specialized to meet industry demands, explicit GPU acceleration at inference time can be expected in the near future as well. Decoupling the synchronous interface between the caller and the serving function, and allowing highly controllable pull-based inference, is therefore advantageous on many levels.

## Breaking Tight Temporal Coupling

We can relax the temporal coupling by adding an intermediary middleware service to the system. In a way, this is more like communicating with your neighbor through an e-mail provider rather than shouting outside their window. With a message broker in place (e.g., RabbitMQ, Azure Service Bus, AWS SQS, GCP Pub/Sub, Kafka, Apache Pulsar, Redis), the target now has full flexibility in how it handles the caller's requests (just as your neighbor may ignore your e-mail until he or she finishes dinner). This is particularly advantageous because, from the engineer's perspective, it enables **high controllability**. Consider deploying two deep networks, requiring 3 GB and 6 GB of memory at inference time, on a GPU with 8 GB of memory.

In a REST-based system, precautions may be needed up front to ensure that the workers of the two models do not over-allocate GPU memory; otherwise some requests would fail because of the direct calls. With a queue, on the other hand, a worker may choose to postpone the work and handle it later, once memory becomes available. Since this happens asynchronously, the caller is not blocked and can carry on with its own work. This scenario is especially suitable for in-company requests with relatively relaxed time constraints, but the same queue can also serve client or partner API requests in real time using cloud components (e.g., serverless functions), as described in the next section.
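As a hedged illustration of this "postpone until memory is available" behavior (my sketch, not part of the accompanying repository; the memory threshold and helper names are assumptions), a queue worker might gate message settlement on free GPU memory, which `torch.cuda.mem_get_info` reports in recent PyTorch versions:

```python
import torch

REQUIRED_BYTES = 6 * 1024**3  # e.g., the 6 GB model from the scenario above

def gpu_has_room(required_bytes: int = REQUIRED_BYTES) -> bool:
    """True if the GPU currently has enough free memory for one inference."""
    free_bytes, _total_bytes = torch.cuda.mem_get_info()
    return free_bytes >= required_bytes

def handle(receiver, message, infer):
    """Process one queued request, deferring it when GPU memory is tight.

    `receiver` is an azure-servicebus receiver in PEEK_LOCK mode (see the
    snippet later in this article); `infer` is a hypothetical inference callable.
    """
    if not gpu_has_room():
        # Abandoning returns the message to the queue; the broker redelivers
        # it later, so the caller is never blocked in the meantime.
        receiver.abandon_message(message)
        return None
    result = infer(message)
    receiver.complete_message(message)
    return result
```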
Another compelling reason to choose message-mediated DL serving is its **easy adaptability**. If we want to exploit the full potential of web frameworks and libraries, we must face their learning curves, even for a microframework like Flask. With message brokers, on the other hand, we do not need to know their internals, and all major cloud vendors offer their own managed messaging services, freeing engineers from maintenance work. There are also many advantages in terms of **observability**. Since message transport is separated from the main deep learning worker by an explicit interface, logs and metrics can be aggregated independently. In the cloud, even that may be unnecessary, since managed messaging platforms handle logging automatically with add-on services such as dashboards and alerts. A queuing mechanism like this also naturally lends itself to auto-scaling.

Thanks to this high observability, queues give us the freedom to choose how to **auto-scale** the workers. In the next section, we will demonstrate an auto-scalable container deployment of a DL model using [KEDA](https://keda.sh/) (Kubernetes Event-Driven Autoscaling), an open-source, event-based autoscaling service that aims to simplify the automatic management of K8s pods. It is currently a Cloud Native Computing Foundation sandbox project and supports up to 30 scalers, from Apache Kafka to Azure Service Bus, AWS SQS, and GCP Pub/Sub. KEDA's parameters let us freely fine-tune the scaling mechanism based on the incoming traffic (e.g., the number of waiting messages, duration, and payload size).

## An Example Deployment

In this section, we will walk through a template deployment using PyTorch worker containers and Kubernetes on Azure. Apart from large artifacts such as the network weights, inputs, and possibly output images, data communication will be handled by Azure Service Bus. The large artifacts should be stored in blob storage and downloaded/uploaded from the containers using the Azure Python SDK for blobs. A high-level overview of the architecture is shown in Figure 1 below.

![Figure 1](https://res.infoq.com/articles/loosely-coupled-deep-learning-serving/en/resources/1fig-1-1627304648731.jpg)

Figure 1: A high-level overview of the proposed architecture. The corresponding Azure service for each block is given in parentheses. It can handle both external REST API requests via serverless functions and internal requests coming directly from the queue.

We will implement an auto-scalable [competing consumers pattern](https://www.enterpriseintegrationpatterns.com/patterns/messaging/CompetingConsumers.html) server using Azure Service Bus queues and KEDA. To enable the [request-reply](https://www.enterpriseintegrationpatterns.com/patterns/messaging/RequestReply.html) pattern for handling REST requests, [Azure Durable Functions external events](https://docs.microsoft.com/en-us/azure/azure-functions/durable/durable-functions-external-events?tabs=python#send-events) can be used. In the example architecture, we assume that a durable function is already in place and transmits the feedback-event reply URL to the worker threads through the Service Bus queue; the details of setting up this service are explained in the Azure documentation. KEDA allows us to set scaling rules based on queue length, so the number of worker pods in K8s is updated automatically according to the load. We also bind one worker container (or, in our case, several containers) to one GPU, so that we can auto-scale any hosted cluster and add more GPU machines to the system without any hassle. K8s automatically handles cluster autoscaling to resolve resource constraints (i.e., [node pressure](https://kubernetes.io/docs/concepts/scheduling-eviction/node-pressure-eviction/) caused by an insufficient number of GPUs).
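As noted above, large artifacts such as the network weights live in blob storage rather than in the messages themselves. A minimal sketch of pulling weights into a worker container with the `azure-storage-blob` SDK might look like the following (the container name, blob name, and connection-string variable are illustrative assumptions, not values from the repository):

```python
import os
from azure.storage.blob import BlobServiceClient

# Hypothetical names -- adjust to your storage account layout.
CONTAINER = "model-artifacts"
BLOB_NAME = "resnet50_tuned.pth"

def download_weights(local_path: str = "weights.pth") -> str:
    """Download the model weights from blob storage to the local disk."""
    client = BlobServiceClient.from_connection_string(
        os.environ["AZURE_STORAGE_CONNECTION_STRING"]
    )
    blob = client.get_blob_client(container=CONTAINER, blob=BLOB_NAME)
    with open(local_path, "wb") as f:
        f.write(blob.download_blob().readall())
    return local_path
```

A natural place to call such a helper is the `__init__` of the `Infer` class shown below.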
A detailed template describing how to serve a vanilla ResNet classifier can be found in this [GitHub repository](https://github.com/elras/Loosely-Coupled-ML-Serving); shortened versions of each block are shown here. As a first step, we create our deep network serving function ([network.py](https://github.com/elras/Loosely-Coupled-ML-Serving/blob/master/network.py)). The template class that initializes the inference function can be written as follows and customized for the task at hand (e.g., segmentation, detection):

```python
import torch

class Infer(object):
    __slots__ = []  # attribute names are elided in this template

    # Initialize the inference function (e.g., download weights from blob)
    def __init__(self, tuned_weights=None):
        pass

    # Perform inference
    @torch.no_grad()
    def __call__(self, pil_image):
        pass
```

In the original function, we return the class IDs of the top-5 ImageNet categories. Next, we write our worker Python function ([run.py](https://github.com/elras/Loosely-Coupled-ML-Serving/blob/master/run.py)), where we integrate the model with Azure Service Bus. As the snippet below shows, the Azure Python SDK for Service Bus makes managing the incoming message queue very simple. The PEEK_LOCK receive mode gives us explicit control over when to complete or abandon an incoming request:

```python
...
with servicebus_client:
    receiver = servicebus_client.get_queue_receiver(
        queue_name=self.bus_queue_name,
        receive_mode=ServiceBusReceiveMode.PEEK_LOCK,
    )
    with receiver:
        for message in receiver:
            # Perform the service with the input data
            data = json.loads(str(message))
            ...
```
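The repository snippet elides the settlement step at the end of the loop. As a hedged sketch of what that body typically looks like in PEEK_LOCK mode (the `infer` call and the `reply_url` payload field are assumptions of mine; the actual payload layout is defined by your durable function setup):

```python
import json
import requests

def serve_queue(receiver, infer):
    """Drain a PEEK_LOCK receiver, settling every message explicitly."""
    for message in receiver:
        data = json.loads(str(message))
        try:
            result = infer(data)  # hypothetical inference call
        except Exception:
            receiver.abandon_message(message)  # broker will redeliver later
            continue
        receiver.complete_message(message)     # settle only after success
        # For external REST requests, notify the waiting durable function
        # by raising its external event via the reply URL in the payload.
        if "reply_url" in data:
            requests.post(data["reply_url"], json={"result": result})
```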
Now that the template worker is ready, we create the container's Dockerfile and push the image to Azure Container Registry. Here, requirements.txt contains our worker's additional pip dependencies (note that the dependency file must be copied into the image before it can be installed). Running the main process via exec can be seen as a trick to ensure it runs as the PID 1 process; this lets the cluster restart the pod automatically on any error, without writing an explicit liveness endpoint into the deployment YAML file. Specifying health checks explicitly is still the better practice, though:

```dockerfile
FROM pytorch/pytorch:1.8.0-cuda11.1-cudnn8-runtime
# Copy the dependency list first so it is available to pip
COPY requirements.txt .
RUN python -m pip install -r requirements.txt
COPY . /worker
WORKDIR /worker
CMD exec python run.py
```

```bash
NAME= # SET
VERSION= # SET
ACR= # SET
sudo docker build -t $ACR.azurecr.io/$NAME:$VERSION -f Dockerfile .
sudo docker push $ACR.azurecr.io/$NAME:$VERSION
```

After creating the K8s cluster, do not forget to enable the node autoscaling feature from the portal (e.g., minimum 1, maximum 8). As a final preparation step, we need to enable the [GPU drivers](https://docs.microsoft.com/en-us/azure/aks/gpu-cluster) in the cluster (in the gpu-resources namespace) and deploy the [KEDA service](https://github.com/kedacore/keda/releases) via its official YAML file (in the keda namespace). For the reader's convenience, the KEDA and GPU driver YAML files are included in the repository:

```bash
kubectl create ns gpu-resources
kubectl apply -f nvidia-device-plugin-ds.yaml
kubectl apply -f keda-2.3.0.yaml
```

Next, we can deploy the worker containers via the prepared shell script. First, we create the namespace our service will be deployed into:

```bash
kubectl create namespace ml-system
```

Note that using a shell file instead of plain YAML lets us change parameters easily. After running the deployment script ([deploy.sh](https://github.com/elras/Loosely-Coupled-ML-Serving/blob/master/deploy.sh)), we are all set (do not forget to set the parameters according to your needs):

```bash
bash deploy.sh
```

Since we restrict each pod to a single GPU, scaling pods via KEDA also effectively scales the cluster nodes, which makes the overall architecture highly cost-effective. In some cases you can even set the minimum node count to zero and cut GPU costs entirely while the workers are idle. However, such a configuration must be approached very carefully, with the nodes' scale-up time taken into account. The details of the KEDA parameters used in the deployment script can be found in the official [documentation](https://keda.sh/docs/2.3/scalers/azure-service-bus/). If you look closely at the deployment script ([deploy.sh](https://github.com/elras/Loosely-Coupled-ML-Serving/blob/master/deploy.sh)), you will see that we set the NVIDIA_VISIBLE_DEVICES environment variable to "all" to try to access the GPU from a second container (worker-1). This trick lets us exploit cluster scaling and multiple containers per pod at the same time; without it, K8s would not allow more containers per GPU because of worker-0's "limit" constraint. Engineers should measure the GPU memory usage of their models and add containers according to the limits of the GPU card. Note that, for brevity, the details of the Azure-specific blocks indicated in Figure 1 (except for the example Service Bus receiver) are not shown here; Azure provides extensive documentation with relevant Python sample implementations for each component.
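Once the workers are up, the whole pipeline can be smoke-tested end to end by publishing a message to the queue the workers listen on. A minimal hedged sketch using the `azure-servicebus` SDK (the queue name, environment variable, and payload fields are assumptions that must match what run.py expects):

```python
import json
import os
from azure.servicebus import ServiceBusClient, ServiceBusMessage

# Hypothetical payload -- must match what run.py json-decodes.
payload = {"image_blob": "inputs/cat.jpg"}

client = ServiceBusClient.from_connection_string(
    os.environ["SERVICE_BUS_CONNECTION_STRING"]
)
with client:
    sender = client.get_queue_sender(queue_name="inference-requests")
    with sender:
        sender.send_messages(ServiceBusMessage(json.dumps(payload)))

print("Request enqueued; a worker pod will pick it up asynchronously.")
```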
## Future Directions

For anyone who has watched computing technology evolve through the 20th century, what is happening in deep learning hardware research today will feel very familiar. One obvious piece of low-hanging fruit is transistor miniaturization, packing as many compute cores as possible onto a single chip. In the end, the industry came to rely heavily on VLSI improvements rather than algorithmic development. Seeing how quickly hardware customized for deep learning is growing, we may expect the 21st century to replay that history. In the cloud, on the other hand, serverless accelerated DL serving looks like the low-hanging fruit: deep learning deployment will be abstracted further, and pay-as-you-go startups will be everywhere in the near future. Given the flexibility that loosely coupled architectures provide, we can also reasonably expect them to lighten the load of engineers working at such startups, so we may well see many nascent open-source projects targeting loosely coupled designs.

## Summary

In this article, we described the four main benefits of message-mediated deep learning serving: controllability, adaptability, observability, and auto-scalability (with its cost-effectiveness). In addition, I provided template code that can be used to deploy the described architecture on the Azure platform. It should be stressed that the flexibility of this kind of serving may be impractical in some scenarios, such as IoT and embedded-device serving, where the local independence of components weighs heavily. Even so, the ideas presented here can be adopted in several ways; for example, instead of cloud messaging services, low-level C/C++ message broker libraries can be used to build similarly loosely coupled architectures on resource-constrained platforms (e.g., for autonomous driving or IoT needs).

Want to try the concepts in this article in practice? You can [find the accompanying code on GitHub](https://github.com/elras/Loosely-Coupled-ML-Serving).

##### About the Author

**Sabri Bolkar** is a machine learning applied scientist and engineer interested in the complete lifecycle of learning-based systems, from R&D to deployment and continuous improvement. He studied computer vision at the Norwegian University of Science and Technology and completed his master's thesis on unsupervised image segmentation at KU Leuven in Belgium. After doctoral studies at TU Delft in the Netherlands, he is currently tackling the challenges of large-scale applied deep learning faced by the e-commerce industry. He can be reached via his [website](https://dunk.ai/).

**Original article:** [Benefits of Loosely Coupled Deep Learning Serving](https://www.infoq.com/articles/loosely-coupled-deep-learning-serving/)