kfserving is the Kubeflow component for standardized, serverless deployment of ML models, but it is deeply coupled with Knative and hides the traffic path (for example, by wrapping Istio). Such a complex stack is hard to use directly in production. Here we use the OAM model implemented by KubeVela to re-package the serverless workflow into a simpler standard, and build a minimal serverless setup for ML models.
Background
Providing algorithm teams with efficient engineering support for moving to the cloud is an important and worthwhile topic in the cloud-native era. The most complete open-source option today is probably Kubeflow, a collection of tools for ML experimentation and deployment, but it is heavyweight overall and not well suited for small teams that need to get to production quickly. This post builds a standardized model-serving example on top of kubevela and kfserving, for reference.
Project overview
Project: https://github.com/shikanon/vela-example/tree/main/example/sklearnserver
The project provides three object types through kubevela: mpserver, httproute, and hpa.
- mpserver generates the Deployment and Service resources; it is the main body of the running program
- httproute generates the externally exposed port and access URL
- hpa keeps the service scalable
Pre-deployment preparation
Since everything below goes through vela, you first need to install the vela CLI.
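For example, one common way at the time of writing (the script URL follows the KubeVela docs of that era; check the current docs if it has moved):

# sketch: install the vela CLI and verify it is on PATH
curl -fsSl https://kubevela.io/script/install.sh | bash
vela version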
Creating a sklearn service
The example lives under example/sklearnserver.
- Build the image locally:
# build
docker build -t swr.cn-north-4.myhuaweicloud.com/hw-zt-k8s-images/sklearnserver:demo-iris -f sklearn.Dockerfile .
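The image is expected to bundle a serialized iris model. As a minimal sketch of how such an artifact could be produced with scikit-learn (the filename model.joblib matches the MODEL_BASENAME convention of the server code shown in the implementation notes below):

# sketch: train a toy iris classifier and dump it as model.joblib,
# the artifact the sklearn server loads at startup
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
import joblib

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=200).fit(X, y)
joblib.dump(clf, "model.joblib")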
- Push it to the Huawei Cloud image registry:
docker login swr.cn-north-4.myhuaweicloud.com
docker push swr.cn-north-4.myhuaweicloud.com/hw-zt-k8s-images/sklearnserver:demo-iris
- Create an application file named demo-iris-01.yaml:
name: demo-iris-01
services:
  demo-iris:
    type: mpserver
    image: swr.cn-north-4.myhuaweicloud.com/hw-zt-k8s-images/sklearnserver:demo-iris
    ports: [8080]
    cpu: "200m"
    memory: "250Mi"
    httproute:
      gateways: ["external-gateway"]
      hosts: ["demo-iris.rcmd.testing.mpengine"]
      servernamespace: rcmd
      serverport: 8080
    hpa:
      min: 1
      max: 1
      cpuPercent: 60
Because the app uses the rcmd namespace, you need to switch to it when creating the app. You can create an environment bound to the rcmd namespace through the visual interface of the vela dashboard:
vela dashboard
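Alternatively, environments can also be created from the CLI in appfile-era vela releases; the exact flags may differ across versions, so treat this as a sketch:

# sketch: create (and switch to) an environment bound to the rcmd namespace
vela env init rcmd --namespace rcmd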
Once that succeeds, you can check it with vela env:
$ vela env ls
NAME     CURRENT  NAMESPACE  EMAIL  DOMAIN
default           default
rcmd     *        rcmd
- Run the application in the cloud-native environment:
$ vela up -f demo-iris-01.yaml
Parsing vela appfile ...
Load Template ...
Rendering configs for service (demo-iris)...
Writing deploy config to (.vela/deploy.yaml)
Applying application ...
Checking if app has been deployed...
App has not been deployed, creating a new deployment...
✅ App has been deployed 🚀🚀🚀
Port forward: vela port-forward demo-iris-01
SSH: vela exec demo-iris-01
Logging: vela logs demo-iris-01
App status: vela status demo-iris-01
Service status: vela status demo-iris-01 --svc demo-iris
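Under the hood, the mpserver type renders a Deployment and a Service (see the project overview above). Assuming your kubeconfig points at the same cluster, a quick sanity check:

kubectl get deployments,services -n rcmd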
Testing
Once the app is deployed, you can test it:
$ curl -i -d '{"instances":[[5.1, 3.5, 1.4, 0.2]]}' -H "Content-Type: application/json" -X POST demo-iris.rcmd.testing.mpengine:8000/v1/models/model:predict
{"predictions": [0]}
Implementation notes
Developing a model server with kfserving
kfserving provides server scaffolding for several common frameworks, including sklearn, lgb, xgb, and pytorch. The servers are built on the tornado framework and expose abstract interfaces for model loading, health checks, prediction, and explanation; for details see kfserving/kfserving/kfserver.py:
...
    def create_application(self):
        return tornado.web.Application([
            # Server Liveness API returns 200 if server is alive.
            (r"/", LivenessHandler),
            (r"/v2/health/live", LivenessHandler),
            (r"/v1/models",
             ListHandler, dict(models=self.registered_models)),
            (r"/v2/models",
             ListHandler, dict(models=self.registered_models)),
            # Model Health API returns 200 if model is ready to serve.
            (r"/v1/models/([a-zA-Z0-9_-]+)",
             HealthHandler, dict(models=self.registered_models)),
            (r"/v2/models/([a-zA-Z0-9_-]+)/status",
             HealthHandler, dict(models=self.registered_models)),
            (r"/v1/models/([a-zA-Z0-9_-]+):predict",
             PredictHandler, dict(models=self.registered_models)),
            (r"/v2/models/([a-zA-Z0-9_-]+)/infer",
             PredictHandler, dict(models=self.registered_models)),
            (r"/v1/models/([a-zA-Z0-9_-]+):explain",
             ExplainHandler, dict(models=self.registered_models)),
            (r"/v2/models/([a-zA-Z0-9_-]+)/explain",
             ExplainHandler, dict(models=self.registered_models)),
            (r"/v2/repository/models/([a-zA-Z0-9_-]+)/load",
             LoadHandler, dict(models=self.registered_models)),
            (r"/v2/repository/models/([a-zA-Z0-9_-]+)/unload",
             UnloadHandler, dict(models=self.registered_models)),
        ])
...
The sklearn server used in this example mainly implements the predict interface:
import kfserving
import joblib
import numpy as np
import os
from typing import Dict

MODEL_BASENAME = "model"
MODEL_EXTENSIONS = [".joblib", ".pkl", ".pickle"]


class SKLearnModel(kfserving.KFModel):  # pylint:disable=c-extension-no-member
    def __init__(self, name: str, model_dir: str):
        super().__init__(name)
        self.name = name
        self.model_dir = model_dir
        self.ready = False

    def load(self) -> bool:
        model_path = kfserving.Storage.download(self.model_dir)
        paths = [os.path.join(model_path, MODEL_BASENAME + model_extension)
                 for model_extension in MODEL_EXTENSIONS]
        for path in paths:
            if os.path.exists(path):
                self._model = joblib.load(path)
                self.ready = True
                break
        return self.ready

    def predict(self, request: Dict) -> Dict:
        instances = request["instances"]
        try:
            inputs = np.array(instances)
        except Exception as e:
            raise Exception(
                "Failed to initialize NumPy array from inputs: %s, %s" % (e, instances))
        try:
            result = self._model.predict(inputs).tolist()
            return {"predictions": result}
        except Exception as e:
            raise Exception("Failed to predict %s" % e)
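For completeness, a model class like this is normally handed to kfserving's built-in tornado server, which wires it into the routes listed above. A minimal sketch modeled on the sklearnserver entrypoint (the model_dir /mnt/models and the name "model" are illustrative; the name must match the URL /v1/models/model:predict used in the test):

# sketch: load the model and serve it via kfserving's KFServer
import kfserving

if __name__ == "__main__":
    model = SKLearnModel("model", "/mnt/models")
    model.load()
    kfserving.KFServer().start([model])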