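Holiday recharge: build a private code assistant with Alibaba Cloud Serverless K8s + AIGC

AI is driving a wave of technological innovation. With the rise of ChatGPT and Midjourney, AIGC has set off a worldwide surge of interest, and the open-source community has produced many similar models, such as FastGPT, Moss, and Stable Diffusion. The striking results of these models have drawn in enterprises and developers, but complicated and tedious deployment remains a stumbling block. Alibaba Cloud ASK provides a serverless container service: users do not need to manage resources or environment configuration, which lets developers deploy AI models quickly with a minimal barrier to entry. Taking the open-source FastChat as an example, this article shows in detail how to quickly build a private code assistant on ASK.

Preview

Code generation with Cursor + GPT-4 feels smart, doesn't it? With FastChat and a VSCode plugin we can get the same kind of result!

Quickly generate a Golang Hello World

Link: https://intranetproxy.alipay.com/skylark/lark/0/2023/gif/11431/1682574183392-11e16131-3dae-4969-a0d1-79a0a9eefb01.gif

Quickly generate a Kubernetes Deployment

Link: https://intranetproxy.alipay.com/skylark/lark/0/2023/gif/11431/1682574192825-7a1d3c76-025d-45db-bea1-4ca5dd885520.gif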

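Background

ASK (Alibaba Serverless Kubernetes) is a container product for serverless scenarios from the Alibaba Cloud Container Service team. Users can create workloads directly through the Kubernetes API without having to operate nodes. As a serverless container platform, ASK has four main characteristics: no operations overhead, elastic scaling, compatibility with community Kubernetes, and strong isolation.

Training and deploying large-scale AI applications mainly faces the following challenges.

Limited GPU resources and high training cost

Large-scale AI applications need GPUs for both training and inference, but many developers lack GPU resources. Buying GPU cards outright or purchasing ECS instances is costly.

Heterogeneous resources

Parallel training needs a large number of GPUs, often from different series. Different GPUs support different CUDA versions, which are in turn tied to the kernel version and the nvidia-container-cli version. Developers have to pay attention to the underlying resources, which makes AI application development considerably harder.

Slow image loading

AI application images are often tens of GB in size, and downloading them frequently takes tens of minutes or even hours.

ASK addresses each of these problems. In ASK, GPU resources can be used conveniently through Kubernetes workloads, with no advance preparation, and released as soon as they are no longer needed, which keeps costs low. ASK hides the underlying resources, so users do not need to worry about dependencies such as GPU and CUDA versions and can focus on the AI application's own logic. ASK also provides image caching by default, so a Pod can start within seconds from the second time it is created.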

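Deployment

1. Prerequisites

An ASK cluster has been created. For details, see Create an ASK cluster [1].
The llama-7b model has been downloaded and uploaded to OSS. For details, see the appendix of this article.

2. Create the resources with kubectl

Replace the following variables in the YAML file:

${your-ak}: your AccessKey ID (AK)

${your-sk}: your AccessKey Secret (SK)

${oss-endpoint-url}: the OSS endpoint

${llama-oss-path}: the OSS path where the llama-7b model is stored (without a trailing /), for example oss://xxxx/llama-7b-hf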

apiVersion: v1
kind: Secret
metadata:
  name: oss-secret
type: Opaque
stringData:
  .ossutilconfig: |
    [Credentials]
    language=ch
    accessKeyID=${your-ak}
    accessKeySecret=${your-sk}
    endpoint=${oss-endpoint-url}
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: fastchat
  name: fastchat
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: fastchat
  strategy:
    rollingUpdate:
      maxSurge: 100%
      maxUnavailable: 100%
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: fastchat
        alibabacloud.com/eci: "true" 
      annotations:
        k8s.aliyun.com/eci-use-specs: ecs.gn6e-c12g1.3xlarge
    spec:
      volumes:
      - name: data
        emptyDir: {}
      - name: oss-volume
        secret:
          secretName: oss-secret
      dnsPolicy: Default
      initContainers:
      - name: llama-7b
        image: yunqi-registry.cn-shanghai.cr.aliyuncs.com/lab/ossutil:v1
        volumeMounts:
          - name: data
            mountPath: /data
          - name: oss-volume
            mountPath: /root/
            readOnly: true
        command: 
        - sh
        - -c
        - ossutil cp -r ${llama-oss-path} /data/
        resources:
          limits:
            ephemeral-storage: 50Gi
      containers:
      - command:
        - sh
        - -c 
        - "/root/webui.sh"
        image: yunqi-registry.cn-shanghai.cr.aliyuncs.com/lab/fastchat:v1.0.0
        imagePullPolicy: IfNotPresent
        name: fastchat
        ports:
        - containerPort: 7860
          protocol: TCP
        - containerPort: 8000
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          initialDelaySeconds: 5
          periodSeconds: 10
          successThreshold: 1
          tcpSocket:
            port: 7860
          timeoutSeconds: 1
        resources:
          requests:
            cpu: "4"
            memory: 8Gi
          limits:
            nvidia.com/gpu: 1
            ephemeral-storage: 100Gi
        volumeMounts:
        - mountPath: /data
          name: data
---
apiVersion: v1
kind: Service
metadata:
  annotations:
    service.beta.kubernetes.io/alibaba-cloud-loadbalancer-address-type: internet
    service.beta.kubernetes.io/alibaba-cloud-loadbalancer-instance-charge-type: PayByCLCU
  name: fastchat
  namespace: default
spec:
  externalTrafficPolicy: Local
  ports:
  - port: 7860
    protocol: TCP
    targetPort: 7860
    name: web
  - port: 8000
    protocol: TCP
    targetPort: 8000
    name: api
  selector:
    app: fastchat
  type: LoadBalancer
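
The variable substitution described before the YAML can also be scripted instead of done by hand. The following is only a minimal sketch under stated assumptions: `fillPlaceholders` and `normalizeOssPath` are hypothetical helper names, not part of ASK or FastChat, and the endpoint value is purely illustrative. The idea is to fail loudly when a value is missing, so that no unfilled `${...}` placeholder ends up in the applied manifest.

```typescript
// Hypothetical helper: replaces ${name} placeholders such as ${your-ak}.
function fillPlaceholders(template: string, values: Record<string, string>): string {
    return template.replace(/\$\{([\w-]+)\}/g, (_match: string, placeholder: string): string => {
        const value = values[placeholder];
        if (value === undefined) {
            // Refuse to render a manifest that still contains a placeholder.
            throw new Error('no value provided for ${' + placeholder + '}');
        }
        return value;
    });
}

// Hypothetical helper: strips trailing slashes, since the model path must not end with "/".
function normalizeOssPath(ossPath: string): string {
    return ossPath.replace(/\/+$/, '');
}

// Two lines taken from the manifest above, used here as a tiny template.
const snippet = 'endpoint=${oss-endpoint-url}\nossutil cp -r ${llama-oss-path} /data/';

const rendered = fillPlaceholders(snippet, {
    'oss-endpoint-url': 'oss-cn-shanghai.aliyuncs.com', // illustrative endpoint
    'llama-oss-path': normalizeOssPath('oss://xxxx/llama-7b-hf/'),
});
console.log(rendered);
```

In practice the template would be the whole YAML file read from disk, and the rendered text would be passed to kubectl apply -f.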

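3. Wait for FastChat to become ready

Once the pod is ready, open http://${external-ip}:7860 in a browser.

After startup, the vicuna-7b model has to be downloaded; it is about 13 GB.

The download takes roughly 20 minutes. If you take a disk snapshot in advance, then create a disk from that snapshot and mount it to the pod, the model is available within seconds.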

kubectl get po |grep fastchat

# NAME                        READY   STATUS    RESTARTS   AGE
# fastchat-69ff78cf46-tpbvp   1/1     Running   0          20m

kubectl get svc fastchat
# NAME       TYPE           CLUSTER-IP        EXTERNAL-IP    PORT(S)          AGE
# fastchat   LoadBalancer   192.168.230.108   xxx.xx.x.xxx   7860:31444/TCP   22m
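
Once the Service has an external address, the two endpoints used in the rest of this article follow from it. The sketch below assumes the default table output of kubectl get svc shown above; `parseExternalIp` and `buildEndpoints` are hypothetical helper names, and 203.0.113.10 is a documentation-only address standing in for the real one.

```typescript
// Hypothetical helper: reads the EXTERNAL-IP column from `kubectl get svc` table output.
function parseExternalIp(output: string, serviceName: string): string | undefined {
    const rows = output.split('\n').map(row => row.trim()).filter(row => row.length > 0);
    const header = (rows[0] ?? '').split(/\s+/);
    const column = header.indexOf('EXTERNAL-IP');
    if (column < 0) {
        return undefined;
    }
    for (const row of rows.slice(1)) {
        const cells = row.split(/\s+/);
        if (cells[0] === serviceName) {
            const cell = cells[column];
            // kubectl prints <pending> until the load balancer has been provisioned.
            return cell === undefined || cell === '<pending>' || cell === '<none>' ? undefined : cell;
        }
    }
    return undefined;
}

// Ports 7860 (web UI) and 8000 (API) come from the Service definition in this article.
function buildEndpoints(externalIp: string): { webUi: string; api: string } {
    return {
        webUi: `http://${externalIp}:7860`,
        api: `http://${externalIp}:8000/v1/chat/completions`,
    };
}

const sampleOutput = [
    'NAME       TYPE           CLUSTER-IP        EXTERNAL-IP    PORT(S)          AGE',
    'fastchat   LoadBalancer   192.168.230.108   203.0.113.10   7860:31444/TCP   22m',
].join('\n');

const externalIp = parseExternalIp(sampleOutput, 'fastchat');
if (externalIp !== undefined) {
    console.log(buildEndpoints(externalIp));
}
```

Returning undefined while the address is still pending lets a caller simply retry later rather than build a URL from a placeholder value.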

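Results

Case 1: Use FastChat through the web console

Open http://${external-ip}:7860 in a browser to try the chat feature directly, for example by asking FastChat in natural language to write some code.

Input: Write a Kubernetes Deployment YAML file based on the Nginx image

FastChat's output is shown in the figure below.

Case 2: Use FastChat through the API

The FastChat API listens on port 8000. As shown below, a curl call to the API returns the result.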

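curl command

# Replace ${external-ip} with the EXTERNAL-IP of the fastchat Service.
# The endpoint and model name match the request sent by the VSCode plugin in Case 3;
# the prompt text is illustrative.
curl http://${external-ip}:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "vicuna-7b-v1.1", "messages": [{"role": "user", "content": "生成一個 Golang Hello World"}]}'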

Output

{"id":"3xqtJcXSLnBomSWocuLW2b","object":"chat.completion","created":1682574393,"choices":[{"index":0,"message":{"role":"assistant","content":"下面是使用 Go 語言生成 \"Hello, World!\" 的代碼：\n```go\npackage main\n\nimport \"fmt\"\n\nfunc main() {\n    fmt.Println(\"Hello, World!\")\n}\n```\n運行該代碼後，會輸出 \"Hello, World!\"。"},"finish_reason":"stop"}],"usage":null}
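
To use such a response programmatically, a client only needs the choices[0].message.content field and, for code generation, the fenced block inside it. The sketch below types just the fields visible in the sample output above; the interface and function names are illustrative, not FastChat's own definitions. It extracts the first fenced code block with the same pattern the VSCode plugin in Case 3 uses.

```typescript
// Illustrative types covering only the fields visible in the sample response.
interface ChatMessage { role: string; content: string; }
interface ChatChoice { index: number; message: ChatMessage; finish_reason: string | null; }
interface ChatCompletion { id: string; object: string; created: number; choices: ChatChoice[]; }

// Three backticks, the Markdown code-fence marker, built without writing it literally.
const FENCE = '`'.repeat(3);

// Returns the trimmed body of the first fenced code block in the reply, or undefined.
function extractFirstCodeBlock(reply: string): string | undefined {
    const pattern = new RegExp(FENCE + '.*\\n([\\s\\S]*?)' + FENCE);
    return reply.match(pattern)?.[1]?.trim();
}

// Parses a raw response body and pulls the code out of the first choice.
function codeFromResponse(body: string): string | undefined {
    const parsed = JSON.parse(body) as ChatCompletion;
    const firstChoice = parsed.choices[0];
    return firstChoice ? extractFirstCodeBlock(firstChoice.message.content) : undefined;
}

// A made-up response in the same shape as the sample output.
const sampleBody = JSON.stringify({
    id: 'example',
    object: 'chat.completion',
    created: 0,
    choices: [{
        index: 0,
        message: {
            role: 'assistant',
            content: 'Here it is:\n' + FENCE + 'go\npackage main\n' + FENCE + '\nDone.',
        },
        finish_reason: 'stop',
    }],
});

console.log(codeFromResponse(sampleBody)); // prints "package main"
```

Replies that contain no fenced block yield undefined, so a caller can fall back to showing the full text.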

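Case 3: VSCode plugin

Now that there is an API, how can this capability be integrated into an IDE quickly? Copilot, Cursor, and Tabnine probably come to mind, so let's integrate FastChat through a VSCode plugin. A VSCode plugin has a few core files: src/extension.ts, package.json, and tsconfig.json.

The contents of these three files are as follows: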

src/extension.ts

import * as vscode from 'vscode';
import axios from 'axios';

export function activate(context: vscode.ExtensionContext) {
    const fastchat = async () => {
        const inputValue = await vscode.window.showInputBox({ prompt: 'Enter a code prompt' });
        if (!inputValue) {
            return;
        }
        // Look up the editor when the command runs; a value captured at
        // module load time may be undefined or stale.
        const editor = vscode.window.activeTextEditor;

        await vscode.window.withProgress({
            location: vscode.ProgressLocation.Notification,
            title: 'Requesting...',
            cancellable: false
        }, async () => {
            try {
                // Replace example.com with the EXTERNAL-IP of the fastchat Service.
                const response = await axios.post('http://example.com:8000/v1/chat/completions', {
                    model: 'vicuna-7b-v1.1',
                    messages: [{ role: 'user', content: inputValue }]
                }, {
                    headers: { 'Content-Type': 'application/json' }
                });
                const content: string = response.data.choices[0].message.content;
                // Take the body of the first fenced code block in the reply.
                const matches = content.match(/```.*\n([\s\S]*?)```/);
                if (editor && matches && matches.length > 1) {
                    // Insert the generated code at the current cursor position.
                    await editor.edit(editBuilder => {
                        editBuilder.insert(editor.selection.active, matches[1].trim());
                    });
                }
            } catch (error) {
                console.log(error);
            }
        });
    };
    context.subscriptions.push(vscode.commands.registerCommand('fastchat', fastchat));
}

package.json

{
    "name": "fastchat",
    "version": "1.0.0",
    "publisher": "yourname",
    "engines": {
        "vscode": "^1.77.0"
    },
    "categories": [
        "Other"
    ],
    "activationEvents": [
        "onCommand:fastchat"
    ],
    "main": "./dist/extension.js",
    "contributes": {
        "commands": [
            {
                "command": "fastchat",
                "title": "fastchat code generator"
            }
        ]
    },
    "dependencies": {
        "axios": "^1.3.6"
    },
    "devDependencies": {
        "@types/node": "^18.16.1",
        "@types/vscode": "^1.77.0",
        "typescript": "^5.0.4"
    }
}

tsconfig.json

{
    "compilerOptions": {
        "target": "ES2018",
        "module": "commonjs",
        "outDir": "./dist",
        "strict": true,
        "esModuleInterop": true,
        "resolveJsonModule": true,
        "declaration": true
    },
    "include": ["src/**/*"],
    "exclude": ["node_modules", "**/*.test.ts"]
}

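With the plugin written, let's look at the result.

Quickly generate a Golang Hello World

Link: https://intranetproxy.alipay.com/skylark/lark/0/2023/gif/11431/1682574183392-11e16131-3dae-4969-a0d1-79a0a9eefb01.gif

Quickly generate a Kubernetes Deployment

Link: https://intranetproxy.alipay.com/skylark/lark/0/2023/gif/11431/1682574192825-7a1d3c76-025d-45db-bea1-4ca5dd885520.gif

Summary

As a serverless container platform, ASK offers no operations overhead, elastic scaling, shielding of heterogeneous resources, and image acceleration, which makes it well suited to deploying large AI models. You are welcome to try it.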

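Appendix:

1. Download the llama-7b model

Model URL: https://huggingface.co/decapoda-research/llama-7b-hf/tree/main

# On Alibaba Cloud ECS, install git-lfs first with the following command
# yum install git-lfs

# Enable Git LFS, then clone and fetch the large model files from inside the repository
git lfs install
git clone https://huggingface.co/decapoda-research/llama-7b-hf
cd llama-7b-hf
git lfs pull

2. Upload to OSS

See the documentation: https://help.aliyun.com/document_detail/195960.html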

References:

[1] Create an ASK cluster

https://help.aliyun.com/document_detail/86377.htm?spm=a2c4g.186945.0.0.61eb3e0694K2ej#task-e3c-311-ydb

[2] ASK overview

https://help.aliyun.com/document_detail/86366.html?spm=a2c4g.750001.0.i1

Authors: 子白, 冬島

This article is original content from Alibaba Cloud and may not be reproduced without permission.
