Prepare the server
- Alibaba Cloud ECS instance
- Instance type: lightweight GPU instance ecs.vgn6i-m4-vws.xlarge (4 vCPU, 23 GiB)
- Disk space: 50 GB
- OS: Ubuntu 22.04
Install Docker
apt install docker.io
Install the NVIDIA GRID driver
acs-plugin-manager --exec --plugin grid_driver_install
Install the NVIDIA Container Toolkit
- Installation commands
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
&& curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
apt-get update
apt-get install -y nvidia-container-toolkit
- Configuration commands
nvidia-ctk runtime configure --runtime=docker
systemctl restart docker
- Verify the installation
docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
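If the manual check above needs to run from a provisioning script, a small Python wrapper can execute the same command and inspect the result. This is a sketch; the helper names are my own, and it assumes Docker, the GRID driver, and the toolkit from the steps above are already installed on the host.

```python
import subprocess

def build_verify_cmd(image: str = "ubuntu") -> list[str]:
    """Return the same GPU passthrough check as the command above."""
    return ["docker", "run", "--rm", "--runtime=nvidia",
            "--gpus", "all", image, "nvidia-smi"]

def gpu_visible_in_container(image: str = "ubuntu") -> bool:
    """Run the check; True when nvidia-smi succeeds inside the container."""
    result = subprocess.run(build_verify_cmd(image),
                            capture_output=True, text=True)
    return result.returncode == 0 and "NVIDIA-SMI" in result.stdout

if __name__ == "__main__":
    print("GPU visible in container:", gpu_visible_in_container())
```

A non-zero exit code or missing `NVIDIA-SMI` banner usually means the runtime was not registered with Docker; rerun the configuration commands above.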
Download the model checkpoint
- Create a download script, download-model-checkpoint.py
from modelscope import snapshot_download
from transformers import AutoModelForCausalLM, AutoTokenizer
# Downloading model checkpoint to a local dir model_dir
model_dir = snapshot_download('qwen/Qwen-7B-Chat')
# Loading local checkpoints
# trust_remote_code must remain True because the model code is loaded
# from the local dir rather than from the transformers library
tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
model_dir,
device_map="auto",
trust_remote_code=True
).eval()
- Install the script's dependencies
pip install modelscope
pip install transformers
pip install torch
pip install tiktoken
pip install transformers_stream_generator
pip install accelerate
- Run the script to download the model checkpoints
python3 download-model-checkpoint.py
Note: the model checkpoint files are downloaded to the
~/.cache/modelscope/hub/qwen/Qwen-7B-Chat directory (this path is the value of the model_dir variable).
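Before starting the container it is worth confirming the checkpoint actually landed in the default ModelScope cache location given in the note above. A minimal sketch (the helper names are my own):

```python
import os

def default_checkpoint_path(model_id: str = "qwen/Qwen-7B-Chat") -> str:
    """Default ModelScope cache location for a downloaded model."""
    return os.path.expanduser(os.path.join("~/.cache/modelscope/hub", model_id))

def checkpoint_present(model_id: str = "qwen/Qwen-7B-Chat") -> bool:
    """True when the checkpoint directory exists and is non-empty."""
    path = default_checkpoint_path(model_id)
    return os.path.isdir(path) and bool(os.listdir(path))

if __name__ == "__main__":
    print(default_checkpoint_path(), checkpoint_present())
```

If `checkpoint_present()` reports False, rerun download-model-checkpoint.py before launching the container.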
Start a container to serve the model (OpenAI-API-compatible)
- Check out the Qwen open-source code
git clone https://github.com/QwenLM/Qwen.git
- Start the container with the following script
IMAGE_NAME=qwenllm/qwen:cu114
PORT=8901
CHECKPOINT_PATH=~/.cache/modelscope/hub/qwen/Qwen-7B-Chat
bash docker/docker_openai_api.sh -i ${IMAGE_NAME} -c ${CHECKPOINT_PATH} --port ${PORT}
Note: the qwenllm/qwen:cu114 image is 9.87 GB.
- Confirm the container started successfully
# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
b2bd3f3417af qwenllm/qwen:cu114 "/opt/nvidia/nvidia_…" 3 minutes ago Up 3 minutes 0.0.0.0:8901->80/tcp, :::8901->80/tcp qwen
The container is up!
- Confirm the API responds to requests
# curl localhost:8901/v1/models | jq
Output:
{
"object": "list",
"data": [
{
"id": "gpt-3.5-turbo",
"object": "model",
"created": 1707471911,
"owned_by": "owner",
"root": null,
"parent": null,
"permission": null
}
]
}
The request succeeded, and the service is OpenAI-API-compatible. Note that the server lists the model under the id gpt-3.5-turbo so that existing OpenAI clients work unchanged.
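With the API confirmed, any OpenAI-style client can talk to the container. The sketch below builds and sends a chat-completions request using only the standard library; the endpoint and model id are taken from the output above, and the helper names are my own.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8901/v1"

def build_chat_payload(prompt: str, model: str = "gpt-3.5-turbo") -> dict:
    """OpenAI-compatible chat.completions request body.

    "gpt-3.5-turbo" is the id the server reports in /v1/models above.
    """
    return {"model": model,
            "messages": [{"role": "user", "content": prompt}]}

def chat(prompt: str) -> str:
    """POST the request to the container and return the reply text."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(build_chat_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(chat("Hello, who are you?"))
```

Because the endpoint follows the OpenAI schema, the official `openai` Python SDK pointed at `base_url="http://localhost:8901/v1"` should work equally well.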