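GitHub Actions is quite good. Compared with Travis CI, queueing is not much of a problem, and besides CI/CD you can even extract the Docker Hub credential built into the runner and use it locally for docker pull to get around Docker Hub's 429 rate limit (see "Syncing Docker Hub library images to a local registry [1]"). For small projects, the Standard_DS2_v2 virtual machines that GitHub Actions provides perform reasonably well, but for the following needs the GitHub-hosted machines may not be a good fit: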
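- Compiling TiKV (building dist_release on a 2-core, 7 GB Standard_DS2_v2 machine can take forever or end in an OOM)
- Working with internal images or accessing intranet resources
- Private repositories that need a lot of compilation (on the Free plan, GitHub-hosted Actions give private repositories only 2,000 minutes per month)
- Needing more storage (GitHub-hosted runners have less than 15 GB of free space)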
In such cases we need a self-hosted runner. So what is a self-hosted runner?
Self-hosted runners offer more control of hardware, operating system, and software tools than GitHub-hosted runners provide. With self-hosted runners, you can choose to create a custom hardware configuration with more processing power or memory to run larger jobs, install software available on your local network, and choose an operating system not offered by GitHub-hosted runners. Self-hosted runners can be physical, virtual, in a container, on-premises, or in a cloud.
For an org, adding an org-level runner (shared across the whole org) is simple. All you need is:
$ mkdir actions-runner && cd actions-runner
$ curl -o actions-runner-linux-x64-2.278.0.tar.gz -L https://github.com/actions/runner/releases/download/v2.278.0/actions-runner-linux-x64-2.278.0.tar.gz
$ tar xzf ./actions-runner-linux-x64-2.278.0.tar.gz
$ ./config.sh --url https://github.com/some-github-org --token AF5TxxxxxxxxxxxA6PRRS
$ ./run.sh
and you have a self-hosted runner. This approach has some limitations, though, for example:
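- No elastic scaling: runners can only be deployed by hand, one at a time
- Runners are deployed directly on bare metal, so their environments can drift apart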
Runner in Container
Simple Docker
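To address these problems, we containerize the GitHub runner. Here is an example Dockerfile (adapted from https://github.com/SanderKnape/github-runner). Because I need a dind-like environment (running Docker commands directly inside Actions jobs), I added the docker binary to the image. The runner refuses to run as root by default, so to avoid the permission problems that would otherwise come from mounting the host's Docker socket later, I use a modified runner that is allowed to run as root. The modification script is: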
$ wget https://github.com/actions/runner/releases/download/v2.278.0/actions-runner-linux-x64-2.278.0.tar.gz
$ tar xzf ./actions-runner-linux-x64-2.278.0.tar.gz && rm -f actions-runner-linux-x64-2.278.0.tar.gz
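# Delete the root-user check from both scripts (these line ranges match v2.278.0 and may shift in other versions)
$ sed -i '3,9d' ./config.sh
$ sed -i '3,8d' ./run.sh
# End
# Repack
$ tar -czf actions-runner-linux-x64-2.278.0.tar.gz *
# Remove the extracted files that are no longer needed
$ rm -rf bin config.sh env.sh externals run.sh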
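The sed commands above delete fixed line ranges, so they would silently remove the wrong lines if a later runner release moved the check. A slightly more robust sketch removes the check by pattern instead, assuming it keeps its current shape (an if block testing $user_id -eq 0 that ends with a line containing only fi); the script name strip_root_check.py is hypothetical. In recent runner releases the check also appears to be skipped when the RUNNER_ALLOW_RUNASROOT environment variable is set, which may make patching unnecessary; verify this against the version you actually use.

```python
import re
import sys
from pathlib import Path

# Assumption: the runner scripts guard against root with a block shaped like
#   if [ $user_id -eq 0 -a -z "$RUNNER_ALLOW_RUNASROOT" ]; then
#       echo "Must not run with sudo"
#       exit 1
#   fi
# The pattern matches from that `if` line to the first line consisting only of `fi`.
ROOT_CHECK = re.compile(r"^if \[ \$user_id -eq 0.*?^fi[ \t]*(?:\n|\Z)",
                        re.MULTILINE | re.DOTALL)


def strip_root_check(script: str) -> str:
    """Return the script with its root-user check removed; fail loudly if it is missing."""
    patched, count = ROOT_CHECK.subn("", script, count=1)
    if count == 0:
        raise ValueError("root check not found; the script layout may have changed")
    return patched


if __name__ == "__main__":
    # Usage: python3 strip_root_check.py config.sh run.sh  (edits the files in place)
    for name in sys.argv[1:]:
        path = Path(name)
        path.write_text(strip_root_check(path.read_text()))
```

Raising when nothing matches is deliberate: a changed layout then breaks the image build instead of producing a runner that still refuses to start as root.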
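Host the repacked tarball somewhere the image build can download it from (I use an internal URL), and the Dockerfile can then be written like this: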
FROM ubuntu:18.04
ENV GITHUB_PAT ""
ENV GITHUB_ORG_NAME ""
ENV RUNNER_WORKDIR "_work"
ENV RUNNER_LABELS ""
RUN apt-get update \
&& apt-get install -y curl sudo git jq iputils-ping zip \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/* \
&& curl https://download.docker.com/linux/static/stable/x86_64/docker-20.10.7.tgz --output docker-20.10.7.tgz \
&& tar xvfz docker-20.10.7.tgz \
&& cp docker/* /usr/bin/
USER root
WORKDIR /root/
RUN GITHUB_RUNNER_VERSION="2.278.0" \
&& curl -Ls https://internal.knat.network/action-runner/actions-runner-linux-x64-${GITHUB_RUNNER_VERSION}.tar.gz | tar xz \
&& ./bin/installdependencies.sh
COPY entrypoint.sh runsvc.sh ./
RUN sudo chmod u+x ./entrypoint.sh ./runsvc.sh
ENTRYPOINT ["./entrypoint.sh"]
entrypoint.sh looks like this:
#!/bin/sh
# Registering directly with ./config.sh --url https://github.com/some-github-org --token AF5TxxxxxxxxxxxA6PRRS
# is fragile: the registration token keeps changing, which easily leaves runners that cannot be removed later.
# So, following https://docs.github.com/en/rest/reference/actions#list-self-hosted-runners-for-an-organization,
# fetch the runner token dynamically with a Personal Access Token instead.
registration_url="https://github.com/${GITHUB_ORG_NAME}"
token_url="https://api.github.com/orgs/${GITHUB_ORG_NAME}/actions/runners/registration-token"
payload=$(curl -sX POST -H "Authorization: token ${GITHUB_PAT}" "${token_url}")
export RUNNER_TOKEN=$(echo "$payload" | jq .token --raw-output)
if [ -z "${RUNNER_NAME}" ]; then
  RUNNER_NAME=$(hostname)
fi
./config.sh --unattended --url "${registration_url}" --token "${RUNNER_TOKEN}" --name "${RUNNER_NAME}" --work "${RUNNER_WORKDIR}" --labels "${RUNNER_LABELS}"
# Deregister the runner from GitHub automatically when the container is killed
remove() {
  if [ -n "${GITHUB_RUNNER_TOKEN}" ]; then
    export REMOVE_TOKEN=$GITHUB_RUNNER_TOKEN
  else
    # Derive .../actions/runners/remove-token from the registration-token URL
    payload=$(curl -sX POST -H "Authorization: token ${GITHUB_PAT}" "${token_url%/registration-token}/remove-token")
    export REMOVE_TOKEN=$(echo "$payload" | jq .token --raw-output)
  fi
  # Deregistration needs the remove token, not the registration token
  ./config.sh remove --unattended --token "${REMOVE_TOKEN}"
}
trap 'remove; exit 130' INT
trap 'remove; exit 143' TERM
./runsvc.sh "$*" &
wait $!
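The token handling in the entrypoint boils down to two REST calls, POST /orgs/{org}/actions/runners/registration-token and POST /orgs/{org}/actions/runners/remove-token, each returning JSON with a token field. For anyone who prefers to manage registration outside the shell script, here is a minimal standard-library sketch of the same calls; the function names and the injectable opener parameter (there only so the logic can be exercised without network access) are illustrative assumptions.

```python
import json
import urllib.request
from typing import Callable

API_ROOT = "https://api.github.com"


def runner_token_url(org: str, kind: str) -> str:
    """URL of the org-level runner token endpoint; kind is 'registration' or 'remove'."""
    if kind not in ("registration", "remove"):
        raise ValueError(f"unknown token kind: {kind!r}")
    return f"{API_ROOT}/orgs/{org}/actions/runners/{kind}-token"


def fetch_runner_token(org: str, pat: str, kind: str = "registration",
                       opener: Callable = urllib.request.urlopen) -> str:
    """POST to the token endpoint with a Personal Access Token and return the short-lived token."""
    req = urllib.request.Request(
        runner_token_url(org, kind),
        method="POST",
        headers={
            "Authorization": f"token {pat}",
            "Accept": "application/vnd.github.v3+json",
        },
    )
    with opener(req) as resp:
        return json.load(resp)["token"]
```

For example, fetch_runner_token("some-github-org", os.environ["GITHUB_PAT"]) would return a value suitable for ./config.sh --token, and kind="remove" gives the token for ./config.sh remove.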
Build and run:
$ docker build . -t n0vad3v/github-runner
$ docker run -v /var/run/docker.sock:/var/run/docker.sock -e GITHUB_PAT="ghp_bhxxxxxxxxxxxxx7xxxxxxxdONDT" -e GITHUB_ORG_NAME="some-github-org" -it n0vad3v/github-runner
You will now see a brand-new runner under your org. At last we can use our own machines to run jobs quickly, without queueing, and with more horsepower than GitHub-hosted Actions!
Scale with Kubernetes
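But this does not scale: every runner has to be managed by hand. Moreover, if a workflow defines several jobs and there are fewer runners than jobs, some of the jobs sit in the queue, and GitHub documents the queue time like this: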
Each job for self-hosted runners can be queued for a maximum of 24 hours. If a self-hosted runner does not start executing the job within this limit, the job is terminated and fails to complete.
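That is clearly unacceptable. I happened to have a k8s cluster at hand, and for a mostly stateless service like this, what could be better than letting k8s manage it automatically? So the natural idea is to write a Deployment, for example: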
apiVersion: apps/v1
kind: Deployment
metadata:
  name: github-runner-some-github-org
  labels:
    app: githubrunner
spec:
  replicas: 10
  selector:
    matchLabels:
      app: githubrunner
  template:
    metadata:
      labels:
        app: githubrunner
    spec:
      volumes:
        - name: docker-sock
          hostPath:
            path: /var/run/docker.sock
            type: Socket
      containers:
        - name: github-runner-some-github-org
          imagePullPolicy: Always
          image: 'n0vad3v/github-runner'
          env:
            - name: GITHUB_PAT
              value: "ghp_bhxxxxxxxxxxxxx7xxxxxxxdONDT"
            - name: GITHUB_ORG_NAME
              value: "some-github-org"
            - name: RUNNER_LABELS
              value: "docker,internal-k8s"
          volumeMounts:
            - mountPath: /var/run/docker.sock
              name: docker-sock
              readOnly: false
Save it as action.yml and apply it; with the labels set, off we go:
$ kubectl apply -f action.yml -n novakwok
[root@dev action]# kubectl get po -n novakwok
NAME                                                       READY   STATUS    RESTARTS   AGE
github-runner-some-github-org-deployment-9cfb598d9-4shrk   1/1     Running   0          26m
github-runner-some-github-org-deployment-9cfb598d9-5rnj4   1/1     Running   0          26m
github-runner-some-github-org-deployment-9cfb598d9-cvkr9   1/1     Running   0          26m
github-runner-some-github-org-deployment-9cfb598d9-dmbnp   1/1     Running   0          26m
github-runner-some-github-org-deployment-9cfb598d9-ggl24   1/1     Running   0          26m
github-runner-some-github-org-deployment-9cfb598d9-gkgzx   1/1     Running   0          26m
github-runner-some-github-org-deployment-9cfb598d9-jcscq   1/1     Running   0          26m
github-runner-some-github-org-deployment-9cfb598d9-lrrxh   1/1     Running   0          26m
github-runner-some-github-org-deployment-9cfb598d9-pn9cn   1/1     Running   0          26m
github-runner-some-github-org-deployment-9cfb598d9-wj2tj   1/1     Running   0          26m
Demo on Docker
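My use case is somewhat special: I need to run Docker commands inside the runner (for example docker build/push on the runner). So let's test whether the runner works properly. First, create a workflow with several jobs, like this: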
name: Test
on:
  push:
    branches: [ main ]
jobs:
  test-1:
    runs-on: [self-hosted,X64]
    steps:
      - uses: actions/checkout@v2
      - name: Run a one-line script
        run: |
          curl ip.sb
          df -h
          lscpu
          docker pull redis
  test-2:
    runs-on: [self-hosted,X64]
    steps:
      - uses: actions/checkout@v2
      - name: Run a one-line script
        run: |
          curl ip.sb
          df -h
          lscpu
          docker pull redis
  test-3:
    runs-on: [self-hosted,X64]
    steps:
      - uses: actions/checkout@v2
      - name: Run a one-line script
        run: |
          curl ip.sb
          df -h
          lscpu
          pwd
          docker pull redis
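All three jobs ask for runs-on: [self-hosted,X64], and a job can only go to a runner that carries every label in that list. The self-hosted and X64 labels are added automatically at registration, while docker and internal-k8s come from RUNNER_LABELS in the Deployment. A tiny sketch of that eligibility rule, assuming case-insensitive label comparison as GitHub documents it; it models only label matching (not queueing, busy runners or runner groups), and the function and runner names are hypothetical.

```python
from typing import Iterable, List


def eligible_runners(runs_on: Iterable[str], runners: List[dict]) -> List[str]:
    """Names of runners that carry every label listed in runs_on (compared case-insensitively).

    Each runner is a dict shaped like the REST API runner object, e.g.
    {"name": "...", "labels": [{"name": "self-hosted"}, {"name": "X64"}]}.
    """
    wanted = {label.lower() for label in runs_on}
    matches = []
    for runner in runners:
        have = {label["name"].lower() for label in runner["labels"]}
        if wanted <= have:  # subset test: every requested label must be present
            matches.append(runner["name"])
    return matches


# Hypothetical runners: one pod from the Deployment above and an unrelated ARM machine.
runners = [
    {"name": "k8s-pod-1", "labels": [{"name": n} for n in
                                     ("self-hosted", "Linux", "X64", "docker", "internal-k8s")]},
    {"name": "arm-box", "labels": [{"name": n} for n in ("self-hosted", "Linux", "ARM64")]},
]
print(eligible_runners(["self-hosted", "X64"], runners))  # ['k8s-pod-1']
```

This is also why labeling needs planning: a runs-on list that is too generic lets jobs land on any machine carrying those labels.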
Then run it and check that it works: first confirm that the jobs were scheduled onto the Docker runners, then check that the Docker operations succeed.
Hooray!
GC
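Sometimes a runner drops offline because of odd problems (for example, the network goes down while it is being removed), and the org then accumulates a pile of offline runners. To deal with this, we can write a simple GC script: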
import argparse

import requests

RUNNERS_URL = 'https://api.github.com/orgs/{}/actions/runners'


def list_runners(org_name, github_pat):
    # The endpoint returns at most 100 runners per page (30 by default), so follow the pages
    headers = {"Authorization": "token {}".format(github_pat)}
    runner_list, page = [], 1
    while True:
        r = requests.get(RUNNERS_URL.format(org_name), headers=headers,
                         params={'per_page': 100, 'page': page})
        r.raise_for_status()
        runners = r.json()['runners']
        runner_list.extend(runners)
        if len(runners) < 100:
            return runner_list
        page += 1


def delete_offline_runners(org_name, github_pat, runner_list):
    headers = {"Authorization": "token {}".format(github_pat)}
    for runner in runner_list:
        if runner['status'] == "offline":
            runner_id = runner['id']
            delete_runner_url = RUNNERS_URL.format(org_name) + '/{}'.format(runner_id)
            print("Deleting runner " + str(runner_id) + ", with name of " + runner['name'])
            r = requests.delete(delete_runner_url, headers=headers)
            r.raise_for_status()


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='GC Dead Self-hosted runners')
    parser.add_argument('--github_pat', required=True, help='GitHub Personal Access Token')
    parser.add_argument('--org_name', required=True, help='GitHub Org Name')
    args = parser.parse_args()
    runner_list = list_runners(args.org_name, args.github_pat)
    delete_offline_runners(args.org_name, args.github_pat, runner_list)
Usage: python3 gc_runner.py --github_pat "ghp_bhxxxxxxxxxxxxx7xxxxxxxdONDT" --org_name "some-github-org"
Some limitations
Besides the limits of our own hardware, GitHub Actions itself imposes some limits, for example:
- Workflow run time - Each workflow run is limited to 72 hours. If a workflow run reaches this limit, the workflow run is cancelled.
- Job queue time - Each job for self-hosted runners can be queued for a maximum of 24 hours. If a self-hosted runner does not start executing the job within this limit, the job is terminated and fails to complete.
- API requests - You can execute up to 1000 API requests in an hour across all actions within a repository. If exceeded, additional API calls will fail, which might cause jobs to fail.
- Job matrix - A job matrix can generate a maximum of 256 jobs per workflow run. This limit also applies to self-hosted runners.
- Workflow run queue - No more than 100 workflow runs can be queued in a 10 second interval per repository. If a workflow run reaches this limit, the workflow run is terminated and fails to complete.
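The job-matrix cap is easy to hit by accident, because the number of jobs is the product of the lengths of the matrix axes. A quick pre-flight check, assuming a matrix made only of plain axes (include/exclude entries, which add or remove combinations, are deliberately not modelled); MATRIX_JOB_LIMIT simply restates the 256-job figure quoted above, and the example matrix is hypothetical.

```python
from math import prod
from typing import Dict, List

MATRIX_JOB_LIMIT = 256  # per workflow run, as quoted from the GitHub docs above


def matrix_job_count(matrix: Dict[str, List]) -> int:
    """Jobs a plain matrix expands to: the product of the axis lengths.

    Assumption: no include/exclude entries, which would add or remove combinations.
    """
    return prod(len(values) for values in matrix.values())


def check_matrix(matrix: Dict[str, List]) -> None:
    """Raise if the matrix would exceed the per-run job limit."""
    count = matrix_job_count(matrix)
    if count > MATRIX_JOB_LIMIT:
        raise ValueError(f"matrix expands to {count} jobs, limit is {MATRIX_JOB_LIMIT}")


# Hypothetical matrix: 4 x 8 x 9 = 288 combinations, so check_matrix(big) would raise.
big = {"os": list(range(4)), "toolchain": list(range(8)), "features": list(range(9))}
print(matrix_job_count(big))  # 288
```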
Of these, the API request limit is the hardest to reason about, because of how self-hosted runners work, as GitHub officially describes it:
The self-hosted runner polls GitHub to retrieve application updates and to check if any jobs are queued for processing. The self-hosted runner uses a HTTPS long poll that opens a connection to GitHub for 50 seconds, and if no response is received, it then times out and creates a new long poll.
So it is hard to tell what counts as one API request; problems here will probably only surface under heavy use.
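For a rough sense of scale, the 50-second long poll quoted above means each idle runner opens about 3600 / 50 = 72 connections per hour. Whether GitHub counts these toward the 1000-requests-per-hour figure is exactly what is unclear, so the sketch below only does the arithmetic under a pessimistic, unverified assumption: every poll counts as one request and all runners draw on the same budget. The constant and function names are hypothetical.

```python
LONG_POLL_SECONDS = 50      # long-poll timeout quoted from the GitHub docs above
API_BUDGET_PER_HOUR = 1000  # per-repository API figure from the limits list above


def polls_per_hour(runners: int, poll_seconds: int = LONG_POLL_SECONDS) -> float:
    """Rough estimate if every runner sits idle and each long poll runs its full timeout."""
    return runners * 3600 / poll_seconds


def max_runners_within_budget(budget: int = API_BUDGET_PER_HOUR,
                              poll_seconds: int = LONG_POLL_SECONDS) -> int:
    """Largest runner count whose idle polling fits the budget, ASSUMING (unverified)
    that every long poll is billed as one API request against a single shared budget."""
    return budget * poll_seconds // 3600


print(polls_per_hour(10))           # 720.0 polls/hour for the 10-replica Deployment above
print(max_runners_within_budget())  # 13
```

If that assumption held, a 10-replica Deployment would stay under the figure while 14 or more idle runners would exceed it; if polls are not counted, the limit only concerns API calls made by the jobs themselves.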
Git Version
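There is a small pitfall here: Git inside the container should be 2.18 or later. In my setup the x86 Ubuntu 18.04 image was fine (it reported Git 2.22.5), but the Git package from the official repositories of arm64v8/ubuntu:18.04 is 2.17. With that version you run into trouble: actions/checkout needs Git 2.18 or higher to create a local Git repository and otherwise falls back to downloading the files through the REST API, so steps that expect a real .git directory break.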
So you need to build a newer Git yourself; for example, add something like this to the Dockerfile:
RUN apt-get update \
&& apt-get install -y gcc libssl-dev libcurl4-gnutls-dev zlib1g-dev make gettext wget \
&& wget https://www.kernel.org/pub/software/scm/git/git-2.28.0.tar.gz \
&& tar -xvzf git-2.28.0.tar.gz && cd git-2.28.0 \
&& ./configure --prefix=/usr/ && make && make install
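To fail fast instead of discovering the problem through a broken checkout, the image build or a job step could check the Git version up front. A small sketch that parses the output of git --version and compares it with the 2.18 minimum mentioned above; the function names are hypothetical, and the parser assumes the usual "git version X.Y.Z" format (vendor suffixes such as "(Apple Git-130)" are ignored).

```python
import re
from typing import Tuple

MIN_GIT = (2, 18)  # minimum version mentioned in the text


def parse_git_version(output: str) -> Tuple[int, ...]:
    """Turn 'git version 2.17.1' into (2, 17, 1)."""
    match = re.search(r"git version (\d+)\.(\d+)(?:\.(\d+))?", output)
    if match is None:
        raise ValueError(f"unrecognised git --version output: {output!r}")
    return tuple(int(part) for part in match.groups() if part is not None)


def git_is_new_enough(output: str, minimum: Tuple[int, ...] = MIN_GIT) -> bool:
    # Tuple comparison is lexicographic: (2, 17, 1) < (2, 18) <= (2, 18, 0) <= (2, 28, 0).
    return parse_git_version(output) >= minimum
```

In practice the input would come from subprocess.run(["git", "--version"], capture_output=True, text=True).stdout inside the container.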
Summary
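So far we have packed the runner into a Docker container, scaled it horizontally with k8s when needed, and added a simple GC program that cleans up runners that went offline abnormally. That looks like enough to cover some initial needs!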
There are still some problems, though, for example:
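- Running the container as root carries potential risk, especially with the host's Docker socket exposed, so ordinary jobs should still run in a container as a non-root user
- Scaling is still not automatic: it depends on manually editing replicas and should be automated (for example, keep 20 idle runners in reserve and automatically add more whenever fewer than 20 are idle)
- Label management: GitHub Actions schedules jobs by label, so labeling is something that needs long-term planning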
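The "keep 20 idle runners" idea can be expressed as a small reconciliation rule on top of the list-runners API used by the GC script, whose runner objects carry a status (online/offline) and a busy flag. A hedged sketch of only the decision step, assuming idle means online and not busy; TARGET_IDLE, desired_replicas and the max_replicas cap are hypothetical, and actually applying the number (for example by patching the Deployment's replicas) is left out.

```python
from typing import Iterable

TARGET_IDLE = 20  # the example figure from the text


def count_idle(runners: Iterable[dict]) -> int:
    """Idle = online and not currently running a job (fields from the list-runners API)."""
    return sum(1 for r in runners if r["status"] == "online" and not r["busy"])


def desired_replicas(current: int, runners: Iterable[dict],
                     target_idle: int = TARGET_IDLE, max_replicas: int = 100) -> int:
    """Scale-up-only rule: add one replica per missing idle runner, capped at max_replicas.

    Never returns less than `current`: shrinking a Deployment lets Kubernetes pick
    which pod to delete, and that pod may be in the middle of a job.
    """
    missing = max(0, target_idle - count_idle(runners))
    return max(current, min(max_replicas, current + missing))
```

One caveat for a real controller: pods that were created but have not registered yet do not show up as idle, so repeated reconciliation would overshoot unless those pending pods are counted too.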
References
- Running self-hosted GitHub Actions runners in your Kubernetes cluster [2]
- About GitHub-hosted runners [3]
- Actions [4]
Footnotes
[1]Syncing Docker Hub library images to a local registry: https://blog.k8s.li/sync-dockerhub-library-images.html
[2]Running self-hosted GitHub Actions runners in your Kubernetes cluster: https://sanderknape.com/2020/03/self-hosted-github-actions-runner-kubernetes/
[3]About GitHub-hosted runners: https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners
[4]Actions: https://docs.github.com/en/rest/reference/actions#list-self-hosted-runners-for-an-organization
Original post: https://nova.moe/run-self-hosted-github-action-runners-on-kubernetes/
This article was shared from the WeChat official account 雲原生實驗室 (cloud_native_yang).