Monitoring the OpenShift Cluster
The OpenShift 4 console already integrates a Prometheus runtime. This built-in Prometheus is mainly used to monitor OpenShift cluster resources, and it monitors the following components:
- CoreDNS
- Elasticsearch
- etcd
- Fluentd
- HAProxy
- Image registry
- Kubelets
- Kubernetes apiserver
- Kubernetes controller manager
- Kubernetes scheduler
- Metering
- OpenShift apiserver
- OpenShift controller manager
- Operator Lifecycle Manager (OLM)
- Telemeter client
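Under Monitoring -> Metrics in the OpenShift console you can also search for metrics and display their data. For example, click "Insert Example Query" to load the query sum(sort_desc(sum_over_time(ALERTS{alertstate="firing"}[24h]))) by (alertname). You can also open Prometheus's own web interface through the "Prometheus UI" link.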
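The same PromQL can also be sent from the command line. The sketch below assumes that the thanos-querier route in openshift-monitoring serves the Prometheus-compatible /api/v1/query endpoint, that it accepts the logged-in user's OpenShift token as a bearer token, and that the user may read cluster metrics (for example through the cluster-monitoring-view cluster role). The function name prom_query is hypothetical.

```shell
#!/usr/bin/env bash
# prom_query HOST TOKEN QUERY
# Hypothetical helper: send an instant PromQL query to a Prometheus-compatible
# HTTP API (assumed here to be the thanos-querier route) and print the raw JSON.
prom_query() {
  local host="$1" token="$2" query="$3"
  # -G appends the --data-urlencode data to the URL as a GET query string.
  # -k skips TLS verification; remove it if your client trusts the router CA.
  curl -sk -G \
    -H "Authorization: Bearer ${token}" \
    --data-urlencode "query=${query}" \
    "https://${host}/api/v1/query"
}
```

Usage, with the host read from the route and the token from the current login:

$ HOST=$(oc -n openshift-monitoring get route thanos-querier -o jsonpath='{.spec.host}')
$ prom_query "$HOST" "$(oc whoami -t)" 'sum(sort_desc(sum_over_time(ALERTS{alertstate="firing"}[24h]))) by (alertname)'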
Alternatively, run the following command to get the URLs of Prometheus, Grafana, Alertmanager, and Thanos Querier:
$ oc -n openshift-monitoring get routes
NAME                HOST/PORT                                                                                           PATH   SERVICES            PORT    TERMINATION          WILDCARD
alertmanager-main   alertmanager-main-openshift-monitoring.apps.cluster-beijing-0f11.beijing-0f11.example.opentlc.com          alertmanager-main   web     reencrypt/Redirect   None
grafana             grafana-openshift-monitoring.apps.cluster-beijing-0f11.beijing-0f11.example.opentlc.com                    grafana             https   reencrypt/Redirect   None
prometheus-k8s      prometheus-k8s-openshift-monitoring.apps.cluster-beijing-0f11.beijing-0f11.example.opentlc.com             prometheus-k8s      web     reencrypt/Redirect   None
thanos-querier      thanos-querier-openshift-monitoring.apps.cluster-beijing-0f11.beijing-0f11.example.opentlc.com             thanos-querier      web     reencrypt/Redirect   None
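To reuse one of these hosts in a script, the table can be filtered by route name. This is a minimal sketch that parses the human-readable table, whose layout may change between oc versions; route_host is a hypothetical helper, and oc get route NAME -o jsonpath='{.spec.host}' is the more robust alternative.

```shell
#!/usr/bin/env bash
# route_host NAME
# Hypothetical helper: read the table printed by `oc get routes` on stdin and
# print the HOST/PORT column (second field) of the row whose NAME equals NAME.
route_host() {
  # NR > 1 skips the header row. Empty columns such as PATH vanish under awk's
  # whitespace splitting, but they all come after HOST/PORT, so $2 is the host.
  awk -v name="$1" 'NR > 1 && $1 == name { print $2 }'
}
```

Usage:

$ oc -n openshift-monitoring get routes | route_host grafana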
Monitoring Applications
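In the article "OpenShift 4.3 之 Quarkus(3)用獨立的Prometheus監控Quarkus應用" (OpenShift 4.3 Quarkus, Part 3: Monitoring Quarkus Applications with a Standalone Prometheus), we showed how to create a Prometheus runtime in a project manually. In OpenShift 4.x we instead use an Operator to create, manage, and maintain the Prometheus environment (this article is based on https://medium.com/faun/using-the-operator-lifecycle-manager-to-deploy-prometheus-on-openshift-cd2f3abb3511).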
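In this article, Prometheus, the ServiceMonitor, and the monitored Service relate as follows: the Prometheus instance selects the ServiceMonitors in its own project that carry a k8s-app label; the ServiceMonitor selects Services labeled team: backend in any project; and Prometheus scrapes the endpoints behind each selected Service on the port named web.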
Deploying the Monitored Application
- Create the monitored-apps project in which the application will run.
$ oc new-project monitored-apps
- Create a local file named example-app.yaml with the following content:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app
  labels:
    app: example-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
      - name: example-app
        image: fabxc/instrumented_app
        ports:
        - name: web
          containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: example-app
  labels:
    app: example-app
    team: backend
spec:
  selector:
    app: example-app
  ports:
  - name: web
    port: 8080
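- Run the following commands to create the application objects (the Deployment and Service from the file, plus a Route created by oc expose). Once the Pods are running, fetch the application's metrics.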
$ oc create -f example-app.yaml
$ oc expose svc/example-app
$ oc get pod
$ curl http://$(oc get route example-app |awk 'NR==2 {print $2}')/metrics
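The /metrics output lists every metric the application exposes. A minimal sketch for narrowing it down to one metric follows; metric_lines is a hypothetical helper that assumes the Prometheus text exposition format (comment lines start with #, each sample line starts with the metric name followed by either { or a space).

```shell
#!/usr/bin/env bash
# metric_lines NAME
# Hypothetical helper: read Prometheus text exposition output (for example from
# the /metrics endpoint) on stdin and print only the sample lines of metric NAME,
# with or without labels. Comment lines (# HELP / # TYPE) and metrics that only
# share the prefix, such as NAME_total, are skipped.
metric_lines() {
  awk -v name="$1" '
    /^#/ { next }
    index($0, name "{") == 1 || index($0, name " ") == 1 { print }
  '
}
```

Usage, with the metric that is queried later in this article:

$ curl -s http://$(oc get route example-app -o jsonpath='{.spec.host}')/metrics | metric_lines codelab_api_http_requests_in_progress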
Installing and Configuring Prometheus with the Operator
- Create the project that will run Prometheus (the referenced article names it "monitoring"; here it is named "myprometheus").
$ oc new-project myprometheus
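- As a cluster administrator, open the Administrator perspective of the OpenShift console and click Operators -> OperatorHub in the left menu, making sure the current project is myprometheus. Find Prometheus Operator on the OperatorHub page and open it, click Install in the panel that slides out, and accept the default settings. Once the installation succeeds, Prometheus Operator is listed under Operators -> Installed Operators; click it to open its details page.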
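The installation can also be confirmed from the CLI, because OLM records each installed Operator as a ClusterServiceVersion (CSV) whose status.phase becomes Succeeded. This is a sketch only: wait_for_csv is a hypothetical helper, and the CSV name prefix used below (prometheusoperator) is an assumption, so check the real name with oc get csv -n myprometheus first.

```shell
#!/usr/bin/env bash
# wait_for_csv NAMESPACE PREFIX [TRIES]
# Hypothetical helper: poll the ClusterServiceVersions that OLM created in
# NAMESPACE until the first one whose name starts with PREFIX reports
# status.phase "Succeeded". Returns 0 on success, 1 after TRIES attempts.
wait_for_csv() {
  local ns="$1" prefix="$2" tries="${3:-30}" i phase
  for ((i = 0; i < tries; i++)); do
    # Print one "name phase" line per CSV, then keep the phase of the match.
    phase=$(oc get csv -n "$ns" \
      -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.phase}{"\n"}{end}' \
      | awk -v p="$prefix" 'index($1, p) == 1 { print $2; exit }')
    if [ "$phase" = "Succeeded" ]; then
      return 0
    fi
    sleep 5
  done
  return 1
}
```

Usage (the prefix is an assumed CSV name):

$ wait_for_csv myprometheus prometheusoperator && echo "Prometheus Operator installed"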
- On the Overview tab, click the Create Instance link in the Prometheus box to create a Prometheus environment.
- On the Create Prometheus page, enter the following YAML, then click Create.
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus-server
  labels:
    prometheus: k8s
  namespace: myprometheus
spec:
  replicas: 2
  serviceAccountName: prometheus-k8s
  securityContext: {}
  serviceMonitorSelector:
    matchExpressions:
    - key: k8s-app
      operator: Exists
  ruleSelector:
    matchLabels:
      role: prometheus-rulefiles
      prometheus: k8s
  alerting:
    alertmanagers:
    - namespace: myprometheus
      name: alertmanager-main
      port: web
- Open the Resources tab of the Prometheus instance named prometheus-server to view the deployed resources and their status.
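The same status can be followed from the CLI. This sketch assumes the Prometheus Operator creates a StatefulSet named prometheus-<name> for the Prometheus resource (prometheus-prometheus-server here, running the 2 replicas requested above) and labels its Pods prometheus=<name>; check_prometheus is a hypothetical helper.

```shell
#!/usr/bin/env bash
# check_prometheus NAMESPACE NAME
# Hypothetical helper: wait until the StatefulSet that the Prometheus Operator
# is assumed to create for Prometheus resource NAME ("prometheus-<NAME>") has
# rolled out, then list its Pods.
check_prometheus() {
  local ns="$1" name="$2"
  oc rollout status "statefulset/prometheus-${name}" -n "$ns" --timeout=300s || return 1
  # Assumption: the operator labels the Pods with prometheus=<NAME>.
  oc get pods -n "$ns" -l "prometheus=${name}"
}
```

Usage:

$ check_prometheus myprometheus prometheus-server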
- On the Overview tab, click the Create Instance link in the Service Monitor box, then enter the following YAML on the Create Service Monitor page to create the ServiceMonitor object.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: backend-monitor
  labels:
    k8s-app: backend-monitor
  namespace: myprometheus
spec:
  namespaceSelector:
    any: true
  selector:
    matchLabels:
      team: backend
  endpoints:
  - interval: 30s
    port: web
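Before looking at the Targets page, you can preview which Services this ServiceMonitor should pick up: Services in any namespace (namespaceSelector any: true) labeled team=backend that expose a port named web. The sketch below approximates that selection with a label query; list_backend_targets and filter_port are hypothetical helpers. Prometheus then scrapes the Pod endpoints behind each Service, which is why one Service with 3 Pods yields 3 targets.

```shell
#!/usr/bin/env bash
# list_backend_targets
# Hypothetical helper approximating the ServiceMonitor's selection: Services in
# any namespace labeled team=backend that expose a port named "web".
list_backend_targets() {
  oc get svc --all-namespaces -l team=backend \
    -o jsonpath='{range .items[*]}{.metadata.namespace}{"/"}{.metadata.name}{" "}{.spec.ports[*].name}{"\n"}{end}' \
    | filter_port web
}

# filter_port PORT: read "namespace/name port1 port2 ..." lines on stdin and
# print the namespace/name of each Service that has a port named PORT.
filter_port() {
  awk -v port="$1" '{ for (i = 2; i <= NF; i++) if ($i == port) { print $1; next } }'
}
```

Usage; the second command checks that the ServiceMonitor matches the Prometheus instance's serviceMonitorSelector, since a key-only -l selector is the CLI form of the "k8s-app Exists" expression:

$ list_backend_targets
$ oc get servicemonitors -n myprometheus -l k8s-app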
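- Because Prometheus and the monitored application run in two different projects, the prometheus-k8s service account in the myprometheus project must be granted the view role. Running either one of the following commands is enough: the first grants view as a cluster role, so no target project is specified; the second grants view only at the project level, so it names the monitored-apps project.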
$ oc adm policy add-cluster-role-to-user view system:serviceaccount:myprometheus:prometheus-k8s
$ oc adm policy add-role-to-user view system:serviceaccount:myprometheus:prometheus-k8s -n monitored-apps
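To confirm the grant took effect, you can ask the API server, impersonating the service account, whether it may read the objects Prometheus uses for service discovery (Services, Endpoints, and Pods). A minimal sketch: check_scrape_access is a hypothetical helper, and impersonating with --as requires the caller to have impersonation rights, which a cluster administrator has.

```shell
#!/usr/bin/env bash
# check_scrape_access NAMESPACE SERVICEACCOUNT
# Hypothetical helper: check, impersonating SERVICEACCOUNT, that it may
# get/list/watch services, endpoints and pods in NAMESPACE.
# Prints each missing permission to stderr; returns 1 if any is missing.
check_scrape_access() {
  local ns="$1" sa="$2" res verb ok=0
  for res in services endpoints pods; do
    for verb in get list watch; do
      # `oc auth can-i` exits 0 for "yes" and non-zero for "no".
      if ! oc auth can-i "$verb" "$res" -n "$ns" --as="$sa" >/dev/null; then
        echo "missing: $verb $res in $ns" >&2
        ok=1
      fi
    done
  done
  return "$ok"
}
```

Usage:

$ check_scrape_access monitored-apps system:serviceaccount:myprometheus:prometheus-k8s && echo "access OK"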
- Run the following commands to create a Route for Prometheus.
$ oc expose svc/prometheus-operated -n myprometheus
$ oc get route -n myprometheus
NAME                  HOST/PORT                                                                                     PATH   SERVICES              PORT   TERMINATION   WILDCARD
prometheus-operated   prometheus-operated-myprometheus.apps.cluster-beijing-511e.beijing-511e.example.opentlc.com          prometheus-operated   web                  None
- Open the Prometheus Route URL in a browser. First click Status -> Targets in the top menu to view the scrape targets; because example-app runs in 3 Pods, 3 targets are listed.
- Click Graph in the top menu, enter codelab_api_http_requests_in_progress in the Expression field, and click Execute. Then switch to the Graph view to see the plotted time series for this metric.
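The Targets page and the Graph query have HTTP API counterparts (/api/v1/targets and /api/v1/query), which is handy for scripted checks. The sketch below counts healthy targets with plain grep rather than a JSON parser, so it assumes Prometheus's compact JSON encoding ("health":"up" with no spaces); count_up_targets is a hypothetical helper. The route created above has no TLS termination, so plain http is used.

```shell
#!/usr/bin/env bash
# count_up_targets
# Hypothetical helper: read the JSON returned by Prometheus's /api/v1/targets
# endpoint on stdin and print how many targets report "health":"up".
# Assumes compact JSON (no spaces around the colon), as Prometheus emits it.
count_up_targets() {
  grep -o '"health":"up"' | awk 'END { print NR }'
}
```

Usage; with all three example-app Pods scraped, the first command should report 3, and the second runs the same query as the Graph page:

$ PROM=$(oc get route prometheus-operated -n myprometheus -o jsonpath='{.spec.host}')
$ curl -s "http://$PROM/api/v1/targets" | count_up_targets
$ curl -s -G "http://$PROM/api/v1/query" --data-urlencode 'query=codelab_api_http_requests_in_progress'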