What is it like when dotnet-monitor meets Prometheus?

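For developers and operations engineers, a big monitoring screen is a great thing to have, so let's build a dashboard! You may have heard of CLI diagnostic tools such as dotnet-counters, dotnet-dump, and dotnet-trace. So what is dotnet-monitor? Put simply, it wraps those diagnostic tools and exposes them through a REST API, which makes diagnostics easier. Last year dotnet-monitor was still an experimental tool, and I wrote an article introducing it at the time, "Analyzing .NET applications with dotnet-monitor". Recently the .NET team announced the first release version of dotnet-monitor, which ships as part of .NET 6 😍: dotnet-monitor 6.0.0!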

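Today we focus on metrics, that is, the /metrics endpoint. As the documentation says, it returns a snapshot of the current .NET process's metrics in the Prometheus format. Incidentally, this is a very simple text format: each metric has HELP and TYPE comment lines, followed by sample lines made up of a name, a value, and a timestamp in milliseconds. For example: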

# HELP systemruntime_cpu_usage_ratio CPU Usage
# TYPE systemruntime_cpu_usage_ratio gauge
systemruntime_cpu_usage_ratio 0 1632929076109
systemruntime_cpu_usage_ratio 0 1632929076111
systemruntime_cpu_usage_ratio 0 1632929086110
# HELP systemruntime_working_set_bytes Working Set
# TYPE systemruntime_working_set_bytes gauge
systemruntime_working_set_bytes 1529000000 1632929066112
systemruntime_working_set_bytes 1529000000 1632929076110
systemruntime_working_set_bytes 1529000000 1632929076112
...
# HELP systemruntime_time_in_gc_ratio % Time in GC since last GC
# TYPE systemruntime_time_in_gc_ratio gauge
systemruntime_time_in_gc_ratio 0 1632929066112
systemruntime_time_in_gc_ratio 0 1632929076110
systemruntime_time_in_gc_ratio 0 1632929076112

The output above shows metrics from the System.Runtime counters. Next, we need to get this information onto a Grafana dashboard.
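To make the format concrete, here is a minimal Python sketch that parses sample lines like the ones above. It assumes unlabeled samples of the form `name value [timestamp]`, as in the excerpt; the names `Sample` and `parse_samples` are made up for illustration, and a real client would want a proper Prometheus parser that also handles labels.

```python
from typing import List, NamedTuple, Optional


class Sample(NamedTuple):
    name: str
    value: float
    timestamp_ms: Optional[int]  # Unix time in milliseconds, if present


def parse_samples(text: str) -> List[Sample]:
    """Parse unlabeled sample lines of the form: name value [timestamp]."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" metadata comments.
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) < 2:
            continue  # not a sample line (for example an elided "...")
        try:
            value = float(parts[1])
        except ValueError:
            continue  # ignore lines whose value is not numeric
        timestamp = int(parts[2]) if len(parts) > 2 else None
        samples.append(Sample(parts[0], value, timestamp))
    return samples
```

Calling `parse_samples` on the text returned by /metrics gives a flat list of samples that can be grouped by `name` for a quick look at the data before wiring up Prometheus.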

1. Prepare a .NET application

To make the demo more interesting, we use a sample project that leaks memory. It is a .NET 5 application that you can find on GitHub and clone locally:

https://github.com/sebastienros/memoryleak

git clone https://github.com/sebastienros/memoryleak.git

Next, we run the application in a Docker container. Here is a Dockerfile for it; you need to add it manually to the project's solution directory:

# https://hub.docker.com/_/microsoft-dotnet
FROM mcr.microsoft.com/dotnet/sdk:5.0 AS build
WORKDIR /source

COPY . .
RUN dotnet restore
RUN dotnet publish -c release -o /app --no-restore

# final stage/image
FROM mcr.microsoft.com/dotnet/aspnet:5.0
RUN apt-get update && apt-get install -y procps
WORKDIR /app
COPY --from=build /app .
ENTRYPOINT ["dotnet", "MemoryLeak.dll"]

Then build the image:

docker build --pull -t memoryleak-image -f Dockerfile .

Because dotnet-monitor runs in a sidecar container, the two containers need to share a volume (used for IPC). Create one with the following command:

docker volume create dotnet-tmp
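The shared volume matters because the .NET runtime publishes its diagnostic IPC endpoint as a Unix domain socket in /tmp, and dotnet-monitor connects through that socket. The Python sketch below is one way to check that the socket is visible. It assumes the socket file is named `dotnet-diagnostic-{pid}-{key}-socket` and that the script runs inside a container that mounts the volume at /tmp; the function names are made up for illustration.

```python
import fnmatch
import os

# Assumed naming scheme of the runtime's diagnostic IPC socket files.
SOCKET_PATTERN = "dotnet-diagnostic-*-socket"


def filter_diagnostic_sockets(names):
    """Keep only the file names that look like diagnostic IPC sockets."""
    return sorted(n for n in names if fnmatch.fnmatch(n, SOCKET_PATTERN))


def find_diagnostic_sockets(tmp_dir="/tmp"):
    """List the diagnostic socket file names found in tmp_dir."""
    return filter_diagnostic_sockets(os.listdir(tmp_dir))
```

If `find_diagnostic_sockets()` returns an empty list inside the sidecar, the two containers are probably not sharing the same /tmp mount.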

Now we can run the memoryleak image, mapping container port 80 to host port 5000:

docker run -d -it --rm -e TZ=Asia/Shanghai -p 5000:80 --mount "source=dotnet-tmp,target=/tmp" memoryleak-image

Once the container is running, open http://localhost:5000/

2. dotnet-monitor

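We could install dotnet-monitor as a global tool, but let's stick with Docker containers. An image is available on the Microsoft Container Registry, so the following command is all we need. Note that --no-auth turns off authentication, which is fine for a local demo only: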

docker run -d -it --rm -e TZ=Asia/Shanghai -p 12323:52323 --mount "source=dotnet-tmp,target=/tmp" mcr.microsoft.com/dotnet/monitor --urls http://*:52323 --no-auth

This container also mounts the dotnet-tmp volume we created above. Now open http://localhost:12323/metrics and you should see the metrics.

3. Prometheus

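Prometheus is a free monitoring system and time-series database. We need it to store the metrics collected by dotnet-monitor. Again, there are many ways to run or install Prometheus, and we will keep using Docker. First we need a prometheus.yml configuration file, which you can add manually to the solution directory: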

global:
  scrape_interval: 15s
  scrape_timeout: 10s
  evaluation_interval: 15s
alerting:
  alertmanagers:
  - scheme: http
    timeout: 10s
    api_version: v1
    static_configs:
    - targets: []
scrape_configs:
- job_name: prometheus
  honor_timestamps: true
  scrape_interval: 15s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  static_configs:
  - targets:
    - localhost:9090
- job_name: memoryleak
  honor_timestamps: true
  scrape_interval: 2s
  scrape_timeout: 2s
  metrics_path: /metrics
  scheme: http
  static_configs:
  - targets:
    - host.docker.internal:12323

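The memoryleak job in the configuration above scrapes http://host.docker.internal:12323/metrics, the metrics exposed by dotnet-monitor, every 2 seconds. Next, start Prometheus in Docker with the command below. Note that you must replace D:\Code\dotnet\memoryleak\src\MemoryLeak\prometheus.yml with the path to your own local prometheus.yml file.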

docker run -d --name prometheus-container -e TZ=Asia/Shanghai -p 30090:9090 -v D:\Code\dotnet\memoryleak\src\MemoryLeak\prometheus.yml:/etc/prometheus/prometheus.yml ubuntu/prometheus

That's all there is to it. Once Prometheus is running, open http://localhost:30090/targets to check the status of the scrape targets; everything should be OK.
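The targets page is backed by the Prometheus HTTP API, so the same status can be read from a script. The Python sketch below calls `/api/v1/targets` and summarizes the health of each job. The base URL assumes the 30090 port mapping used above, and the function names are made up for illustration.

```python
import json
import urllib.request


def fetch_targets(base_url="http://localhost:30090", timeout=5.0):
    """Fetch the scrape-target listing from the Prometheus HTTP API."""
    with urllib.request.urlopen(base_url + "/api/v1/targets", timeout=timeout) as resp:
        return json.load(resp)


def target_health(payload):
    """Return {job name: [health of each active target]} from the API payload."""
    result = {}
    for target in payload.get("data", {}).get("activeTargets", []):
        job = target.get("labels", {}).get("job", "")
        result.setdefault(job, []).append(target.get("health", "unknown"))
    return result
```

With this setup, `target_health(fetch_targets())` should report the `prometheus` and `memoryleak` jobs, each with health `up` once the first scrape has succeeded.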

4. Grafana

Finally, we use Grafana for visualization. As before, run Grafana in Docker with the following command:

docker run -d -e TZ=Asia/Shanghai -p 3000:3000 grafana/grafana

Next, open http://localhost:3000/ and log in; the first login can use admin/admin. Then go to Configuration -> Data sources, click Add data source, choose Prometheus, and set the URL to http://host.docker.internal:30090/. Leave the other parameters at their defaults and click Save & test.
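As an alternative to clicking through the UI, the data source can also be created through Grafana's HTTP API. The Python sketch below builds a POST request to `/api/datasources` with the same settings. It assumes basic authentication with the default admin/admin credentials; the data source name "Prometheus" and the function name are arbitrary choices for illustration.

```python
import base64
import json
import urllib.request


def build_datasource_request(
    grafana_url="http://localhost:3000",
    prometheus_url="http://host.docker.internal:30090/",
    user="admin",
    password="admin",
    name="Prometheus",
):
    """Build (but do not send) the request that registers the data source."""
    payload = {
        "name": name,
        "type": "prometheus",
        "url": prometheus_url,
        "access": "proxy",  # Grafana's backend makes the queries, not the browser
        "isDefault": True,
    }
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        grafana_url + "/api/datasources",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Basic " + token,
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` sends the request. If you changed the admin password at first login, pass the new one as `password`.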