Preventing IO hangs in a Ceph cluster

Original

osc_vqbasmql

2021-01-30 09:37

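When working with a Ceph cluster, one situation comes up again and again: the cluster fails, for example a network fault makes it unreachable, and every IO issued by the client hangs.

For production workloads this is simply unacceptable. Here is an example:

Environment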

./vstart.sh -l -k --bluestore

id:     338b8b2e-fe88-4f2c-af4d-2359994a7da9

        application not enabled on 1 pool(s)

mon: 3 daemons, quorum a,b,c

mds: cephfs_a-1/1/1 up  {0=b=up:active}, 2 up:standby

osd: 3 osds: 3 up, 3 in

pools:   3 pools, 24 pgs

objects: 74 objects, 115 MB

usage:   3617 MB used, 27294 MB / 30911 MB avail

pgs:     24 active+clean

[root@atest-guest build]# rbd info test1

    size 20480 MB in 5120 objects

    order 22 (4096 kB objects)

    block_name_prefix: rbd_data.103a2ae8944a

    features: layering, exclusive-lock, object-map

    flags: object map invalid

    create_timestamp: Wed Mar 14 08:40:39 2018

Now kill every ceph-mon and ceph-osd process to simulate a cluster failure:

ps -ef|grep -E "ceph-mon|ceph-osd"|gawk '{print "kill -9 "$2}'|bash

(1) The rbd command

At this point, any rbd command we run simply hangs.
rbd ls

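Analysis

With debug logging turned on, we can see that the command is stuck in the auth step: it keeps sending authentication requests, never gets an answer, and so it hangs.

Solution

To address this, the Ceph community introduced a timeout mechanism. Put simply, each request carries a timeout; once the limit is exceeded, the request is cancelled and an error is returned.

The parameter is: client_mount_timeout = 5

The result: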

2018-03-15 07:58:28.714157 7f76c817fd40 -1 WARNING: all dangerous and experimental features are enabled.

2018-03-15 07:58:28.714606 7f76c817fd40 -1 WARNING: all dangerous and experimental features are enabled.

2018-03-15 07:58:28.746878 7f76c817fd40 -1 WARNING: all dangerous and experimental features are enabled.

2018-03-15 07:58:33.750431 7f76c817fd40 0 monclient(hunting): authenticate timed out after 5

2018-03-15 07:58:33.750489 7f76c817fd40 0 librados: client.admin authentication error (110) Connection timed out

rbd: couldn't connect to the cluster!

rbd: list: (110) Connection timed out
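
To try the setting for a single invocation without editing ceph.conf, something like the sketch below may work. It assumes the usual Ceph convention that a config option can be passed as a command-line flag or through the CEPH_ARGS environment variable. The helper name mount_timeout_args is made up for illustration, and the rbd calls are left as comments so the snippet runs without a cluster.

```shell
#!/bin/sh
# Hypothetical helper: build the per-invocation override for client_mount_timeout.
# Assumption: Ceph tools accept config options as --name=value flags.
mount_timeout_args() {
    seconds="$1"
    printf '%s\n' "--client_mount_timeout=${seconds}"
}

args="$(mount_timeout_args 5)"
printf '%s\n' "$args"

# On a host whose cluster may be unreachable, either form could be tried:
#   rbd ls "$args"
#   CEPH_ARGS="$args" rbd ls
```

If the override takes effect, the command should give up after roughly the configured number of seconds, as in the log above, instead of hanging.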

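(2) IO hang

The command above only times out during auth, though. If reads and writes are already in flight when the cluster fails, no auth takes place, so won't the problem still be there?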

[root@atest-guest build]# rbd bench --io-type write --io-size 4 --io-threads 1 test1

2018-03-15 07:52:37.735520 7f2bdb81bd40 -1 WARNING: all dangerous and experimental features are enabled.

2018-03-15 07:52:37.735941 7f2bdb81bd40 -1 WARNING: all dangerous and experimental features are enabled.

2018-03-15 07:52:37.769038 7f2bdb81bd40 -1 WARNING: all dangerous and experimental features are enabled.

bench type write io_size 4 io_threads 1 bytes 1073741824 pattern sequential

SEC       OPS   OPS/SEC   BYTES/SEC

1        75     75.51    302.05

2       140     70.25    280.98

3       198     66.01    264.04

4       249     62.35    249.42

5       299     59.82    239.28

6       351     55.23    220.92

7       401     52.25    208.99

As expected, once ceph-osd and ceph-mon are killed at the seventh second, rbd bench hangs right there and never exits.
To solve this, the Ceph community introduced two parameters:

    rados osd op timeout = 5
    rados mon op timeout = 5

With these two parameters set, outstanding requests give up after five seconds and return an error.

The result:

rbd bench --io-type write --io-size 4 --io-threads 1 test1

2018-03-15 07:54:11.243056 7fad2c936d40 -1 WARNING: all dangerous and experimental features are enabled.

2018-03-15 07:54:11.243513 7fad2c936d40 -1 WARNING: all dangerous and experimental features are enabled.

2018-03-15 07:54:11.276004 7fad2c936d40 -1 WARNING: all dangerous and experimental features are enabled.

bench type write io_size 4 io_threads 1 bytes 1073741824 pattern sequential

SEC       OPS   OPS/SEC   BYTES/SEC

1        41     41.99    167.96

2        91     45.53    182.12

3       139     46.63    186.53

4       190     47.62    190.47

5       240     48.11    192.43

6       293     50.31    201.25

write error: (110) Connection timed out
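
The post does not say where the two op timeouts were set. A minimal sketch, assuming they go into the [client] section of ceph.conf, is shown below. The helper name client_op_timeout_conf is hypothetical, and it only prints a fragment to paste in; it does not modify any file.

```shell
#!/bin/sh
# Hypothetical helper: print a [client] section carrying both op timeouts.
# The underscore spelling is used here; in ceph.conf, option names written
# with spaces or with underscores refer to the same option.
client_op_timeout_conf() {
    seconds="$1"
    cat <<EOF
[client]
    rados_osd_op_timeout = ${seconds}
    rados_mon_op_timeout = ${seconds}
EOF
}

# Print a fragment using the five-second value from the example above.
client_op_timeout_conf 5
```

The same section would be a natural place for client_mount_timeout as well, so that both the auth phase and in-flight IO are bounded.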

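(3) kernel rbd
With the two cases above solved, we move on to another area: kernel rbd. That's right, if the cluster fails and kernel rbd cannot reach it, the device cannot be unmapped.

Worse, when we try to reboot in order to get rid of the device, the reboot itself gets stuck. OMG, WTF. The cause is the same: shutdown has to unmap the rbd device, unmapping needs to reach the Ceph cluster, the cluster is unreachable, and so it hangs forever.

To solve this, kernel rbd also provides a configurable parameter: osd_request_timeout.

Unfortunately, this parameter could not be specified at rbd map time until https://github.com/ceph/ceph/pull/20792 was merged. Since then, it can be set with the following command: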

rbd map -o osd_request_timeout=30 test

With this set, we can reboot normally, or unmap the rbd device with rbd unmap -o force test.
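
The unmap step could be wrapped so that a plain unmap is tried first and the force option is used only as a fallback. This is just a sketch: the function name unmap_or_force is hypothetical, and it assumes the device was mapped with osd_request_timeout so that the first attempt eventually returns rather than hanging.

```shell
#!/bin/sh
# Hypothetical helper: try a normal unmap, fall back to a forced one.
# Assumption: the device was mapped with -o osd_request_timeout=N, so the
# first attempt fails after the timeout instead of blocking forever.
unmap_or_force() {
    dev="$1"
    rbd unmap "$dev" || rbd unmap -o force "$dev"
}

# Example (left as a comment so nothing is unmapped when this file is sourced):
#   unmap_or_force test
```

Trying the normal path first gives in-flight IO a chance to complete when the cluster is actually healthy.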
