Anatomy of Ceph CRUSH Map Rules
Bucket types
Buckets are the internal nodes of the CRUSH hierarchy: hosts, racks, rows, and so on. The CRUSH map defines a set of types used to describe these nodes. By default, these types include:
- osd (or device)
- host
- chassis
- rack
- row
- pdu
- pod
- room
- datacenter
- region (e.g. Asia, Europe)
- root
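These types also appear near the top of a decompiled CRUSH map, which you can export with `ceph osd getcrushmap -o crush.bin` and decompile with `crushtool -d crush.bin -o crush.txt`. A trimmed sketch of that text form (IDs and weights here are illustrative, and type numbers can vary across releases):

```
# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
...
type 10 root

# a bucket definition references its type and lists its children with weights
host node1 {
        id -3
        alg straw2
        hash 0  # rjenkins1
        item osd.0 weight 0.019
}
```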
Inspect the local CRUSH hierarchy and related settings:
ceph osd crush tree                        # simple tree view of the hierarchy
ceph osd crush dump                        # full CRUSH map as JSON
ceph osd pool get <pool_name> crush_rule   # which rule a pool uses
List the rules defined for the cluster:
ceph osd crush rule ls
View the contents of the rules:
ceph osd crush rule dump
Customizing the CRUSH map with commands
# create an ssd root bucket
[root@node1 ~]# ceph osd crush add-bucket ssd root
# create a host bucket for each node
[root@node1 ~]# ceph osd crush add-bucket node1-ssd host
[root@node1 ~]# ceph osd crush add-bucket node2-ssd host
[root@node1 ~]# ceph osd crush add-bucket node3-ssd host
# attach the host buckets to the ssd root bucket
[root@node1 ~]# ceph osd crush move node1-ssd root=ssd
[root@node1 ~]# ceph osd crush move node2-ssd root=ssd
[root@node1 ~]# ceph osd crush move node3-ssd root=ssd
[root@node1 ~]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-9 0 root ssd
-10 0 host node1-ssd
-11 0 host node2-ssd
-12 0 host node3-ssd
-1 0.14635 root default
-3 0.04878 host node1
0 hdd 0.01949 osd.0 up 1.00000 1.00000
3 hdd 0.02930 osd.3 up 1.00000 1.00000
-5 0.04878 host node2
1 hdd 0.01949 osd.1 up 1.00000 1.00000
4 hdd 0.02930 osd.4 up 1.00000 1.00000
-7 0.04878 host node3
2 hdd 0.01949 osd.2 up 1.00000 1.00000
5 hdd 0.02930 osd.5 up 1.00000 1.00000
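The WEIGHT column above is the CRUSH weight, which by convention is the device's capacity expressed in TiB — so the 30 GiB OSDs show about 0.02930 and the 20 GiB ones about 0.01949. A quick sketch of the conversion (the exact rounding Ceph applies when it assigns the weight may differ slightly):

```python
def crush_weight(size_bytes: int) -> float:
    """CRUSH weight is capacity in TiB (2**40 bytes)."""
    return size_bytes / 2**40

# a 30 GiB device gets a weight of roughly 0.0293
print(round(crush_weight(30 * 2**30), 5))
```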
# move existing OSDs into the new CRUSH hierarchy
[root@node1 ~]# ceph osd crush move osd.3 host=node1-ssd root=ssd
[root@node1 ~]# ceph osd crush move osd.4 host=node2-ssd root=ssd
[root@node1 ~]# ceph osd crush move osd.5 host=node3-ssd root=ssd
[root@node1 ~]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-9 0.08789 root ssd
-10 0.02930 host node1-ssd
3 hdd 0.02930 osd.3 up 1.00000 1.00000
-11 0.02930 host node2-ssd
4 hdd 0.02930 osd.4 up 1.00000 1.00000
-12 0.02930 host node3-ssd
5 hdd 0.02930 osd.5 up 1.00000 1.00000
-1 0.05846 root default
-3 0.01949 host node1
0 hdd 0.01949 osd.0 up 1.00000 1.00000
-5 0.01949 host node2
1 hdd 0.01949 osd.1 up 1.00000 1.00000
-7 0.01949 host node3
2 hdd 0.01949 osd.2 up 1.00000 1.00000
# create a crush rule (running the command with no arguments prints its usage)
[root@node1 ~]# ceph osd crush rule create-replicated
Invalid command: missing required parameter name(<string(goodchars [A-Za-z0-9-_.])>)
osd crush rule create-replicated <name> <root> <type> {<class>} : create crush rule <name> for replicated pool to start from <root>, replicate across buckets of type <type>, use devices of type <class> (ssd or hdd)
Error EINVAL: invalid command
[root@node1 ~]# ceph osd crush rule create-replicated ssd-demo ssd host hdd
# inspect the new rule
[root@node1 ~]# ceph osd crush rule dump
[root@node1 ~]# ceph osd crush rule ls
replicated_rule
ssd-demo
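For the new rule, the `dump` output should look roughly like the sketch below (the IDs are illustrative; because the rule restricts placement to a device class, it takes from a class-specific "shadow" root such as `ssd~hdd` rather than from `ssd` directly):

```json
{
    "rule_id": 1,
    "rule_name": "ssd-demo",
    "type": 1,
    "min_size": 1,
    "max_size": 10,
    "steps": [
        { "op": "take", "item": -13, "item_name": "ssd~hdd" },
        { "op": "chooseleaf_firstn", "num": 0, "type": "host" },
        { "op": "emit" }
    ]
}
```

`chooseleaf_firstn` with `num: 0` means "pick as many buckets of type host as the pool's replica count, then descend to one OSD in each", which is what spreads the three replicas across node1-ssd, node2-ssd, and node3-ssd.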
# switch the pool to the new crush rule
[root@node1 ~]# ceph osd pool set pool_demo crush_rule ssd-demo
set pool 9 crush_rule to ssd-demo
[root@node1 ~]# ceph osd pool get pool_demo crush_rule
crush_rule: ssd-demo
# verify that data lands on the expected OSDs
[root@node1 ~]# rbd create pool_demo/demo.img --size 5G
[root@node1 ~]# rbd -p pool_demo ls
demo.img
[root@node1 ~]# ceph osd map pool_demo demo.img
osdmap e183 pool 'pool_demo' (9) object 'demo.img' -> pg 9.c1a6751d (9.d) -> up ([3,4,5], p3) acting ([3,4,5], p3)
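The `ceph osd map` line shows the whole placement chain: the object name hashes to 0xc1a6751d, the hash is folded onto the pool's PG count to give pg 9.d, and CRUSH then maps that PG to OSDs [3,4,5] — exactly the three OSDs under the ssd root, confirming the rule works. A sketch of the hash-to-PG step, assuming this pool has 16 PGs (Ceph folds with `ceph_stable_mod` rather than a plain modulo, so PG splitting stays stable):

```python
def ceph_stable_mod(x: int, b: int, bmask: int) -> int:
    """Fold hash x onto b PGs; stable as b grows through non-powers of two."""
    return x & bmask if (x & bmask) < b else x & (bmask >> 1)

def pg_for_hash(obj_hash: int, pg_num: int) -> int:
    # bmask is the next power of two at or above pg_num, minus one
    bmask = (1 << (pg_num - 1).bit_length()) - 1
    return ceph_stable_mod(obj_hash, pg_num, bmask)

# 0xc1a6751d folded onto 16 PGs -> 0xd, matching "pg 9.c1a6751d (9.d)"
print(hex(pg_for_hash(0xc1a6751d, 16)))
```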
Notes
- When you decompile and hand-edit the binary CRUSH map, keep the original .bin file so you can restore it if anything goes wrong.
- Design the CRUSH hierarchy when the cluster is first built if at all possible; changing it once data exists triggers large data migrations and hurts performance.
- After manual changes, restarting ceph-osd moves the OSDs back to their original locations, so the custom CRUSH placement is lost — unless you set the following parameter:
osd crush update on start = false
[root@node1 my-cluster]# cat ceph.conf # note the [osd] section
[global]
fsid = 3f5560c6-3af3-4983-89ec-924e8eaa9e06
public_network = 192.168.6.0/24
cluster_network = 172.16.79.0/16
mon_initial_members = node1
mon_host = 192.168.6.160
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
mon_allow_pool_delete = true
[client.rgw.node1]
rgw_frontends = "civetweb port=80"
[osd]
osd crush update on start = false
[root@node1 my-cluster]# ceph-deploy --overwrite-conf config push node1 node2 node3
[root@node1 my-cluster]# systemctl restart ceph-osd.target
[root@node1 my-cluster]# ssh node2 systemctl restart ceph-osd.target
[root@node1 my-cluster]# ssh node3 systemctl restart ceph-osd.target