PG and PGP in Ceph

When I first started using Ceph, one question kept coming up: Ceph maps stored objects into PGs (Placement Groups), and the PGs are then distributed across OSDs according to the CRUSH rules. So what exactly is PGP? I decided to explore this myself.

Ceph object mapping architecture

[Figure: object → PG → OSD mapping diagram]
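Once a pool exists (for example, the test_pool created in the steps below), this object → PG → OSD chain can be observed directly from the command line. As a quick illustration (the object name obj1 here is arbitrary; ceph osd map computes the placement whether or not the object has actually been written):

[root@ceph01 my-cluster]# ceph osd map test_pool obj1    # prints the PG the object hashes to and the up/acting OSD set of that PG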

Environment

[root@ceph01 my-cluster]# ceph osd tree
ID WEIGHT  TYPE NAME       UP/DOWN REWEIGHT PRIMARY-AFFINITY 
-1 0.57088 root default                                      
-2 0.19029     host ceph02                                   
 0 0.19029         osd.0        up  1.00000          1.00000 
-3 0.19029     host ceph03                                   
 1 0.19029         osd.1        up  1.00000          1.00000 
-4 0.19029     host ceph01                                   
 2 0.19029         osd.2        up  1.00000          1.00000 

Hands-on verification

1. Create test_pool

[root@ceph01 my-cluster]# ceph osd pool create test_pool 6 6
pool 'test_pool' created
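The two trailing 6s in the create command are pg_num and pgp_num, so both should now be 6 for this pool. As a quick sanity check (output not shown here), they can be read back with:

[root@ceph01 my-cluster]# ceph osd pool get test_pool pg_num     # expect: pg_num: 6
[root@ceph01 my-cluster]# ceph osd pool get test_pool pgp_num    # expect: pgp_num: 6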

2. Check the PG distribution after creation

Run the following command:

[root@ceph01 my-cluster]# ceph pg ls-by-pool test_pool | awk '{print $1,$2,$15}'
pg_stat objects up_primary
10.0 0 [2,0]
10.1 0 [0,2]
10.2 0 [1,2]
10.3 0 [0,2]
10.4 0 [1,2]
10.5 0 [1,2]

As you can see, the PGs of test_pool all have IDs beginning with 10. The PG distribution can also be queried with another command, as follows:

[root@ceph01 my-cluster]#  ceph pg dump pgs|grep ^10|awk '{print $1,$2,$15}'
dumped pgs in format plain
10.2 0 [1,2]
10.3 0 [0,2]
10.0 0 [2,0]
10.1 0 [0,2]
10.4 0 [1,2]
10.5 0 [1,2]
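The 10 prefix is simply the pool ID of test_pool in this cluster. If you want to confirm the ID in your own environment, the pools and their IDs can be listed with:

[root@ceph01 my-cluster]# ceph osd lspools    # lists pool IDs and names; test_pool appears with ID 10 here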

3. Write test data to test_pool

[root@ceph01 my-cluster]# rados -p test_pool bench 20 write --no-cleanup
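The --no-cleanup flag keeps the benchmark objects in the pool, which is what allows us to look at per-PG object counts afterwards. To confirm that data actually landed in test_pool, you can, for example, check the pool statistics or count the objects:

[root@ceph01 my-cluster]# rados df                         # per-pool object counts and space usage
[root@ceph01 my-cluster]# rados -p test_pool ls | wc -l    # number of objects the benchmark wrote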

4. Query the PG distribution again

[root@ceph01 my-cluster]# ceph pg ls-by-pool test_pool | awk '{print $1,$2,$15}'
pg_stat objects up_primary
10.0 87 [2,0]
10.1 71 [0,2]
10.2 157 [1,2]
10.3 182 [0,2]
10.4 76 [1,2]
10.5 77 [1,2]

As you can see, the object counts in the second column have increased, but the PG placement has not changed.

5. Test increasing pg_num

[root@ceph01 my-cluster]# ceph osd pool set test_pool pg_num 12
set pool 10 pg_num to 12
[root@ceph01 my-cluster]# ceph pg ls-by-pool test_pool | awk '{print $1,$2,$15}'
pg_stat objects up_primary
10.0 43 [2,0]
10.1 35 [0,2]
10.2 52 [1,2]
10.3 60 [0,2]
10.4 76 [1,2]
10.5 77 [1,2]
10.6 53 [1,2]
10.7 61 [0,2]
10.8 44 [2,0]
10.9 36 [0,2]
10.a 52 [1,2]
10.b 61 [0,2]

After organizing the data, the object counts and the replica placement make the traces of the split obvious:

10.0 -> 10.0 + 10.8
10.1 -> 10.1 + 10.9
10.2 -> 10.2 + 10.6 + 10.a
10.3 -> 10.3 + 10.7 + 10.b
10.4 -> 10.4
10.5 -> 10.5

In other words, increasing pg_num causes the original, larger PGs to split evenly into several smaller PGs; no data is migrated between the pre-existing PGs.
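The reason the placement does not change yet is that pgp_num is still at its original value of 6, so CRUSH keeps using the same six placement combinations. A quick check (assuming the environment above):

[root@ceph01 my-cluster]# ceph osd pool get test_pool pgp_num    # still reports pgp_num: 6 at this point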

6. Test increasing pgp_num

[root@ceph01 my-cluster]# ceph osd pool set test_pool pgp_num 12
set pool 10 pgp_num to 12
[root@ceph01 my-cluster]# ceph pg ls-by-pool test_pool | awk '{print $1,$2,$15}'
pg_stat objects up_primary
10.0 43 [2,0]
10.1 35 [0,2]
10.2 52 [1,2]
10.3 60 [0,2]
10.4 76 [1,2]
10.5 77 [1,2]
10.6 53 [0,2]
10.7 61 [2,1]
10.8 44 [1,0]
10.9 36 [2,0]
10.a 52 [0,2]
10.b 61 [0,1]

At first glance nothing seems to have changed, but a closer look reveals the differences. Let's compare:

Before          After
10.0 43 [2,0]   10.0 43 [2,0]
10.1 35 [0,2]   10.1 35 [0,2]
10.2 52 [1,2]   10.2 52 [1,2]
10.3 60 [0,2]   10.3 60 [0,2]
10.4 76 [1,2]   10.4 76 [1,2]
10.5 77 [1,2]   10.5 77 [1,2]
10.6 53 [1,2]*   10.6 53 [0,2]*
10.7 61 [0,2]*   10.7 61 [2,1]*
10.8 44 [2,0]*   10.8 44 [1,0]*
10.9 36 [0,2]*   10.9 36 [2,0]*
10.a 52 [1,2]*   10.a 52 [0,2]*
10.b 61 [0,2]*   10.b 61 [0,1]*

We can see that the placement of PGs 10.6 through 10.b has changed.
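If you want to inspect one of the re-placed PGs on its own rather than dumping the whole pool, ceph pg map shows the current up/acting OSD set of a single PG, for example:

[root@ceph01 my-cluster]# ceph pg map 10.6    # prints the osdmap epoch and the up/acting OSDs for PG 10.6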

Conclusion

From the test results we can draw the following conclusions:

  • Increasing pg_num splits the objects of the original PGs evenly across the newly created PGs; the replica placement of the original PGs stays the same
  • Increasing pgp_num changes how the PGs are placed on OSDs, but the objects inside each PG do not move
  • PGP determines the placement combinations that PGs use when they are distributed to OSDs

Finally, here is an explanation found online, for reference:

PG = Placement Group
PGP = Placement Group for Placement purpose
pg_num = number of placement groups mapped to an OSD
When pg_num is increased for any pool, every PG of this pool splits into half, but they all remain mapped to their parent OSD. Until this time, Ceph does not start rebalancing. Now, when you increase the pgp_num value for the same pool, PGs start to migrate from the parent to some other OSD, and cluster rebalancing starts. This is how PGP plays an important role.
By Karan Singh
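As the quote points out, it is the pgp_num increase that actually triggers rebalancing. If you repeat this test on your own cluster, you can watch the data movement with the usual status commands, for example:

[root@ceph01 my-cluster]# ceph status    # shows PG states (e.g. remapped / backfilling) while data moves
[root@ceph01 my-cluster]# ceph -w        # streams cluster events as the rebalance progresses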

This article drew on zphj1987's blog post: http://www.zphj1987.com/2016/10/19/Ceph中PG和PGP的區別/

