Analysis of lookup-everywhere during lookup in the GlusterFS DHT layer

Original

2020-05-12 03:02

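Analysis of duplicate files after repeated expansion in GlusterFS 7

Background

While testing volume expansion on GlusterFS 7, I found that a file would occasionally appear to be lost, and re-creating it then produced duplicate files. The test expanded the cluster from one node to 10 nodes with 3 replicas. Adding a node consisted of probe node -> add-brick -> replace-brick -> rebalance fix-layout. After reaching 10 nodes, the first node was removed with replace-brick -> detach node. On the FUSE client the operation was: stat file, and if file does not exist, touch file. This could leave two duplicate entries named file.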

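Principle analysis

GlusterFS places files with a consistent-hashing algorithm. Normally, to find a file we only need to look on its hashed node; if it is not there, the file does not exist. When a GlusterFS volume is expanded or shrunk, however, the hash distribution changes, so a file may no longer be on its hashed node and the lookup has to go to all nodes (lookup-everywhere). Once rebalance has run and the directory hash layout has been repaired, lookups can again be served from the hashed node alone, with no lookup-everywhere.

Based on this, the likely cause is as follows. After expansion and rebalance fix-layout, the file is still on the node it hashed to under the old layout and is absent from the node it hashes to under the new layout, yet the DHT layer did not search all nodes during lookup. stat therefore fails to find the file, a new file is created on the new hashed node, and ls on the directory then shows duplicate files.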
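To make the placement idea concrete, here is a minimal sketch of range-based hash placement. It is a toy model, not GlusterFS code: the hash_range type, the FNV-1a toy_hash function, and the even split in build_layout are all assumptions introduced for illustration.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical model: each subvolume owns one contiguous slice of the
 * 32-bit hash space. */
typedef struct {
    uint32_t start; /* first hash value owned (inclusive) */
    uint32_t stop;  /* last hash value owned (inclusive) */
} hash_range;

/* Toy FNV-1a hash; a stand-in, not the hash function GlusterFS uses. */
static uint32_t toy_hash(const char *name)
{
    uint32_t h = 2166136261u;
    for (; *name; name++) {
        h ^= (uint8_t)*name;
        h *= 16777619u;
    }
    return h;
}

/* Split the hash space evenly across n subvolumes (n >= 1). */
static void build_layout(hash_range *out, size_t n)
{
    uint64_t span = (UINT64_C(1) << 32) / n;
    for (size_t i = 0; i < n; i++) {
        out[i].start = (uint32_t)(span * i);
        out[i].stop = (i == n - 1) ? UINT32_MAX
                                   : (uint32_t)(span * (i + 1) - 1);
    }
}

/* Return the index of the subvolume whose range contains the name's hash. */
static int hashed_subvol(const hash_range *layout, size_t n, const char *name)
{
    uint32_t h = toy_hash(name);
    for (size_t i = 0; i < n; i++) {
        if (h >= layout[i].start && h <= layout[i].stop)
            return (int)i;
    }
    return -1; /* hole in the layout */
}
```

Calling hashed_subvol for the same name against a 2-entry layout and a 3-entry layout shows whether the name's home moved when the layout was rebuilt. A file written under the old layout stays where it is until data is migrated, which is why a lookup that only asks the new hashed subvolume can miss it.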

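Source code analysis

The lookup-everywhere condition

In dht_lookup_cbk in the DHT layer, if the lookup on the hashed node reports that the file does not exist, the dht_should_lookup_everywhere function decides whether to look up on all nodes.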

    /* Compare the parent directory's commit_hash with conf's vol_commit_hash.
     * Lookup-everywhere is skipped only when the two are equal. */
    if (!conf->defrag && loc->parent) {
        ret = dht_inode_ctx_layout_get(loc->parent, this, &parent_layout);
        if (!ret && parent_layout &&
            (parent_layout->commit_hash == conf->vol_commit_hash)) {
            lookup_everywhere = _gf_false;
        }
    }

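Where parent_layout->commit_hash comes from:

It is set in dht_disk_layout_merge. It is the high-order 32 bits of the parent directory's trusted.glusterfs.dht extended attribute.

Where conf->vol_commit_hash comes from:

It is set in dht_revalidate_cbk, which reads it from the trusted.glusterfs.dht.commithash extended attribute of the brick directory.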
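As an illustration of "the high-order 32 bits", the sketch below decodes a layout value. It assumes the attribute value is 16 bytes holding four big-endian 32-bit words in the order commit hash, type, range start, range stop. That word order, the disk_layout struct, and the function names are assumptions for this example and should be checked against dht_disk_layout_merge in your source tree.

```c
#include <stdint.h>
#include <stddef.h>

/* Decoded view of one on-disk layout value (field names are illustrative). */
typedef struct {
    uint32_t commit_hash;
    uint32_t type;
    uint32_t start;
    uint32_t stop;
} disk_layout;

/* Read one big-endian 32-bit word from a byte buffer. */
static uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Decode a 16-byte value; returns 0 on success, -1 on bad input. */
static int decode_layout(const uint8_t *raw, size_t len, disk_layout *out)
{
    if (raw == NULL || out == NULL || len != 16)
        return -1;
    out->commit_hash = be32(raw);  /* assumed: first word is the commit hash */
    out->type = be32(raw + 4);
    out->start = be32(raw + 8);
    out->stop = be32(raw + 12);
    return 0;
}
```

Under these assumptions, the bytes 00000001 00000000 00000000 7fffffff decode to commit hash 1 and a range of 0 to 0x7fffffff. A directory whose value starts with 00000000 has commit hash 0, which is the case that matters for the bug analysed in this article.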

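Analysis of the lookup-everywhere condition

When parent_layout->commit_hash changes:

(1) When a directory is created, parent_layout->commit_hash (the high-order 32 bits of trusted.glusterfs.dht) is the same as the brick's conf->vol_commit_hash (trusted.glusterfs.dht.commithash).

(2) When the cluster runs rebalance, it is repaired to match the brick's conf->vol_commit_hash (trusted.glusterfs.dht.commithash).

When conf->vol_commit_hash changes:

(1) When rebalance is run, the glusterd process generates a commit hash value and passes it as a parameter when it starts the rebalance process. The rebalance process then sets the trusted.glusterfs.dht.commithash extended attribute of every brick directory to that value.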

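Cause of the problem

During rebalance fix-layout, glusterd passed a commit hash of 0 to the rebalance process, so conf->vol_commit_hash became 0. Directories created before add-brick also had 0 in the high-order 32 bits of their trusted.glusterfs.dht extended attribute, so the two values were equal. When the FUSE client accessed such a directory, some files appeared not to exist: add-brick had changed the hash ranges, so the file was not on its hashed node, and because the values matched, lookup-everywhere was not triggered. The file was not found, and a file with the same name was created on the hashed node. ls on the directory then read both files, which produced the duplicates.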
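The failure sequence can be replayed with a small model. This is a sketch under stated assumptions, not GlusterFS code: toy_volume and every function below are hypothetical, the volume has only two subvolumes, and only the commit-hash comparison described in this article is modelled (the conf->defrag and loc->parent checks are left out).

```c
#include <stdbool.h>
#include <stdint.h>

#define NSUBVOLS 2

/* Toy volume: has_file[i] is true if subvolume i holds a copy of "file". */
typedef struct {
    bool has_file[NSUBVOLS];
    uint32_t vol_commit_hash;    /* models conf->vol_commit_hash */
    uint32_t parent_commit_hash; /* models parent_layout->commit_hash */
} toy_volume;

/* Equal commit hashes mean "trust the layout", so the fan-out is skipped. */
static bool should_lookup_everywhere(const toy_volume *v)
{
    return v->parent_commit_hash != v->vol_commit_hash;
}

/* Lookup as the client sees it: hashed subvolume first, then optionally all. */
static bool lookup(const toy_volume *v, int hashed)
{
    if (v->has_file[hashed])
        return true;
    if (!should_lookup_everywhere(v))
        return false; /* reported as "no such file" */
    for (int i = 0; i < NSUBVOLS; i++) {
        if (v->has_file[i])
            return true;
    }
    return false;
}

/* "stat file || touch file": create on the hashed subvolume on a miss. */
static void stat_or_touch(toy_volume *v, int hashed)
{
    if (!lookup(v, hashed))
        v->has_file[hashed] = true;
}

/* Number of copies a directory listing would show. */
static int count_copies(const toy_volume *v)
{
    int n = 0;
    for (int i = 0; i < NSUBVOLS; i++)
        n += v->has_file[i] ? 1 : 0;
    return n;
}
```

With the file on subvolume 0, the new hashed subvolume set to 1, and both commit hashes equal to 0, stat_or_touch creates a second copy and count_copies returns 2. If vol_commit_hash is any value other than the parent's, the fan-out finds the existing copy and the count stays at 1.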

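Fixing the problem

There are two possible fixes:

1. Comment out the commit hash check in dht_should_lookup_everywhere, so that lookup-everywhere runs whenever the lookup on the hashed node reports that the file does not exist. This is a blunt fix: deciding that a file really does not exist becomes slower, because every such miss costs an extra lookup-everywhere.

2. Find the bug that makes glusterd pass a commit hash of 0 to the rebalance process. This is the better fix.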
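The cost of the first fix can be estimated with a one-line count. This is a rough sketch: the function name is hypothetical, and it assumes the fan-out sends one lookup to every subvolume, including the hashed one that was already asked.

```c
#include <stdbool.h>

/* Lookups sent for a name that exists nowhere, given n subvolumes. */
static int lookups_for_missing_name(int n_subvols, bool lookup_everywhere)
{
    int sent = 1; /* the initial lookup on the hashed subvolume */
    if (lookup_everywhere)
        sent += n_subvols; /* assumed: the fan-out queries every subvolume */
    return sent;
}
```

Under that assumption, a miss on a volume with 10 subvolumes costs 11 lookups instead of 1, and the extra cost is paid on every miss, not only after an expansion.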

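Summary

GlusterFS looks for a file on its hashed node. After the hash distribution changes through expansion or shrinking, it compares the high-order 32 bits of the parent directory's trusted.glusterfs.dht extended attribute with the brick directory's trusted.glusterfs.dht.commithash extended attribute to decide whether a file that is not on its hashed node should be searched for on all nodes. This is how file lookup is optimized. It also follows that if only fix-layout is run after expansion or shrinking, a lookup in an older directory that misses on the hashed node still has to go to all nodes, whereas after a full rebalance every lookup is served from the hashed node alone.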
