DRBD: Introduction, How It Works, and Split-Brain Recovery

I. DRBD Basics

   DRBD (Distributed Replicated Block Device) is, plainly put, block-level data mirroring that keeps two equally sized devices on different nodes synchronized. It consists of a kernel module and accompanying scripts, and is used to build high-availability clusters.

   In a high-availability (HA) solution, DRBD can take the place of a shared disk array. Because the data exists on both the local host and the remote host, when a failover is needed the remote host simply uses its own copy of the data and continues providing the service.


II. DRBD Architecture and How It Works

[Figure: DRBD Primary/Secondary replication architecture]

   As the figure shows, DRBD works in a Primary/Secondary fashion, which is somewhat similar to MySQL master/slave replication. The DRBD device on the primary node is promoted to Primary and accepts writes. When a write reaches the DRBD module, one copy continues down the stack and is written to the local disk for persistence, while a second copy is sent over TCP to the DRBD device on the other host (the Secondary node), which in turn writes it to its own disk. This does resemble MySQL replication via the binary log, but there are differences: for example, a MySQL slave cannot be written to but can still be read, whereas a DRBD Secondary can be neither read nor even mounted.

   Thus, for a given DRBD device only the Primary node may read and write; the Secondary can do neither. This may look like a waste of the standby machine's resources, and indeed HA architectures do trade some resource waste for redundancy. You can, however, create two DRBD resources across the two hosts shown in the figure and make each node Primary for one of them, so both machines are put to use, at the cost of a more complex configuration. Either way, DRBD makes for inexpensive shared storage: it costs far less than a dedicated storage network, while its performance and stability are quite acceptable.


III. DRBD Replication Modes (Protocols)

   Protocol A:

       Asynchronous replication. A write is considered complete as soon as the local disk write has finished and the replication packet has been placed in the send queue. If the node fails, data loss is possible, because data meant for the remote node may still be sitting in the send queue; the data on the failover node is consistent, just not up to date. This mode therefore has the highest throughput but the weakest safety: data can be lost.

   Protocol B:

       Memory-synchronous (semi-synchronous) replication. A write on the primary node is considered complete once the local disk write has finished and the replication packet has reached the peer node. Data loss can occur only if both participating nodes fail at the same time, because data still in transit may not yet have been committed to the peer's disk.

   Protocol C:

       Synchronous replication. A write is considered complete only after both the local and the remote disk have confirmed the write. No data is lost, which makes this the popular mode for cluster nodes, but I/O throughput depends on network bandwidth. This mode is therefore the safest, at the cost of lower throughput.
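   The mode is selected with the protocol keyword in the configuration. A minimal sketch for the common section of /etc/drbd.d/global_common.conf (the same line appears in the full configuration in section IV below):

       common {
               protocol C;        # A: asynchronous, B: memory-synchronous, C: synchronous
       }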


IV. Installing and Configuring DRBD

   1. Install the package: # sudo apt-get install drbd8-utils

   2. Prepare both nodes (a command sketch follows this list):

       synchronize the clocks of node1 and node2;

       create a partition of the same size on each node;

       set up mutual SSH key trust so each node can log in to the other
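   A minimal sketch of these preparation steps, run on node1 and mirrored on node2; the NTP server and the disk device are illustrative assumptions, adjust them to your environment:

       # sudo ntpdate pool.ntp.org     (one-shot time sync; pool.ntp.org is an example server)
       # sudo fdisk /dev/sda           (interactively create e.g. /dev/sda6, same size on both nodes)
       # ssh-keygen -t rsa             (generate a key pair, accepting the defaults)
       # ssh-copy-id node2             (install the public key on the peer for password-less login)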

   3. DRBD configuration file layout

       /etc/drbd.conf                        main configuration file

       /etc/drbd.d/global_common.conf        defines the global and common sections

       /etc/drbd.d/*.res                     defines the resources
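       The main file usually does nothing but include the other two; this is the stock content shipped by the drbd8-utils package (verify against your own /etc/drbd.conf):

           # /etc/drbd.conf
           include "drbd.d/global_common.conf";
           include "drbd.d/*.res";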

   4. DRBD configuration

       4.1 global_common.conf

           global {

               usage-count no;        # whether to participate in DRBD's usage statistics

           }

           common {

               protocol C;            # which replication protocol to use (see section III)

               handlers {

                       # event handler scripts; /usr/lib/drbd/ ships plenty of
                       # ready-made scripts, but they are not all trustworthy

               }

               startup {

                       # startup behaviour, e.g. wait-for-peer timeouts

               }

               disk {

                       # common disk settings, e.g. I/O tuning and what to do
                       # when the disk fails

               }

               net {

                       # network transport settings, authentication algorithm, etc.

               }

               syncer {

                       rate 1000M;    # bandwidth cap for background resynchronization

               }

           }
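       To make the placeholders above concrete, a hedged sketch of two commonly used settings for the disk and net sections (the shared secret is a made-up placeholder; pick your own):

           disk {

                   on-io-error detach;             # on a local disk error, detach and continue diskless

           }

           net {

                   cram-hmac-alg "sha1";           # authenticate the peer using HMAC-SHA1
                   shared-secret "mydrbdsecret";   # placeholder secret, must match on both nodes

           }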

       4.2 Resource configuration (*.res)

           resource mydata {

               meta-disk internal;      # parts shared by node1/node2 can be hoisted to the top

               on node1 {

                   device    /dev/drbd0;

                   disk      /dev/sda6;

                   address   192.168.1.101:7789;

               }

               on node2 {

                   device    /dev/drbd0;

                   disk      /dev/sda6;

                   address   192.168.1.102:7789;

               }

           }

       5. These files must be identical on both nodes, so the configuration you just wrote can simply be copied to the other node over ssh:

             # scp -p /etc/drbd.d/* node2:/etc/drbd.d/
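       To confirm that both nodes parse the configuration identically, you can print the configuration as DRBD understands it on each node and compare the output (drbdadm dump is a standard subcommand):

             # drbdadm dump mydata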

       6. Start-up and testing

           1) Initialize the resource metadata; run on both Node1 and Node2:

           # sudo drbdadm create-md mydata


           2) Start the service; run on both Node1 and Node2:

           # sudo service drbd start


           3) Check the status:

           # cat /proc/drbd
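           Representative output right after the first start; the version string and counter values will differ on your system. Both nodes report Secondary/Inconsistent because neither has been promoted yet:

           version: 8.4.3 (api:1/proto:86-101)
            0: cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r-----
               ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:987896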


           4) The output above shows that both nodes are in the Secondary role, so the next step is to promote one of them to Primary. On the node that should become Primary, run:

           # sudo drbdadm -- --overwrite-data-of-peer primary all    (needed only for the first promotion)

           # sudo drbdadm primary --force mydata                     (equivalent syntax in newer drbdadm versions)


           After this first promotion, whenever you later need to decide which node is Primary you can use the plain form of the command:

           # /sbin/drbdadm primary mydata   or   # /sbin/drbdadm primary all


           5) Monitor the data synchronization:

           # watch -n1 'cat /proc/drbd'


           6) Once synchronization has completed, create a filesystem on the DRBD device (on the Primary) and mount it:

           # sudo mke2fs -t ext4 /dev/drbd0

           # sudo mount /dev/drbd0 /mnt

           # ls -l /mnt

[Screenshot: listing of /mnt after mounting /dev/drbd0]

       Test OK.
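       A hedged sketch of a manual failover, which the test above does not cover: since only the Primary may mount the device, switching roles means releasing the device on one node before promoting the other (resource name and mount point as above):

           on node1, the current Primary:
           # sudo umount /mnt
           # sudo drbdadm secondary mydata

           on node2, the new Primary:
           # sudo drbdadm primary mydata
           # sudo mount /dev/drbd0 /mnt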


V. Split-Brain Recovery

[Screenshot: cat /proc/drbd showing both nodes in the StandAlone connection state]

   While experimenting with a Corosync+DRBD highly available MySQL cluster, I unexpectedly found that the nodes could no longer see each other: the connection state was StandAlone, in which Primary and Secondary cannot communicate, as shown in the screenshot above.


   The manual split-brain recovery procedure follows (node1's data is kept as the authoritative copy; node2's unsynchronized data is discarded):

   1) Promote Node1 to Primary and mount the device to verify the data; mydata is the resource name defined earlier

       # drbdadm primary mydata  

       # mount /dev/drbd0 /mydata

       # ls -lh /mydata   (check that the files are intact)


   2) Demote Node2 to Secondary and discard its local modifications:

       # drbdadm secondary mydata

       # drbdadm -- --discard-my-data connect mydata


   3) On Node1, the Primary, reconnect the resource manually:

       # drbdadm connect mydata


   4) Finally, check the state on each node; the connection is back to normal

       # cat /proc/drbd

       The state after the repair is shown in the screenshot below (fault fixed):

[Screenshot: cat /proc/drbd with the connection restored to normal]
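   Manual repair aside, DRBD can also be told to resolve split brain automatically. A hedged sketch of the relevant policies for the net section of global_common.conf; these option names are standard DRBD settings, but choose the policies according to how much data you can afford to discard:

       net {

               after-sb-0pri discard-zero-changes;   # neither node Primary: keep the side that made changes

               after-sb-1pri discard-secondary;      # one node Primary: discard the Secondary's changes

               after-sb-2pri disconnect;             # both Primary: give up and wait for manual repair

       }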


VI. Other DRBD Reference Material (from the documentation):

   1. Meaning of the various DRBD status fields

The resource-specific output from /proc/drbd contains various pieces of information about the resource:

  • cs (connection state). Status of the network connection. See the section called “Connection states” for details about the various connection states.

  • ro (roles). Roles of the nodes. The role of the local node is displayed first, followed by the role of the partner node shown after the slash. See the section called “Resource roles” for details about the possible resource roles.

  • ds (disk states). State of the hard disks. Prior to the slash the state of the local node is displayed, after the slash the state of the hard disk of the partner node is shown. See the section called “Disk states” for details about the various disk states.

  • ns (network send). Volume of net data sent to the partner via the network connection; in Kibyte.

  • nr (network receive). Volume of net data received by the partner via the network connection; in Kibyte.

  • dw (disk write). Net data written on local hard disk; in Kibyte.

  • dr (disk read). Net data read from local hard disk; in Kibyte.

  • al (activity log). Number of updates of the activity log area of the metadata.

  • bm (bit map). Number of updates of the bitmap area of the metadata.

  • lo (local count). Number of open requests to the local I/O sub-system issued by DRBD.

  • pe (pending). Number of requests sent to the partner, but that have not yet been answered by the latter.

  • ua (unacknowledged). Number of requests received by the partner via the network connection, but that have not yet been answered.

  • ap (application pending). Number of block I/O requests forwarded to DRBD, but not yet answered by DRBD.

  • ep (epochs). Number of epoch objects. Usually 1. Might increase under I/O load when using either the barrier or the none write ordering method. Since 8.2.7.

  • wo (write order). Currently used write ordering method: b (barrier), f (flush), d (drain) or n (none). Since 8.2.7.

  • oos (out of sync). Amount of storage currently out of sync; in Kibibytes. Since 8.2.6.
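To tie the fields together, here is a representative resource line from /proc/drbd (the counter values are invented for illustration): cs=Connected, ro=Primary/Secondary with the local role first, ds=UpToDate/UpToDate, replication protocol C, then the counters described above:

    0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
       ns:12345 nr:0 dw:12345 dr:6789 al:15 bm:4 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0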

   2. DRBD connection states

A resource may have one of the following connection states:

  • StandAlone. No network configuration available. The resource has not yet been connected, or has been administratively disconnected (using drbdadm disconnect), or has dropped its connection due to failed authentication or split brain.

  • Disconnecting. Temporary state during disconnection. The next state is StandAlone.

  • Unconnected. Temporary state, prior to a connection attempt. Possible next states: WFConnection and WFReportParams.

  • Timeout. Temporary state following a timeout in the communication with the peer. Next state: Unconnected.

  • BrokenPipe. Temporary state after the connection to the peer was lost. Next state: Unconnected.

  • NetworkFailure. Temporary state after the connection to the partner was lost. Next state: Unconnected.

  • ProtocolError. Temporary state after the connection to the partner was lost. Next state: Unconnected.

  • TearDown. Temporary state. The peer is closing the connection. Next state: Unconnected.

  • WFConnection. This node is waiting until the peer node becomes visible on the network.

  • WFReportParams. TCP connection has been established, this node waits for the first network packet from the peer.

  • Connected. A DRBD connection has been established, data mirroring is now active. This is the normal state.

  • StartingSyncS. Full synchronization, initiated by the administrator, is just starting. The next possible states are: SyncSource or PausedSyncS.

  • StartingSyncT. Full synchronization, initiated by the administrator, is just starting. Next state: WFSyncUUID.

  • WFBitMapS. Partial synchronization is just starting. Next possible states: SyncSource or PausedSyncS.

  • WFBitMapT. Partial synchronization is just starting. Next possible state: WFSyncUUID.

  • WFSyncUUID. Synchronization is about to begin. Next possible states: SyncTarget or PausedSyncT.

  • SyncSource. Synchronization is currently running, with the local node being the source of synchronization.

  • SyncTarget. Synchronization is currently running, with the local node being the target of synchronization.

  • PausedSyncS. The local node is the source of an ongoing synchronization, but synchronization is currently paused. This may be due to a dependency on the completion of another synchronization process, or due to synchronization having been manually interrupted by drbdadm pause-sync.

  • PausedSyncT. The local node is the target of an ongoing synchronization, but synchronization is currently paused. This may be due to a dependency on the completion of another synchronization process, or due to synchronization having been manually interrupted by drbdadm pause-sync.

  • VerifyS. On-line device verification is currently running, with the local node being the source of verification.

  • VerifyT. On-line device verification is currently running, with the local node being the target of verification.

