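Once you have read this article, the questions raised so far should be settled. NoSQL emerged to handle large data volumes, high scalability, high performance, flexible data models, and high availability. A master-slave architecture alone falls far short of these goals, which is why MongoDB introduced replica sets and sharding. This article focuses on replica sets:
MongoDB officially no longer recommends master-slave mode; the replacement is the replica set, as shown in the figure:
So what is a replica set? World of Warcraft players talk about running instances (in Chinese, a game "instance" and a database "replica" are the same word), and the two ideas are much alike. In the game, players crowd into one scene at peak hours to fight monsters, and there end up being far too many players for too few monsters. To protect the player experience, the developer opens a separate, identical space with the same number of monsters for each group of players. Each copied scene is an instance, and however many players there are, each group plays in its own copy without affecting the others. A MongoDB replica is the same idea. Master-slave mode is effectively a single-replica setup, with poor scalability and fault tolerance. A replica set keeps multiple replicas for fault tolerance: if one replica goes down, the others are still there, and it answers the first question above, what happens when the primary goes down, with automatic failover inside the cluster. No wonder MongoDB recommends this mode. Let's look at the replica set architecture diagram:
As the figure shows, the client connects to the replica set as a whole and does not care whether any particular machine is down. By default the primary handles all reads and writes for the set, and the secondaries continuously replicate its data. Once the primary goes down, the secondaries elect a new primary, and the application server does not need to care about any of this. Here is the architecture after the primary fails:
When the primary goes down, the secondaries detect it through the heartbeat mechanism and start an election within the cluster, automatically choosing a new primary. It looks pretty impressive, so let's deploy one right away!
The officially recommended minimum for a replica set is 3 machines, so we will test with that number.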
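Part of the reason for a three-member minimum is simple arithmetic: a primary can only be elected with a majority of the votes. The following is a minimal sketch of that calculation, assuming every member holds exactly one vote. The class and method names are made up for illustration and are not part of MongoDB.

```java
// Illustrative only: majority arithmetic for replica set elections,
// assuming every member holds exactly one vote.
class ReplicaSetMath {
    // Votes needed to elect a primary: a strict majority of the voting members.
    static int majority(int votingMembers) {
        return votingMembers / 2 + 1;
    }

    // How many members can be lost while a primary can still be elected.
    static int faultTolerance(int votingMembers) {
        return votingMembers - majority(votingMembers);
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 5; n++) {
            System.out.println(n + " members: majority=" + majority(n)
                    + ", tolerated failures=" + faultTolerance(n));
        }
    }
}
```

Under this assumption, two members tolerate no failures at all, while three members tolerate one, which is the smallest set that survives losing a machine.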
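1. Prepare three machines: 192.168.1.136, 192.168.1.137, and 192.168.1.138. 192.168.1.136 is intended as the primary, with 192.168.1.137 and 192.168.1.138 as secondaries (the actual primary is decided by election).
2. Create a test folder for the MongoDB replica set on each machine.
3. Download the MongoDB installation package.
Note: do not install 32-bit MongoDB in a Linux production environment, because a 32-bit build can only handle about 2 GB of data.
4. Start MongoDB on each machine.
The console shows that the replica set has not yet been given its initial configuration: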
- Sun Dec 29 20:12:02.953 [rsStart] replSet can't get local.system.replset config from self or any seed (EMPTYCONFIG)
- Sun Dec 29 20:12:02.953 [rsStart] replSet info you may need to run replSetInitiate -- rs.initiate() in the shell -- if that is not already done
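5. Initialize the replica set
Log in to MongoDB on any one of the three machines.
# Define the replica set config variable. The _id "repset" here must match the "--replSet repset" startup parameter used above.
# Output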
- {
- "_id" : "repset",
- "members" : [
- {
- "_id" : 0,
- "host" : "192.168.1.136:27017"
- },
- {
- "_id" : 1,
- "host" : "192.168.1.137:27017"
- },
- {
- "_id" : 2,
- "host" : "192.168.1.138:27017"
- }
- ]
- }
# Output on success
- {
- "info" : "Config now saved locally. Should come online in about a minute.",
- "ok" : 1
- }
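The config document shown above follows a regular pattern: the set name as the top-level _id, and one entry per member with a sequential _id and a host:port string. As a rough sketch, the text of such a document can be generated for any host list. The helper below is hypothetical and only produces a string; the result would still have to be pasted into the shell and passed to rs.initiate().

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: formats the text of a replica set config document.
// It does not talk to MongoDB; it only builds a string.
class ReplSetConfigText {
    static String build(String setName, List<String> hosts) {
        StringBuilder sb = new StringBuilder();
        sb.append("{ _id: \"").append(setName).append("\", members: [ ");
        for (int i = 0; i < hosts.size(); i++) {
            if (i > 0) {
                sb.append(", ");
            }
            // Member _id values are simply the position in the list.
            sb.append("{ _id: ").append(i)
              .append(", host: \"").append(hosts.get(i)).append("\" }");
        }
        return sb.append(" ] }").toString();
    }

    public static void main(String[] args) {
        List<String> hosts = Arrays.asList(
                "192.168.1.136:27017", "192.168.1.137:27017", "192.168.1.138:27017");
        System.out.println(build("repset", hosts));
    }
}
```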
# Check the log: once the replica set has started, 138 is the PRIMARY and 136 and 137 are SECONDARY nodes.
- Sun Dec 29 20:26:13.842 [conn3] replSet replSetInitiate admin command received from client
- Sun Dec 29 20:26:13.842 [conn3] replSet replSetInitiate config object parses ok, 3 members specified
- Sun Dec 29 20:26:13.847 [conn3] replSet replSetInitiate all members seem up
- Sun Dec 29 20:26:13.848 [conn3] ******
- Sun Dec 29 20:26:13.848 [conn3] creating replication oplog of size: 990MB...
- Sun Dec 29 20:26:13.849 [FileAllocator] allocating new datafile /data/mongodbtest/replset/data/local.1, filling with zeroes...
- Sun Dec 29 20:26:13.862 [FileAllocator] done allocating datafile /data/mongodbtest/replset/data/local.1, size: 1024MB, took 0.012 secs
- Sun Dec 29 20:26:13.863 [conn3] ******
- Sun Dec 29 20:26:13.863 [conn3] replSet info saving a newer config version to local.system.replset
- Sun Dec 29 20:26:13.864 [conn3] replSet saveConfigLocally done
- Sun Dec 29 20:26:13.864 [conn3] replSet replSetInitiate config now saved locally. Should come online in about a minute.
- Sun Dec 29 20:26:23.047 [rsStart] replSet I am 192.168.1.138:27017
- Sun Dec 29 20:26:23.048 [rsStart] replSet STARTUP2
- Sun Dec 29 20:26:23.049 [rsHealthPoll] replSet member 192.168.1.137:27017 is up
- Sun Dec 29 20:26:23.049 [rsHealthPoll] replSet member 192.168.1.136:27017 is up
- Sun Dec 29 20:26:24.051 [rsSync] replSet SECONDARY
- Sun Dec 29 20:26:25.053 [rsHealthPoll] replset info 192.168.1.136:27017 thinks that we are down
- Sun Dec 29 20:26:25.053 [rsHealthPoll] replSet member 192.168.1.136:27017 is now in state STARTUP2
- Sun Dec 29 20:26:25.056 [rsMgr] not electing self, 192.168.1.136:27017 would veto with 'I don't think 192.168.1.138:27017 is electable'
- Sun Dec 29 20:26:31.059 [rsHealthPoll] replset info 192.168.1.137:27017 thinks that we are down
- Sun Dec 29 20:26:31.059 [rsHealthPoll] replSet member 192.168.1.137:27017 is now in state STARTUP2
- Sun Dec 29 20:26:31.062 [rsMgr] not electing self, 192.168.1.137:27017 would veto with 'I don't think 192.168.1.138:27017 is electable'
- Sun Dec 29 20:26:37.074 [rsMgr] replSet info electSelf 2
- Sun Dec 29 20:26:38.062 [rsMgr] replSet PRIMARY
- Sun Dec 29 20:26:39.071 [rsHealthPoll] replSet member 192.168.1.137:27017 is now in state RECOVERING
- Sun Dec 29 20:26:39.075 [rsHealthPoll] replSet member 192.168.1.136:27017 is now in state RECOVERING
- Sun Dec 29 20:26:42.201 [slaveTracking] build index local.slaves { _id: 1 }
- Sun Dec 29 20:26:42.207 [slaveTracking] build index done. scanned 0 total records. 0.005 secs
- Sun Dec 29 20:26:43.079 [rsHealthPoll] replSet member 192.168.1.136:27017 is now in state SECONDARY
- Sun Dec 29 20:26:49.080 [rsHealthPoll] replSet member 192.168.1.137:27017 is now in state SECONDARY
# Output
- {
- "set" : "repset",
- "date" : ISODate("2013-12-29T12:54:25Z"),
- "myState" : 1,
- "members" : [
- {
- "_id" : 0,
- "name" : "192.168.1.136:27017",
- "health" : 1,
- "state" : 2,
- "stateStr" : "SECONDARY",
- "uptime" : 1682,
- "optime" : Timestamp(1388319973, 1),
- "optimeDate" : ISODate("2013-12-29T12:26:13Z"),
- "lastHeartbeat" : ISODate("2013-12-29T12:54:25Z"),
- "lastHeartbeatRecv" : ISODate("2013-12-29T12:54:24Z"),
- "pingMs" : 1,
- "syncingTo" : "192.168.1.138:27017"
- },
- {
- "_id" : 1,
- "name" : "192.168.1.137:27017",
- "health" : 1,
- "state" : 2,
- "stateStr" : "SECONDARY",
- "uptime" : 1682,
- "optime" : Timestamp(1388319973, 1),
- "optimeDate" : ISODate("2013-12-29T12:26:13Z"),
- "lastHeartbeat" : ISODate("2013-12-29T12:54:25Z"),
- "lastHeartbeatRecv" : ISODate("2013-12-29T12:54:24Z"),
- "pingMs" : 1,
- "syncingTo" : "192.168.1.138:27017"
- },
- {
- "_id" : 2,
- "name" : "192.168.1.138:27017",
- "health" : 1,
- "state" : 1,
- "stateStr" : "PRIMARY",
- "uptime" : 2543,
- "optime" : Timestamp(1388319973, 1),
- "optimeDate" : ISODate("2013-12-29T12:26:13Z"),
- "self" : true
- }
- ],
- "ok" : 1
- }
The whole replica set is now up and running.
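The numeric "state" and "myState" fields in the status output correspond to the state names that appear in "stateStr" and in the logs. Below is a small lookup sketch for the codes seen in this article. The enum is a hypothetical helper rather than a driver class, and it lists only a subset of MongoDB's member states.

```java
import java.util.Optional;

// Hypothetical helper: maps numeric replica set member states to names.
// Only a subset of states is listed here.
enum MemberState {
    PRIMARY(1), SECONDARY(2), RECOVERING(3), STARTUP2(5), ARBITER(7), DOWN(8);

    final int code;

    MemberState(int code) {
        this.code = code;
    }

    // Returns the matching state, or empty for a code this sketch does not cover.
    static Optional<MemberState> fromCode(int code) {
        for (MemberState s : values()) {
            if (s.code == code) {
                return Optional.of(s);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        for (int code : new int[] {1, 2, 8, 42}) {
            String name = fromCode(code).map(MemberState::name).orElse("(not in this sketch)");
            System.out.println(code + " -> " + name);
        }
    }
}
```

State 8 is the one reported with the "(not reachable/healthy)" label when a member cannot be contacted.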
6. Test data replication in the replica set
# Output
- Sun Dec 29 21:50:48.590 error: { "$err" : "not master and slaveOk=false", "code" : 13435 } at src/mongo/shell/query.js:128
- # Output
- { "_id" : ObjectId("52c028460c7505626a93944f"), "test1" : "testval1" }
7. Test replica set failover
First stop MongoDB on the primary, 138. The logs on 136 and 137 show that after a series of votes, 137 is elected primary and 136 syncs its data from 137.
- Sun Dec 29 22:03:05.351 [rsBackgroundSync] replSet sync source problem: 10278 dbclient error communicating with server: 192.168.1.138:27017
- Sun Dec 29 22:03:05.354 [rsBackgroundSync] replSet syncing to: 192.168.1.138:27017
- Sun Dec 29 22:03:05.356 [rsBackgroundSync] repl: couldn't connect to server 192.168.1.138:27017
- Sun Dec 29 22:03:05.356 [rsBackgroundSync] replSet not trying to sync from 192.168.1.138:27017, it is vetoed for 10 more seconds
- Sun Dec 29 22:03:05.499 [rsHealthPoll] DBClientCursor::init call() failed
- Sun Dec 29 22:03:05.499 [rsHealthPoll] replset info 192.168.1.138:27017 heartbeat failed, retrying
- Sun Dec 29 22:03:05.501 [rsHealthPoll] replSet info 192.168.1.138:27017 is down (or slow to respond):
- Sun Dec 29 22:03:05.501 [rsHealthPoll] replSet member 192.168.1.138:27017 is now in state DOWN
- Sun Dec 29 22:03:05.511 [rsMgr] not electing self, 192.168.1.137:27017 would veto with '192.168.1.136:27017 is trying to elect itself but 192.168.1.138:27017 is already primary and more up-to-date'
- Sun Dec 29 22:03:07.330 [conn393] replSet info voting yea for 192.168.1.137:27017 (1)
- Sun Dec 29 22:03:07.503 [rsHealthPoll] replset info 192.168.1.138:27017 heartbeat failed, retrying
- Sun Dec 29 22:03:08.462 [rsHealthPoll] replSet member 192.168.1.137:27017 is now in state PRIMARY
- Sun Dec 29 22:03:09.359 [rsBackgroundSync] replSet syncing to: 192.168.1.137:27017
- Sun Dec 29 22:03:09.507 [rsHealthPoll] replset info 192.168.1.138:27017 heartbeat failed, retrying
Check the status of the whole cluster: 138 is reported as unreachable.
# Output
- {
- "set" : "repset",
- "date" : ISODate("2013-12-29T14:28:35Z"),
- "myState" : 2,
- "syncingTo" : "192.168.1.137:27017",
- "members" : [
- {
- "_id" : 0,
- "name" : "192.168.1.136:27017",
- "health" : 1,
- "state" : 2,
- "stateStr" : "SECONDARY",
- "uptime" : 9072,
- "optime" : Timestamp(1388324934, 1),
- "optimeDate" : ISODate("2013-12-29T13:48:54Z"),
- "self" : true
- },
- {
- "_id" : 1,
- "name" : "192.168.1.137:27017",
- "health" : 1,
- "state" : 1,
- "stateStr" : "PRIMARY",
- "uptime" : 7329,
- "optime" : Timestamp(1388324934, 1),
- "optimeDate" : ISODate("2013-12-29T13:48:54Z"),
- "lastHeartbeat" : ISODate("2013-12-29T14:28:34Z"),
- "lastHeartbeatRecv" : ISODate("2013-12-29T14:28:34Z"),
- "pingMs" : 1,
- "syncingTo" : "192.168.1.138:27017"
- },
- {
- "_id" : 2,
- "name" : "192.168.1.138:27017",
- "health" : 0,
- "state" : 8,
- "stateStr" : "(not reachable/healthy)",
- "uptime" : 0,
- "optime" : Timestamp(1388324934, 1),
- "optimeDate" : ISODate("2013-12-29T13:48:54Z"),
- "lastHeartbeat" : ISODate("2013-12-29T14:28:35Z"),
- "lastHeartbeatRecv" : ISODate("2013-12-29T14:28:23Z"),
- "pingMs" : 0,
- "syncingTo" : "192.168.1.137:27017"
- }
- ],
- "ok" : 1
- }
Now restart the former primary, 138. It comes back as SECONDARY, and 137 remains PRIMARY.
- Sun Dec 29 22:21:06.619 [rsStart] replSet I am 192.168.1.138:27017
- Sun Dec 29 22:21:06.619 [rsStart] replSet STARTUP2
- Sun Dec 29 22:21:06.627 [rsHealthPoll] replset info 192.168.1.136:27017 thinks that we are down
- Sun Dec 29 22:21:06.627 [rsHealthPoll] replSet member 192.168.1.136:27017 is up
- Sun Dec 29 22:21:06.627 [rsHealthPoll] replSet member 192.168.1.136:27017 is now in state SECONDARY
- Sun Dec 29 22:21:07.628 [rsSync] replSet SECONDARY
- Sun Dec 29 22:21:08.623 [rsHealthPoll] replSet member 192.168.1.137:27017 is up
- Sun Dec 29 22:21:08.624 [rsHealthPoll] replSet member 192.168.1.137:27017 is now in state PRIMARY
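8. Test connecting to the replica set from a Java program. With three nodes, losing one of them does not affect the client application's reads and writes to the replica set!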
- import java.util.ArrayList;
- import java.util.List;
- import com.mongodb.BasicDBObject;
- import com.mongodb.DB;
- import com.mongodb.DBCollection;
- import com.mongodb.DBCursor;
- import com.mongodb.DBObject;
- import com.mongodb.MongoClient;
- import com.mongodb.ServerAddress;
- public class TestMongoDBReplSet {
-     public static void main(String[] args) {
-         try {
-             // Seed list containing all three replica set members
-             List<ServerAddress> addresses = new ArrayList<ServerAddress>();
-             ServerAddress address1 = new ServerAddress("192.168.1.136", 27017);
-             ServerAddress address2 = new ServerAddress("192.168.1.137", 27017);
-             ServerAddress address3 = new ServerAddress("192.168.1.138", 27017);
-             addresses.add(address1);
-             addresses.add(address2);
-             addresses.add(address3);
-             MongoClient client = new MongoClient(addresses);
-             DB db = client.getDB("test");
-             DBCollection coll = db.getCollection("testdb");
-             // Insert a document
-             BasicDBObject object = new BasicDBObject();
-             object.append("test2", "testval2");
-             coll.insert(object);
-             // Read back every document in the collection
-             DBCursor dbCursor = coll.find();
-             while (dbCursor.hasNext()) {
-                 DBObject dbObject = dbCursor.next();
-                 System.out.println(dbObject.toString());
-             }
-         } catch (Exception e) {
-             e.printStackTrace();
-         }
-     }
- }
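The same seed list can also be written as a single connection string. Here is a minimal sketch that only formats the string, assuming the standard mongodb:// URI format with the replicaSet option. The helper name is made up for illustration, and nothing in it connects to a server.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative helper (not part of the MongoDB driver): builds a replica set
// connection string from a seed list. It only formats text.
class ReplSetUri {
    static String build(List<String> hosts, int port, String replSetName) {
        StringBuilder sb = new StringBuilder("mongodb://");
        for (int i = 0; i < hosts.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(hosts.get(i)).append(':').append(port);
        }
        // The replicaSet option names the set, matching the --replSet value.
        return sb.append("/?replicaSet=").append(replSetName).toString();
    }

    public static void main(String[] args) {
        List<String> hosts = Arrays.asList("192.168.1.136", "192.168.1.137", "192.168.1.138");
        System.out.println(build(hosts, 27017, "repset"));
    }
}
```

Listing every member matters for the same reason as in the program above: if only one host were given and that host happened to be down, the client would have no other member to try.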
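Failover now looks fully supported, so is this architecture close to perfect? In fact there is still plenty to optimize, such as the second question from the beginning: how do we relieve excessive read/write pressure on the primary? The usual answer is read/write splitting. How is that done with a MongoDB replica set?
The picture tells the story:
Writes are usually less frequent than reads, so one primary handles the writes and the two secondaries handle the reads.
1. To set up read/write splitting, first enable slaveOk on the SECONDARY nodes (setSlaveOk).
2. In the program, direct read operations to the secondaries, as in the following code: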
- import java.util.ArrayList;
- import java.util.List;
- import com.mongodb.BasicDBObject;
- import com.mongodb.DB;
- import com.mongodb.DBCollection;
- import com.mongodb.DBObject;
- import com.mongodb.MongoClient;
- import com.mongodb.ReadPreference;
- import com.mongodb.ServerAddress;
- public class TestMongoDBReplSetReadSplit {
-     public static void main(String[] args) {
-         try {
-             List<ServerAddress> addresses = new ArrayList<ServerAddress>();
-             ServerAddress address1 = new ServerAddress("192.168.1.136", 27017);
-             ServerAddress address2 = new ServerAddress("192.168.1.137", 27017);
-             ServerAddress address3 = new ServerAddress("192.168.1.138", 27017);
-             addresses.add(address1);
-             addresses.add(address2);
-             addresses.add(address3);
-             MongoClient client = new MongoClient(addresses);
-             DB db = client.getDB("test");
-             DBCollection coll = db.getCollection("testdb");
-             // Query document: match on test2 = testval2
-             BasicDBObject object = new BasicDBObject();
-             object.append("test2", "testval2");
-             // Read from a secondary node
-             ReadPreference preference = ReadPreference.secondary();
-             DBObject dbObject = coll.findOne(object, null, preference);
-             System.out.println(dbObject);
-         } catch (Exception e) {
-             e.printStackTrace();
-         }
-     }
- }
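Besides secondary there are other read preference modes, five in total: primary, primaryPreferred, secondary, secondaryPreferred, and nearest.
primary: the default. Reads only from the primary.
primaryPreferred: reads from the primary in most cases, and from a secondary only when the primary is unavailable.
secondary: reads only from secondaries. The catch is that a secondary's data may be "older" than the primary's.
secondaryPreferred: reads from a secondary first, and from the primary when no secondary is available.
nearest: reads from whichever node has the lowest network latency, primary or secondary.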
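The five modes can be compared side by side with a toy model. The sketch below is a deliberate simplification and not the driver's real selection algorithm: it always picks the lowest-latency eligible node and ignores everything else the driver takes into account. The class, the Node type, and the ping values are all invented for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Simplified model of the five read preference modes.
// NOT the driver's algorithm: it just picks the lowest-ping eligible node.
class ReadPreferenceSketch {
    enum Mode { PRIMARY, PRIMARY_PREFERRED, SECONDARY, SECONDARY_PREFERRED, NEAREST }

    static final class Node {
        final String host;
        final boolean primary;
        final int pingMs;

        Node(String host, boolean primary, int pingMs) {
            this.host = host;
            this.primary = primary;
            this.pingMs = pingMs;
        }
    }

    // Returns the host that would serve the read, or null if no node is eligible.
    static String choose(Mode mode, List<Node> reachable) {
        Node primary = lowestPing(reachable, true);
        Node secondary = lowestPing(reachable, false);
        switch (mode) {
            case PRIMARY:
                return hostOf(primary);
            case PRIMARY_PREFERRED:
                return hostOf(primary != null ? primary : secondary);
            case SECONDARY:
                return hostOf(secondary);
            case SECONDARY_PREFERRED:
                return hostOf(secondary != null ? secondary : primary);
            default: // NEAREST: lowest latency regardless of role
                if (primary == null) return hostOf(secondary);
                if (secondary == null) return hostOf(primary);
                return hostOf(primary.pingMs <= secondary.pingMs ? primary : secondary);
        }
    }

    private static Node lowestPing(List<Node> nodes, boolean wantPrimary) {
        Node best = null;
        for (Node n : nodes) {
            if (n.primary == wantPrimary && (best == null || n.pingMs < best.pingMs)) {
                best = n;
            }
        }
        return best;
    }

    private static String hostOf(Node n) {
        return n == null ? null : n.host;
    }

    public static void main(String[] args) {
        // Ping values are invented for the example.
        List<Node> nodes = Arrays.asList(
                new Node("192.168.1.136:27017", false, 4),
                new Node("192.168.1.137:27017", true, 2),
                new Node("192.168.1.138:27017", false, 3));
        for (Mode m : Mode.values()) {
            System.out.println(m + " -> " + choose(m, nodes));
        }
    }
}
```

Passing a list without a primary shows the difference between the strict and the "preferred" modes: PRIMARY finds nothing to read from, while PRIMARY_PREFERRED falls back to a secondary.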
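Good. With read/write splitting in place we can spread the load, which answers the question of how to relieve read/write pressure on the primary. But as the number of secondaries grows, the replication load on the primary grows too. Is there a way around that? MongoDB has long had a solution.
The arbiter node stores no data; it only takes part in the group vote during failover, so it adds no replication load. Thoughtful, isn't it? The MongoDB developers clearly know big data architecture well. Besides primary, secondary, and arbiter nodes there are also Secondary-Only, Hidden, Delayed, and Non-Voting members.
Secondary-Only: cannot become primary and only ever serves as a secondary, which keeps low-powered machines from being elected primary.
Hidden: clients cannot address this kind of node by specifying its IP, and it cannot become primary, but it can vote. Typically used for backups.
Delayed: syncs from the primary after a specified time delay. Mainly used for backups: with real-time sync, an accidental delete is replicated to the secondaries immediately and cannot be undone, whereas a delayed member still holds the old data for a while.
Non-Voting: a secondary with no vote in elections, purely a data backup node.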
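These member types differ only in a handful of per-member settings. The sketch below models them in plain Java to make the differences explicit. The field names mirror the member settings priority, hidden, votes, arbiterOnly, and slaveDelay as I understand them for MongoDB versions of this era, but the class itself is hypothetical and the chosen values are assumptions, so check them against the documentation for your version.

```java
// Hypothetical model of replica set member settings (not a MongoDB API).
class MemberOptions {
    final String kind;
    final double priority;      // 0 means the member can never become primary
    final boolean hidden;       // hidden members are invisible to client applications
    final int votes;            // 0 means the member has no vote in elections
    final boolean arbiterOnly;  // arbiters vote but hold no data
    final long slaveDelaySecs;  // replication delay in seconds, 0 for none

    MemberOptions(String kind, double priority, boolean hidden, int votes,
                  boolean arbiterOnly, long slaveDelaySecs) {
        this.kind = kind;
        this.priority = priority;
        this.hidden = hidden;
        this.votes = votes;
        this.arbiterOnly = arbiterOnly;
        this.slaveDelaySecs = slaveDelaySecs;
    }

    boolean canBecomePrimary() { return !arbiterOnly && priority > 0; }
    boolean canVote() { return votes > 0; }
    boolean storesData() { return !arbiterOnly; }

    // One example per member type described in the text; values are assumptions.
    static MemberOptions[] examples() {
        return new MemberOptions[] {
            new MemberOptions("Arbiter", 0, false, 1, true, 0),
            new MemberOptions("Secondary-Only", 0, false, 1, false, 0),
            new MemberOptions("Hidden", 0, true, 1, false, 0),
            // Delayed members are modeled as hidden too, so clients never read stale data from them.
            new MemberOptions("Delayed (1 hour)", 0, true, 1, false, 3600),
            // Priority 0 is used as the conservative choice for a non-voting member.
            new MemberOptions("Non-Voting", 0, false, 0, false, 0),
        };
    }

    public static void main(String[] args) {
        for (MemberOptions m : examples()) {
            System.out.println(m.kind + ": canBecomePrimary=" + m.canBecomePrimary()
                    + ", canVote=" + m.canVote() + ", storesData=" + m.storesData());
        }
    }
}
```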
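At this point the MongoDB replica set has taken care of two questions:
- If the primary goes down, can connections fail over automatically? (Previously this required a manual switch.)
- How can excessive read/write pressure on the primary be relieved?
Two questions remain for later:
- Every secondary holds a full copy of the database. Will the load on the secondaries become too high?
- When the data load outgrows what the machines can handle, can the system scale out automatically?
Setting up the replica set also raises some new questions:
- How is the primary elected during failover? Can we step in manually and take a primary out of service?
- The official documentation says the number of replica set members should preferably be odd. Why?
- How does a MongoDB replica set synchronize? What happens if synchronization lags behind? Can inconsistencies occur?
- Can MongoDB failover kick in for no apparent reason? What conditions trigger it? Frequent failovers may increase system load.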
Reference:
http://cn.docs.mongodb.org/manual/administration/replica-set-member-configuration/