MongoDB: resolving "error RS102 too stale to catch up"

While running MongoDB tests today, the log reported an error and the primary and secondary fell out of sync:
PRIMARY> rs.status()
{
        "set" : "shard1",
        "date" : ISODate("2012-07-26T02:26:03Z"),
        "myState" : 1,
        "members" : [
                {
                        "_id" : 0,
                        "name" : "192.168.30.31:27017",
                        "health" : 1,
                        "state" : 3,
                        "stateStr" : "RECOVERING",
                        "uptime" : 46826,
                        "optime" : {
                                "t" : 1342791618000,
                                "i" : 562
                        },
                        "optimeDate" : ISODate("2012-07-20T13:40:18Z"),
                        "lastHeartbeat" : ISODate("2012-07-26T02:26:02Z"),
                        "pingMs" : 0,
                        "errmsg" : "error RS102 too stale to catch up"
                },
                {
                        "_id" : 1,
                        "name" : "192.168.30.103:27017",
                        "health" : 1,
                        "state" : 1,
                        "stateStr" : "PRIMARY",
                        "optime" : {
                                "t" : 1343208110000,
                                "i" : 549
                        },
                        "optimeDate" : ISODate("2012-07-25T09:21:50Z"),
                        "self" : true
                },
                {
                        "_id" : 2,
                        "name" : "192.168.30.33:27017",
                        "health" : 1,
                        "state" : 7,
                        "stateStr" : "ARBITER",
                        "uptime" : 46804,
                        "optime" : {
                                "t" : 0,
                                "i" : 0
                        },
                        "optimeDate" : ISODate("1970-01-01T00:00:00Z"),
                        "lastHeartbeat" : ISODate("2012-07-26T02:26:02Z"),
                        "pingMs" : 0
                }
        ],
        "ok" : 1
}

Log output:
Thu Jul 26 09:39:54 [conn2940] run command admin.$cmd { replSetHeartbeat: "shard1", v: 1, pv: 1, checkEmpty: false, from: "192.168.30.33:27017" }
Thu Jul 26 09:39:54 [conn2940] command admin.$cmd command: { replSetHeartbeat: "shard1", v: 1, pv: 1, checkEmpty: false, from: "192.168.30.33:27017" } ntoreturn:1 reslen:155 0ms
Thu Jul 26 09:39:55 [conn2941] run command admin.$cmd { replSetHeartbeat: "shard1", v: 1, pv: 1, checkEmpty: false, from: "192.168.30.103:27017" }
Thu Jul 26 09:39:55 [conn2941] command admin.$cmd command: { replSetHeartbeat: "shard1", v: 1, pv: 1, checkEmpty: false, from: "192.168.30.103:27017" } ntoreturn:1 reslen:155 0ms
Thu Jul 26 09:39:56 [conn2940] run command admin.$cmd { replSetHeartbeat: "shard1", v: 1, pv: 1, checkEmpty: false, from: "192.168.30.33:27017" }
Thu Jul 26 09:39:56 [conn2940] command admin.$cmd command: { replSetHeartbeat: "shard1", v: 1, pv: 1, checkEmpty: false, from: "192.168.30.33:27017" } ntoreturn:1 reslen:155 0ms
Thu Jul 26 09:39:57 [conn2941] run command admin.$cmd { replSetHeartbeat: "shard1", v: 1, pv: 1, checkEmpty: false, from: "192.168.30.103:27017" }
Thu Jul 26 09:39:57 [conn2941] command admin.$cmd command: { replSetHeartbeat: "shard1", v: 1, pv: 1, checkEmpty: false, from: "192.168.30.103:27017" } ntoreturn:1 reslen:155 0ms
Thu Jul 26 09:39:58 [rsSync] replSet syncing to: 192.168.30.103:27017
Thu Jul 26 09:39:58 BackgroundJob starting: ConnectBG
Thu Jul 26 09:39:58 [rsSync] replHandshake res not: 1 res: { ok: 1.0 }
Thu Jul 26 09:39:58 [rsSync] replSet error RS102 too stale to catch up, at least from 192.168.30.103:27017
Thu Jul 26 09:39:58 [rsSync] replSet our last optime : Jul 20 21:40:18 50095fc2:232
Thu Jul 26 09:39:58 [rsSync] replSet oldest at 192.168.30.103:27017 : Jul 25 15:28:41 500fa029:262a
Thu Jul 26 09:39:58 [rsSync] replSet See http://www.mongodb.org/display/D ... +Replica+Set+Member
Thu Jul 26 09:39:58 [rsSync] replSet error RS102 too stale to catch up
Thu Jul 26 09:39:58 [journal] lsn set 44019576
Thu Jul 26 09:39:58 [conn2940] end connection 192.168.30.33:59026
Thu Jul 26 09:39:58 [initandlisten] connection accepted from 192.168.30.33:59037 #2942
Thu Jul 26 09:39:58 [conn2942] run command admin.$cmd { replSetHeartbeat: "shard1", v: 1, pv: 1, checkEmpty: false, from: "192.168.30.33:27017" }
Thu Jul 26 09:39:58 [conn2942] command admin.$cmd command: { replSetHeartbeat: "shard1", v: 1, pv: 1, checkEmpty: false, from: "192.168.30.33:27017" } ntoreturn:1 reslen:155 0ms
Thu Jul 26 09:39:59 [conn2941] run command admin.$cmd { replSetHeartbeat: "shard1", v: 1, pv: 1, checkEmpty: false, from: "192.168.30.103:27017" }

I searched around and found no good solution. So how should this be handled?

Fortunately, the official document Resyncing a Very Stale Replica Set Member explains the cause: the OPLOG (short for operation log). The oplog is a system collection used to synchronize data between the PRIMARY and SECONDARY members of a replica set. The oplog is capped in size; on 64-bit machines it defaults to ~19GB (19616.9029296875MB), which you can see with db.printReplicationInfo(). (The 19GB figure differs from my test environment, which showed configured oplog size: 11230.146875MB.)

configured oplog size: 19616.9029296875MB (the oplog's configured size)

log length start to end: 15375secs (4.27hrs) (the time span between the earliest and latest operations in the oplog)

oplog first event time: Thu Jul 07 2011 21:03:29 GMT+0800 (CST)

oplog last event time: Fri Jul 08 2011 01:19:44 GMT+0800 (CST)

now: Thu Jul 07 2011 17:20:16 GMT+0800 (CST)
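The replication window implied by those numbers is easy to verify; a quick sketch, using the "log length start to end" value from the output above:

```shell
# Derive the hours figure from the seconds figure reported by db.printReplicationInfo().
secs=15375                 # "log length start to end" from the output above
hours=$(awk -v s="$secs" 'BEGIN { printf "%.2f", s / 3600 }')
echo "oplog window: ${hours} hrs"    # matches the reported 4.27hrs
```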

For a more detailed explanation of the fields above, see the mongo_vstudio.cpp source (it's JavaScript, by the way):

https://github.com/mongodb/mongo/blob/master/shell/mongo_vstudio.cpp

When the PRIMARY handles a heavy write load, a correspondingly large number of documents are inserted into the oplog. Each document represents one operation: an insert (i), update (u), or delete (d).

test:PRIMARY> db.oplog.rs.find()

{ "ts" : { "t" : 1310044124000, "i" : 11035 }, "h" : NumberLong("-2807175333144039203"), "op" : "i", "ns" : "cas_v2.cas_user_flat", "o" : { "_id" : ObjectId("4e15afdb1d6988397e0c6612"), … } }

{ "ts" : { "t" : 1310044124000, "i" : 11036 }, "h" : NumberLong("5285197078463590243"), "op" : "i", "ns" : "cas_v2.cas_user_flat", "o" : { "_id" : ObjectId("4e15afdb1d6988397e0c6613"), … } }

ts: the time this operation occurred.

h: a unique ID for this operation. Each operation will have a different value in this field.

op: the write operation to apply on the secondary: i (insert), u (update), or d (delete); n indicates a no-op, which is just an informational message.

ns: the database and collection affected by this operation (left blank for a no-op).

o: the document representing the operation itself; for the inserts above, the full document being inserted (for a no-op, this field carries no useful data).

Because the oplog's size is capped, a SECONDARY may be unable to keep up with the rate at which the PRIMARY inserts. Once that happens, rs.status() shows the "error RS102 too stale to catch up" error.

If this occurs, the slave will start giving error messages about needing to be resynced. It can’t catch up to the master from the oplog anymore: it might miss operations between the last oplog entry it has and the master’s oldest oplog entry. It needs a full resync at this point.

The fix:

Resyncing a Very Stale Replica Set Member describes what to do when you hit error RS102. You can also follow Increasing the OpLog Size in Halted Replication to adjust the oplog to an appropriate value; in my test I set the oplog size to 20000 (MB).

This indicates that you’re adding data to the database at a rate of 524MB/hr. If an initial clone takes 10 hours, then the oplog should be at least 5240MB, so something closer to 8GB would make for a safe bet.
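The rule of thumb quoted above is plain arithmetic; a sketch using the example's own figures:

```shell
# Minimum oplog size = write rate (MB/hr) x time a full initial clone takes (hr).
rate_mb_per_hr=524   # example write rate from the quote above
clone_hours=10       # assumed duration of an initial clone
min_oplog_mb=$(( rate_mb_per_hr * clone_hours ))
echo "oplog should be at least ${min_oplog_mb}MB"   # 5240MB, so ~8GB is a safe bet
```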

Finally, with data still being inserted, I used rs.remove() to remove the two SECONDARY members, after which inserts recovered their original speed. All that remained was to resync the secondaries once the load finished.

>mongo insert in 0.62605094909668 Secs. memory 164.25 MB

>mongo insert in 0.63488984107971 Secs. memory 164 MB

>mongo insert in 0.64394617080688 Secs. memory 164.25 MB

>mongo insert in 0.61102414131165 Secs. memory 164 MB

>mongo insert in 0.64304113388062 Secs. memory 164.25 MB

I later came across this method from an expert, and in practice it works fine, though depending on data volume it takes time. Since it runs on the standby, at least it does not affect production performance. It is also fairly resource-intensive, for example:


top - 11:15:04 up 149 days, 23:15,  8 users,  load average: 12.37, 8.09, 2.77

Tasks: 390 total,   1 running, 386 sleeping,   2 stopped,   1 zombie

Cpu0  :  0.3%us,  0.0%sy,  0.0%ni, 99.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st

Cpu1  :  0.0%us,  0.0%sy,  0.0%ni, 99.3%id,  0.7%wa,  0.0%hi,  0.0%si,  0.0%st

Cpu2  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st

Cpu3  :  0.3%us,  0.3%sy,  0.0%ni, 99.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st

Cpu4  : 60.6%us,  3.6%sy,  0.0%ni, 10.9%id, 24.5%wa,  0.0%hi,  0.3%si,  0.0%st

Cpu5  :  3.3%us,  0.3%sy,  0.0%ni, 91.7%id,  4.6%wa,  0.0%hi,  0.0%si,  0.0%st

Cpu6  :  1.0%us,  0.0%sy,  0.0%ni, 94.7%id,  4.3%wa,  0.0%hi,  0.0%si,  0.0%st

Cpu7  :  2.6%us,  0.3%sy,  0.0%ni, 91.7%id,  4.0%wa,  0.3%hi,  1.0%si,  0.0%st

Mem:  16410952k total, 16155884k used,   255068k free,    49356k buffers

Swap:  2096440k total,   283840k used,  1812600k free, 13972792k cached


  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                        

13818 root      15   0 56.4g 5.7g 5.7g S 68.0 36.7   6:31.75 mongod 

CPU usage is also fairly heavy. If there is a better way to handle this failure, discussion is welcome.

You don't need to repair, simply perform a full resync.

On the secondary, you can:

stop the failed mongod

delete all data in the dbpath (including subdirectories)

restart it and it will automatically resynchronize itself

Follow the instructions here.
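The three steps above can be sketched as follows; the dbpath, log path, and replica set name are assumptions based on this post's setup, so substitute your own. This is a sketch of the procedure, not a definitive script:

```shell
# 1. Stop the stale secondary cleanly.
mongod --shutdown --dbpath /data/db

# 2. Delete everything in the dbpath, including subdirectories.
rm -rf /data/db/*

# 3. Restart it; on startup it performs a full initial sync from the primary.
mongod --replSet shard1 --dbpath /data/db --fork --logpath /var/log/mongod.log
```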

What's happened in your case is that your secondaries have become stale, i.e. there is no common point between their oplog and the oplog on the primary. Look at this document, which details the various statuses. The writes to the primary member have to be replicated to the secondaries, and your secondaries couldn't keep up until they eventually went stale. You will need to consider resizing your oplog.

Regarding oplog size, it depends on how much data you insert/update over time. I would choose a size which allows you many hours or even days of oplog.

Additionally, I'm not sure which O/S you are running. However, for 64-bit Linux, Solaris, and FreeBSD systems, MongoDB will allocate 5% of the available free disk space to the oplog. If this amount is smaller than a gigabyte, then MongoDB will allocate 1 gigabyte of space. For 64-bit OS X systems, MongoDB allocates 183 megabytes of space to the oplog and for 32-bit systems, MongoDB allocates about 48 megabytes of space to the oplog.
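The default allocation rule described above can be expressed as a small sketch (the free-disk figure is a made-up example):

```shell
# 64-bit Linux/Solaris/FreeBSD default: 5% of free disk space, with a 1GB floor.
free_disk_mb=30000                          # hypothetical free disk space
oplog_mb=$(( free_disk_mb * 5 / 100 ))      # 5% of free space
[ "$oplog_mb" -lt 1024 ] && oplog_mb=1024   # never less than 1 gigabyte
echo "default oplog size: ${oplog_mb}MB"
```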

How big are records and how many do you want? It depends on whether this data insertion is something typical or something abnormal that you were merely testing.

For example, at 2000 documents per second for documents of 1KB, that would net you 120MB per minute and your 5GB oplog would last about 40 minutes. This means if the secondary ever goes offline for 40 minutes or falls behind by more than that, then you are stale and have to do a full resync.
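Reproducing that arithmetic (the figures are the example's own, with 1KB rounded to 1000 bytes to match the quoted numbers):

```shell
docs_per_sec=2000
doc_kb=1
mb_per_min=$(( docs_per_sec * doc_kb * 60 / 1000 ))   # ~120MB of oplog per minute
oplog_mb=5000                                         # a 5GB oplog
window_min=$(( oplog_mb / mb_per_min ))
echo "oplog window: ~${window_min} minutes"           # the "about 40 minutes" above
```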

I recommend reading the Replica Set Internals document here. You have 4 members in your replica set, which is not recommended. You should have an odd number for the voting election (of primary) process, so you either need to add an arbiter, another secondary, or remove one of your secondaries.

Finally, here's a detailed document on RS administration.


Some notes

Replica set status (state) values:

0 Starting up, phase 1

1 Primary

2 Secondary

3 Recovering

4 Fatal error

5 Starting up, phase 2

6 Unknown state

7 Arbiter

8 Down

health values:

0 Server is down

1 Server is up
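For convenience, the state table above can be turned into a small lookup helper (a sketch for scripting around rs.status() output; not part of MongoDB itself):

```shell
# Map the numeric "state" field from rs.status() to its label.
state_name() {
  case "$1" in
    0) echo "Starting up, phase 1" ;;
    1) echo "Primary" ;;
    2) echo "Secondary" ;;
    3) echo "Recovering" ;;
    4) echo "Fatal error" ;;
    5) echo "Starting up, phase 2" ;;
    6) echo "Unknown state" ;;
    7) echo "Arbiter" ;;
    8) echo "Down" ;;
    *) echo "Unrecognized ($1)" ;;
  esac
}
state_name 3   # Recovering -- the state of the stale member in rs.status() above
```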

