MongoDB crashes with "out of memory": analysis and solution [notes]

 

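1  Problem description

       The application and MongoDB ran together with less than 100 MB of data. After about 3 days of uptime, MongoDB reported an OOM error and exited.

       Environment:

  • Windows 10
  • MongoDB 3.4.2

       Error output: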

2019-05-09T19:20:44.186+0800 I CONTROL  [conn59] mongod.exe    ...\src\mongo\util\stacktrace_windows.cpp(239)                                   mongo::printStackTrace+0x43
2019-05-09T19:20:44.186+0800 I CONTROL  [conn59] mongod.exe    ...\src\mongo\util\signal_handlers_synchronous.cpp(332)                          mongo::reportOutOfMemoryErrorAndExit+0x90
2019-05-09T19:20:44.186+0800 I CONTROL  [conn59] mongod.exe    ...\src\mongo\util\allocator.cpp(51)                                             mongo::mongoRealloc+0x19
2019-05-09T19:20:44.186+0800 I CONTROL  [conn59] mongod.exe    ...\src\mongo\bson\util\builder.h(332)                                           mongo::_BufBuilder<mongo::SharedBufferAllocator>::grow_reallocate+0x195
2019-05-09T19:20:44.187+0800 I CONTROL  [conn59] mongod.exe    ...\src\mongo\rpc\legacy_reply_builder.cpp(84)                                   mongo::rpc::LegacyReplyBuilder::getInPlaceReplyBuilder+0x31
2019-05-09T19:20:44.187+0800 I CONTROL  [conn59] mongod.exe    ...\src\mongo\db\commands\dbcommands.cpp(1486)                                   mongo::Command::run+0xa7
2019-05-09T19:20:44.187+0800 I CONTROL  [conn59] mongod.exe    ...\src\mongo\db\commands\dbcommands.cpp(1443)                                   mongo::Command::execCommand+0xb9d
2019-05-09T19:20:44.187+0800 I CONTROL  [conn59] mongod.exe    ...\src\mongo\db\run_commands.cpp(73)                                            mongo::runCommands+0x4e4
2019-05-09T19:20:44.187+0800 I CONTROL  [conn59] mongod.exe    ...\src\mongo\db\instance.cpp(236)                                               mongo::`anonymous namespace'::receivedCommand+0x1d4
2019-05-09T19:20:44.187+0800 I CONTROL  [conn59] mongod.exe    ...\src\mongo\db\instance.cpp(614)                                               mongo::assembleResponse+0x7ba
2019-05-09T19:20:44.187+0800 I CONTROL  [conn59] mongod.exe    ...\src\mongo\db\service_entry_point_mongod.cpp(135)                             mongo::ServiceEntryPointMongod::_sessionLoop+0x159
2019-05-09T19:20:44.187+0800 I CONTROL  [conn59] mongod.exe    c:\program files (x86)\microsoft visual studio 14.0\vc\include\functional(212)   std::_Func_impl<<lambda_0476eb5123845e42a2f452f2efde6866>,std::allocator<int>,void,std::shared_ptr<mongo::transport::Session> const & __ptr64>::_Do_call+0x43
2019-05-09T19:20:44.187+0800 I CONTROL  [conn59] mongod.exe    ...\src\mongo\transport\service_entry_point_utils.cpp(78)                        mongo::`anonymous namespace'::runFunc+0x1b3
2019-05-09T19:20:44.187+0800 I CONTROL  [conn59] mongod.exe    c:\program files (x86)\microsoft visual studio 14.0\vc\include\thr\xthread(247)  std::_LaunchPad<std::unique_ptr<std::tuple<std::_Binder<std::_Unforced,void * __ptr64 (__cdecl&)(void * __ptr64),mongo::`anonymous namespace'::Context * __ptr64> >,std::default_delete<std::tuple<std::_Binder<std::_Unforced,void * __ptr64 (__cdecl&)(void * __ptr64),mongo::`anonymous namespace'::Context * __ptr64> > > > >::_Run+0x75
2019-05-09T19:20:44.187+0800 I CONTROL  [conn59] mongod.exe    c:\program files (x86)\microsoft visual studio 14.0\vc\include\thr\xthread(210)  std::_Pad::_Call_func+0x9
2019-05-09T19:20:44.187+0800 I CONTROL  [conn59] ucrtbase.dll                                                                                   o_strcat_s+0x5e
2019-05-09T19:20:44.187+0800 I CONTROL  [conn59] KERNEL32.DLL                                                                                   BaseThreadInitThunk+0x14
2019-05-09T19:20:44.187+0800 F -        [conn59] out of memory.
 

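2  Earlier answers found on Google, and an experiment

  1. A similar question on Stack Overflow
  2. A write-up of a solution and experiment for a MongoDB that frequently crashed with OOM

       I repeated the experiment from link 2 and verified its result. With MongoDB's internal cache set to 256 MB, MongoDB did not crash once in a week.

      The experiment used the following configuration: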

systemLog:
 destination: file
 path: "D:\\db.log"
 logAppend: true
storage:
 dbPath: "D:\\db"
 directoryPerDB: true
 journal:
  enabled: true
 wiredTiger:
  engineConfig:
   cacheSizeGB: 0.256
net:
 bindIp: 127.0.0.1
 port: 27017
security:
 authorization: disabled
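
The cacheSizeGB setting above is expressed in gigabytes, so a budget in megabytes has to be converted first. A minimal sketch of that arithmetic, assuming 1 GB = 1024 MB (the helper name cache_size_gb is hypothetical, not part of MongoDB):

```python
def cache_size_gb(megabytes: float) -> float:
    """Convert a cache budget in MB to a cacheSizeGB value.

    Assumption: 1 GB is taken as 1024 MB.
    """
    if megabytes <= 0:
        raise ValueError("cache size must be positive")
    return megabytes / 1024


# A 256 MB budget corresponds to cacheSizeGB: 0.25 under this assumption.
print(cache_size_gb(256))
```

Under the same assumption, the 0.256 used in the configuration above comes out slightly larger than 256 MB (about 262 MB).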
 

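3  Conclusion and hypothesis

       The size of MongoDB's internal cache can be set with --wiredTigerCacheSizeGB or through the configuration above.
       MongoDB does not request all the memory it needs from the operating system when the instance starts. It requests memory gradually while it runs. For example, with the internal cache set to 3 GB, a freshly started mongod instance uses only a few tens or hundreds of MB.
       So while it runs, MongoDB keeps requesting memory from the operating system until usage reaches the configured maximum. Those requests are where the OOM arises: a request can fail because other programs compete for the same resources, or because an operating system security policy refuses the allocation.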
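
One way to watch this gradual growth is to compare the cache's current size with its configured maximum. The sketch below assumes a serverStatus result is already available as a dict (for example from pymongo's client.admin.command("serverStatus")); the function name cache_usage and the sample numbers are made up for illustration.

```python
def cache_usage(server_status: dict) -> dict:
    """Summarize WiredTiger cache usage from a serverStatus document."""
    cache = server_status["wiredTiger"]["cache"]
    used = cache["bytes currently in the cache"]
    limit = cache["maximum bytes configured"]
    return {"used": used, "limit": limit, "ratio": used / limit}


# Hypothetical sample shaped like the relevant part of serverStatus.
sample = {
    "wiredTiger": {
        "cache": {
            "bytes currently in the cache": 64 * 1024 ** 2,
            "maximum bytes configured": 256 * 1024 ** 2,
        }
    }
}
print(cache_usage(sample)["ratio"])
```

These figures cover only the WiredTiger internal cache. The mongod process also allocates memory outside it (connections, reply buffers and so on), so total process memory is higher than the cache figure.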
       The code where MongoDB crashes is the following:

File: allocator.cpp
Source function:
void* mongoRealloc(void* ptr, size_t size) {
    void* x = std::realloc(ptr, size);
    if (x == NULL) {
        reportOutOfMemoryErrorAndExit();
    }
    return x;
}
 

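       My reading, then, is that lowering the internal cache size prevents the crash because it limits, to some extent, how much additional memory the DB requests from the operating system, so that it mostly reallocates within memory it already holds.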

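4  How to set the internal cache

        Running MongoDB with a very small internal cache just to keep it from crashing is not reasonable either. After all, MongoDB relies on plenty of memory for its speed.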

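(1) How MongoDB's WiredTiger uses memory

      First, it helps to understand how MongoDB's WiredTiger uses memory.

      MongoDB uses two kinds of memory: the internal cache and the filesystem cache.

      By default, the internal cache uses the larger of the following two values: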

50% of (RAM - 1 GB), or
256 MB.
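
As a worked example of the default above, the following sketch computes it for a given amount of RAM. It assumes binary units (1 GB = 1024^3 bytes); the function name default_cache_bytes is only for illustration.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def default_cache_bytes(ram_bytes: int) -> int:
    """Larger of 50% of (RAM - 1 GB) and 256 MB, in bytes."""
    return max((ram_bytes - GIB) // 2, 256 * MIB)


# 8 GB of RAM gives 3.5 GB; 1 GB of RAM falls back to the 256 MB floor.
for ram_gib in (1, 4, 8, 16):
    print(ram_gib, "GB RAM ->", default_cache_bytes(ram_gib * GIB) / GIB, "GB cache")
```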
 

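       By default, MongoDB's filesystem cache uses all memory that is not used by the internal cache or by other processes. The caching policy is up to the operating system, which keeps recently used data in the filesystem cache as far as it can. When memory runs short, data is evicted by LRU (least recently used). Managing the filesystem cache is, in essence, the operating system's own cache management.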
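
To make the LRU eviction mentioned above concrete, here is a minimal sketch of an LRU cache. It is an illustration of the policy only; real operating systems typically use approximations of LRU, and the class name and page labels are invented.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._items = OrderedDict()  # oldest entry first, newest last

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used


cache = LRUCache(2)
cache.put("a", "page-a")
cache.put("b", "page-b")
cache.get("a")            # "a" is now more recent than "b"
cache.put("c", "page-c")  # capacity exceeded, so "b" is evicted
```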

(2) How the WiredTiger engine reads and stores data, and the data formats involved

        By default, WiredTiger uses Snappy block compression for all collections and prefix compression for all indexes. Compression defaults are configurable at a global level and can also be set on a per-collection and per-index basis during collection and index creation.

       Different representations are used for data in the WiredTiger internal cache versus the on-disk format:

  • Data in the filesystem cache is the same as the on-disk format, including benefits of any compression for data files. The filesystem cache is used by the operating system to reduce disk I/O.
  • Indexes loaded in the WiredTiger internal cache have a different data representation to the on-disk format, but can still take advantage of index prefix compression to reduce RAM usage. Index prefix compression deduplicates common prefixes from indexed fields.
  • Collection data in the WiredTiger internal cache is uncompressed and uses a different representation from the on-disk format. Block compression can provide significant on-disk storage savings, but data must be uncompressed to be manipulated by the server.

      Via the filesystem cache, MongoDB automatically uses all free memory that is not used by the WiredTiger cache or by other processes.
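
The idea behind index prefix compression, deduplicating the prefix a key shares with the previous key, can be sketched as follows. This is a toy illustration of the general technique, not WiredTiger's actual on-disk or in-memory format; the function names and sample keys are made up.

```python
def prefix_compress(sorted_keys):
    """Store each key as (shared prefix length with previous key, suffix)."""
    entries = []
    prev = ""
    for key in sorted_keys:
        shared = 0
        limit = min(len(prev), len(key))
        while shared < limit and prev[shared] == key[shared]:
            shared += 1
        entries.append((shared, key[shared:]))
        prev = key
    return entries


def prefix_decompress(entries):
    """Rebuild the original keys from (shared length, suffix) pairs."""
    keys = []
    prev = ""
    for shared, suffix in entries:
        key = prev[:shared] + suffix
        keys.append(key)
        prev = key
    return keys


keys = ["user:1001", "user:1002", "user:2001"]
compressed = prefix_compress(keys)
print(compressed)
```

Sorted index keys tend to share long prefixes, which is why storing only the differing suffix saves space.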

(3) MongoDB's working set

       Sizing the internal cache is closely tied to MongoDB's working set.

      What is the working set

       Working set represents the total body of data that the application uses in the course of normal operation. Often this is a subset of the total data size, but the specific size of the working set depends on actual moment-to-moment use of the database.

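       If you run a query that requires MongoDB to scan every document in a collection, the working set will expand to include every document. Depending on physical memory size, this may cause documents in the working set to "page out," or to be removed from physical memory by the operating system.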

       For best performance, the majority of your active set should fit in RAM.

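       The working set is a subset of all the data in the DB. Its size depends on how and when the database is used, for example all the data touched by one query. For best performance, most of the frequently used working set should stay in memory.

(4) Setting the internal cache

       Several factors go into sizing the internal cache. The first is the operating system environment MongoDB runs in: are other applications consuming memory there, how much memory is left once they have taken theirs, and does the configured internal cache exceed what is left? The second is the size of the working set, so that MongoDB can perform at its best.
       In general, a server that hosts mongo should not have many other applications installed. The internal cache should also not be set larger than mongo's default ( 50% of (RAM - 1 GB) ).
       With the influence of other programs ruled out, I think mongo's default ( 50% of (RAM - 1 GB) ) is very reasonable and needs no change in the configuration. If you do have to pin the internal cache to a specific value, weigh the two factors above so that MongoDB runs in a stable, high-performance state.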
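
The considerations above can be combined into a rough sizing heuristic. The sketch below is my own illustration, not a MongoDB recommendation: the function name, the choice to cap at the default, and the sample figures are all assumptions.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2
MIN_CACHE = 256 * MIB


def suggest_cache_bytes(ram_bytes, other_processes_bytes, working_set_bytes):
    """Pick a cache size covering the working set, capped by the default
    ( 50% of (RAM - 1 GB) ) and by the memory other programs leave free."""
    default = max((ram_bytes - GIB) // 2, MIN_CACHE)
    free_for_mongo = ram_bytes - other_processes_bytes
    if free_for_mongo < MIN_CACHE:
        raise ValueError("not enough free memory for even a 256 MB cache")
    upper = min(default, free_for_mongo)
    wanted = max(working_set_bytes, MIN_CACHE)
    return min(wanted, upper)


# 8 GB machine, 2 GB used by other programs, 1 GB working set.
print(suggest_cache_bytes(8 * GIB, 2 * GIB, 1 * GIB) / GIB)
```

This sketch lets the cache take all remaining free memory in the worst case, which would leave nothing for the filesystem cache. A real deployment would want extra headroom.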

5  References

Mongo Storage

 