HBase: There is insufficient memory

Original

2021-01-30 10:16

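1. Symptoms

A small, low-memory HBase test cluster with 5 HRegionServer nodes went down entirely yesterday, and the RegionServers would not come back up on restart.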

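2. Cause

The error logs show that the trigger was a GC pause that ran too long.
First, one node went into GC, its connection to ZooKeeper timed out, the session was closed, and the RegionServer went down.
Once it was down, its regions were migrated to other nodes, as shown below: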

Adding moved region record: 41810a8da6ae1fa59a732d24a521d55c to yq-hadoop***,60020,1520389720296 as of 405597
Adding moved region record: 9cfe6192b4c7074699177ff2a5b72e70 to yq-hadoop***,60020,1520389720296 as of 1605

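After the regions migrated, load on the remaining nodes increased, GC got worse, and they went down one after another.

HBase is known to be memory-hungry, and small memory is its weak spot. Concurrency was somewhat high, but the main cause was that the test cluster machines simply had too little memory (only 7 GB). With other applications sharing the machines, the HBase heap could not be given much memory, which led to severe GC.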
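To see how much extra load each surviving node picked up, the "moved region" lines quoted above can be tallied per destination server. The following is a minimal sketch; the regular expression is inferred from the two sample lines only, and the function name `count_moved_regions` is hypothetical.

```python
import re
from collections import Counter

# Pattern inferred from the two sample log lines above (an assumption,
# not an official HBase log format specification).
MOVED_RE = re.compile(r"Adding moved region record: (\w+) to (\S+) as of (\d+)")

def count_moved_regions(lines):
    """Return a Counter mapping destination server name -> number of regions moved there."""
    counts = Counter()
    for line in lines:
        match = MOVED_RE.search(line)
        if match:
            counts[match.group(2)] += 1
    return counts

sample = [
    "Adding moved region record: 41810a8da6ae1fa59a732d24a521d55c to yq-hadoop***,60020,1520389720296 as of 405597",
    "Adding moved region record: 9cfe6192b4c7074699177ff2a5b72e70 to yq-hadoop***,60020,1520389720296 as of 1605",
]
moved = count_moved_regions(sample)
for server, n in moved.items():
    print(f"{server}: {n} region(s) moved in")
```

In practice the lines would come from the RegionServer log file rather than a hard-coded list.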

3. Restart failure

The error was:


There is insufficient memory for the Java Runtime Environment to continue. 
Native memory allocation (malloc) failed to allocate 715784192 bytes for committing reserved memory.

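This error usually has one of three causes:

1. The machine genuinely does not have enough memory to satisfy the allocation.
2. Some process has started too many threads, possibly because of a bug in the code.
3. ulimit -n is set too low.

In this case it was the first cause.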
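A quick way to check the first cause is to compare the size the JVM failed to allocate with the memory the machine currently has available. The sketch below assumes a Linux host where /proc/meminfo exposes a MemAvailable field; the helper names are hypothetical. The 715784192 bytes in the error above is roughly 683 MiB.

```python
import re

ALLOC_RE = re.compile(r"failed to allocate (\d+) bytes")

def failed_allocation_bytes(text):
    """Extract the failed allocation size in bytes from a JVM error message, or None."""
    match = ALLOC_RE.search(text)
    return int(match.group(1)) if match else None

def mem_available_bytes(meminfo_text):
    """Parse MemAvailable (reported in kB) from /proc/meminfo content, or None."""
    match = re.search(r"^MemAvailable:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    return int(match.group(1)) * 1024 if match else None

error_text = (
    "There is insufficient memory for the Java Runtime Environment to continue.\n"
    "Native memory allocation (malloc) failed to allocate 715784192 bytes "
    "for committing reserved memory.\n"
)
needed = failed_allocation_bytes(error_text)
print(f"JVM asked for {needed / 2**20:.1f} MiB")

try:
    with open("/proc/meminfo") as f:
        available = mem_available_bytes(f.read())
except OSError:
    available = None  # not Linux, or /proc is not readable

if available is not None:
    verdict = "enough" if available >= needed else "NOT enough"
    print(f"MemAvailable is {available / 2**20:.1f} MiB: {verdict} for this allocation")
```

If the available memory is comfortably larger than the failed allocation, the second and third causes are worth checking next.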

Solution:

a. Stop unneeded applications on the node to free up memory.

b. Reduce the HBase RegionServer heap size.

c. Restart.
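For step b, the RegionServer heap is commonly set in conf/hbase-env.sh, for example through -Xms/-Xmx in HBASE_REGIONSERVER_OPTS. The sketch below estimates a heap size for a 7 GB machine and prints a candidate line for that file. Every number other than the 7 GB total is an illustrative assumption (3 GB for other applications, 1 GB reserved for the OS, 25% JVM native overhead), and the function name is hypothetical. The values actually used on this cluster are not recorded here.

```python
def regionserver_heap_budget_mb(total_mb, other_apps_mb, os_reserve_mb=1024, overhead_ratio=0.25):
    """Estimate a heap size (MB) that leaves room for other processes and JVM native overhead.

    All defaults are illustrative assumptions, not HBase recommendations.
    """
    free_for_jvm = total_mb - other_apps_mb - os_reserve_mb
    if free_for_jvm <= 0:
        return 0
    # A JVM process uses more than -Xmx (metaspace, thread stacks, GC structures,
    # direct buffers), so divide by (1 + overhead) instead of handing over everything.
    return int(free_for_jvm / (1 + overhead_ratio))

# 7 GB machine from the article; 3 GB for other applications is a made-up figure.
heap_mb = regionserver_heap_budget_mb(total_mb=7 * 1024, other_apps_mb=3 * 1024)
print(f'export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -Xms{heap_mb}m -Xmx{heap_mb}m"')
```

Setting -Xms equal to -Xmx makes the JVM commit the whole heap at startup, so an undersized machine fails immediately rather than later under load.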

4. Warning signs

When log messages like the following (delayed flushes) appear, memory is usually under-allocated and should be dealt with early.

2018-03-07 10:20:38,091 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: yq-hadoop****,60020,1520385422866-MemstoreFlusherChore requesting flush of TraceV2,6\x00\x00\x00\x00\x00\x00\x00,1499652593538.3e9884dcb01d4a98408919fbc3433f3f. because S has an old edit so flush to free WALs after random delay 33552ms
2018-03-07 10:20:38,091 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: yq-hadoop****,60020,1520385422866-MemstoreFlusherChore requesting flush of TraceV2,\xB4\x00\x00\x00\x00\x00\x00\x00,1499652593538.f7538a5b9b88fa60857a9f38f272a436. because S has an old edit so flush to free WALs after random delay 242347ms
2018-03-07 10:20:38,091 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: yq-hadoop****,60020,1520385422866-MemstoreFlusherChore requesting flush of TraceV2,{\x00\x00\x00\x00\x00\x00\x00,1499652593538.fb9d550b69cc3409d5256718c47d871d. because S has an old edit so flush to free WALs after random delay 118059ms
2018-03-07 10:20:38,091 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: yq-hadoop****,60020,1520385422866-MemstoreFlusherChore requesting flush of TraceV2,;\x00\x00\x00\x00\x00\x00\x00,1499652593538.3c226c0465be2a64d7c0fa9e25e41a7e. because S has an old edit so flush to free WALs after random delay 135722ms
2018-03-07 10:20:38,092 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: yq-hadoop****,60020,1520385422866-MemstoreFlusherChore requesting flush of TraceV2,\xA8\x00\x00\x00\x00\x00\x00\x00,1499652593538.90c4ebec39bb67680dbe4f17c662d229. because S has an old edit so flush to free WALs after random delay 287717ms
2018-03-07 10:20:38,092 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: yq-hadoop****,60020,1520385422866-MemstoreFlusherChore requesting flush of TraceV2,<\x00\x00\x00\x00\x00\x00\x00,1499652593538.2bd2b43e949edd374878fe6b31f79c15. because S has an old edit so flush to free WALs after random delay 280345ms
2018-03-07 10:20:38,092 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: yq-hadoop****,60020,1520385422866-MemstoreFlusherChore requesting flush of TraceV2,\x88\x00\x00\x00\x00\x00\x00\x00,1499652593538.e00b1e90a58c0f8d78fe4727d87c8949. because S has an old edit so flush to free WALs after random delay 150252ms
2018-03-07 10:20:38,092 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: yq-hadoop****,60020,1520385422866-MemstoreFlusherChore requesting flush of TraceV2,\xF2\x00\x00\x00\x00\x00\x00\x00,1499652593538.d834224abc9d6b88b93911ea1fa17a0e. because S has an old edit so flush to free WALs after random delay 230071ms
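Messages like these can be counted automatically so that a burst stands out early. The following is a minimal sketch that groups MemstoreFlusherChore flush requests by minute and collects their random delays. The regular expression is inferred from the sample lines above, and the alert threshold is an arbitrary assumption to be tuned per cluster.

```python
import re
from collections import Counter

# Inferred from the sample log lines above; not an official format definition.
FLUSH_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2},\d{3} INFO .*MemstoreFlusherChore "
    r"requesting flush of .* after random delay (\d+)ms"
)

def summarize_flush_requests(lines):
    """Return (Counter of flush requests per minute, list of random delays in ms)."""
    per_minute = Counter()
    delays_ms = []
    for line in lines:
        match = FLUSH_RE.search(line)
        if match:
            per_minute[match.group(1)] += 1
            delays_ms.append(int(match.group(2)))
    return per_minute, delays_ms

# Raw strings keep the literal "\x00" sequences exactly as they appear in the log.
sample = [
    r"2018-03-07 10:20:38,091 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: yq-hadoop****,60020,1520385422866-MemstoreFlusherChore requesting flush of TraceV2,6\x00\x00\x00\x00\x00\x00\x00,1499652593538.3e9884dcb01d4a98408919fbc3433f3f. because S has an old edit so flush to free WALs after random delay 33552ms",
    r"2018-03-07 10:20:38,091 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: yq-hadoop****,60020,1520385422866-MemstoreFlusherChore requesting flush of TraceV2,\xB4\x00\x00\x00\x00\x00\x00\x00,1499652593538.f7538a5b9b88fa60857a9f38f272a436. because S has an old edit so flush to free WALs after random delay 242347ms",
]

ALERT_THRESHOLD = 2  # hypothetical: flush requests per minute considered worth a look

per_minute, delays_ms = summarize_flush_requests(sample)
for minute, count in sorted(per_minute.items()):
    flag = "  <-- check memory" if count >= ALERT_THRESHOLD else ""
    print(f"{minute}: {count} flush request(s){flag}")
```

Feeding the function the lines of a RegionServer log file gives a per-minute view that can be compared against a quiet period.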

This article is shared from the WeChat official account HBase工作筆記 (HBase-Notes).