Notes on troubleshooting an OOM

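I used Netty to build a UDP log receiver. The heap was configured with -Xms256M -Xmx256M. Right after startup, the process's memory footprint looked normal, at around 250 MB.

After it had been running for a long time, I found that the process's memory usage kept growing, far beyond the heap size I had configured. I checked the survivor spaces, Eden, the old generation, and GC activity, and everything looked normal; the heap usage figures were all fine. I even suspected that Metaspace was taking up a lot of memory, but it turned out to use very little and had barely fluctuated since the program started. So I went back over my JVM notes once more.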

 

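So where was the extra memory coming from?

After some reading, I learned that the memory of a Java process is divided into heap memory and off-heap memory. Off-heap (direct) memory is allocated outside the Java heap, so -Xmx does not limit it, heap statistics do not show it, and the JVM's GC does not manage it the way it manages heap objects.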
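
To make the heap/off-heap split concrete, here is a minimal sketch that uses only the JDK. The class name DirectMemoryProbe is made up for this illustration. It reads the JVM's "direct" buffer pool through BufferPoolMXBean before and after allocating a direct ByteBuffer. One caveat: depending on its version and configuration, Netty may track its direct allocations with its own counter instead of this pool, so this bean can under-report what a Netty application really uses.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

// Hypothetical helper for illustration; not part of the original project.
final class DirectMemoryProbe {

    /** Bytes currently held by the JVM's "direct" buffer pool, or -1 if it is not reported. */
    static long directBytesUsed() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool.getMemoryUsed();
            }
        }
        return -1L;
    }

    public static void main(String[] args) {
        long before = directBytesUsed();
        // 8 MiB allocated outside the Java heap; -Xmx does not apply to it.
        ByteBuffer buf = ByteBuffer.allocateDirect(8 * 1024 * 1024);
        long after = directBytesUsed();
        System.out.println("direct pool before=" + before
                + " after=" + after + " capacity=" + buf.capacity());
    }
}
```

Run with a small heap (for example -Xmx16M) and the allocation should still succeed, because the limit that applies to it is -XX:MaxDirectMemorySize, not -Xmx.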

 

And where is off-heap memory used?

NIO frameworks use it, and Netty is one of them.

 

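Doesn't Netty have its own GC mechanism?

It does: Netty manages its buffers by reference counting. But that mechanism only releases the memory Netty itself hands out and takes back. For example, SimpleChannelInboundHandler releases the message it passes to channelRead0, whereas any buffer you create yourself along the way is outside its scope and has to be released by you. Good, so now the problem was located, and I started changing the code and the startup parameters.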

java -jar -Xms256M -Xmx256M -XX:MaxDirectMemorySize=128M -Dspring.profiles.active=prod log-server.jar

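-XX:MaxDirectMemorySize=128M caps off-heap (direct) memory at 128 MB to keep the process's memory usage under control. In the code, I also called clear() manually on the ByteBuf I had copied. (PS: I later found that this call has no effect here; I had misunderstood what the method does.)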

 

@Component
public class UDPInboundHandler extends SimpleChannelInboundHandler<DatagramPacket> {

    private Logger logger = LoggerFactory.getLogger(UDPInboundHandler.class);

    @Autowired
    LogService logService;

    @Override
    protected void channelRead0(ChannelHandlerContext ctx, DatagramPacket packet) {
        String remoteAddr = packet.sender().getAddress().getHostAddress();
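        // copy() allocates a new buffer; this handler now owns it
        ByteBuf buf = packet.copy().content();
        logService.process(buf, remoteAddr);
        // clear() only resets readerIndex and writerIndex to 0; it frees nothing
        buf.clear();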
    }
}
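
As an aside on why clear() cannot help here: the sketch below uses the JDK's java.nio.ByteBuffer as an analogy, not Netty's ByteBuf. The two clear() methods differ in detail (Netty's resets readerIndex and writerIndex, the JDK's resets position and limit), but both only move indices. Neither erases the contents nor gives the memory back. The class name ClearDoesNotFree is made up for this illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Analogy using the JDK buffer type; Netty's ByteBuf is a different class.
final class ClearDoesNotFree {

    /** Writes a payload (at most 64 bytes), calls clear(), and returns what can still be read. */
    static String readAfterClear(String payload) {
        byte[] bytes = payload.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocateDirect(64);
        buf.put(bytes);
        // position = 0, limit = capacity; contents and native memory are untouched
        buf.clear();
        byte[] out = new byte[bytes.length];
        buf.get(out);
        return new String(out, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(readAfterClear("log line")); // prints "log line"
    }
}
```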

 

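After running like this for a while, memory growth had slowed considerably. Previously it took only half a day to climb from 200 MB to 500 MB; now, after a full day of observation, it had only grown to a little over 400 MB. Still, that exceeded 256 + 128 = 384 MB, the sum of the heap and direct memory limits I had set. (Some excess over that sum is expected, since the process also holds Metaspace, thread stacks, and other native memory.) On top of that, the code started throwing the following error: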

io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 16777216 byte(s) of direct memory (used: 134217728, max: 134217728)
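
The numbers in this message can be checked by hand. The sketch below assumes the pooled allocator defaults of Netty 4.1 releases from that period (page size 8192, max order 11); those defaults are not stated anywhere in this post, so treat them as an assumption. Under that assumption, the failed 16777216-byte request is exactly one allocator chunk, and a 128 MB limit holds only eight of them. The class name DirectMemoryMath is made up for this illustration.

```java
// Hypothetical helper for illustration; the constants are assumed defaults,
// not values taken from the original project.
final class DirectMemoryMath {
    static final int PAGE_SIZE = 8192; // assumed default page size
    static final int MAX_ORDER = 11;   // assumed default max order

    /** Size of one pooled chunk: pageSize << maxOrder. */
    static long chunkSize() {
        return (long) PAGE_SIZE << MAX_ORDER;
    }

    /** Converts a size in MiB to bytes. */
    static long mib(long n) {
        return n * 1024 * 1024;
    }

    public static void main(String[] args) {
        long limit = mib(128);
        System.out.println("chunk size   = " + chunkSize());         // 16777216
        System.out.println("direct limit = " + limit);               // 134217728
        System.out.println("chunks fit   = " + limit / chunkSize()); // 8
    }
}
```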

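134217728 bytes = 128 MB. In other words, my clear() call had no effect at all, and the direct memory had been used up completely. The OOM errors were flooding the log, but among all those exceptions I found this entry: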

 

2019-09-25 18:20:00.551 {nioEventLoopGroup-2-1} ERROR io.netty.util.ResourceLeakDetector - LEAK: ByteBuf.release() was not called before it's garbage-collected. See http://netty.io/wiki/reference-counted-objects.html for more information.
Recent access records: 
Created at:
    io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:331)
    io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:185)
    io.netty.buffer.UnsafeByteBufUtil.copy(UnsafeByteBufUtil.java:436)
    io.netty.buffer.PooledUnsafeDirectByteBuf.copy(PooledUnsafeDirectByteBuf.java:309)
    io.netty.buffer.AbstractByteBuf.copy(AbstractByteBuf.java:1190)
    io.netty.buffer.WrappedByteBuf.copy(WrappedByteBuf.java:874)
    io.netty.channel.socket.DatagramPacket.copy(DatagramPacket.java:47)
    com.tutorgroup.base.logserver.server.UDPInboundHandler.channelRead0(UDPInboundHandler.java:24)
    com.tutorgroup.base.logserver.server.UDPInboundHandler.channelRead0(UDPInboundHandler.java:13)
    io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
    io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
    io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
    io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
    io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1434)
    io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
    io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
    io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965)
    io.netty.channel.nio.AbstractNioMessageChannel$NioMessageUnsafe.read(AbstractNioMessageChannel.java:93)
    io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:644)
    io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:579)
    io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:496)
    io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:458)
    io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:897)
    io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    java.lang.Thread.run(Thread.java:748)

 

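ByteBuf.release() was never called. My code base is small and ByteBuf is used in only one place in the project, so I located the offending code quickly. But what if the project were large and you had no idea which piece of code was responsible? After some more reading, I changed the startup parameters again: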

java -jar -Xms256M -Xmx256M -XX:MaxDirectMemorySize=2M -Dio.netty.leakDetection.level=advanced -Dio.netty.leakDetection.maxRecords=10 -Dspring.profiles.active=prod log-server.jar

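Sure enough, the error appeared again. This time the report was much more detailed: besides where the leaked ByteBuf was created (DatagramPacket.copy() in UDPInboundHandler.channelRead0), it also recorded where it was last accessed (readBytes in LogService.getLogJSONArray), which pins down exactly which ByteBuf variable is involved.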

2019-09-27 10:54:24.442 {nioEventLoopGroup-2-1} ERROR io.netty.util.ResourceLeakDetector - LEAK: ByteBuf.release() was not called before it's garbage-collected. See http://netty.io/wiki/reference-counted-objects.html for more information.
Recent access records: 
#1:
    io.netty.buffer.AdvancedLeakAwareByteBuf.readBytes(AdvancedLeakAwareByteBuf.java:496)
    com.tutorgroup.base.logserver.service.LogService.getLogJSONArray(LogService.java:108)
    com.tutorgroup.base.logserver.service.LogService.process(LogService.java:51)
    com.tutorgroup.base.logserver.server.UDPInboundHandler.channelRead0(UDPInboundHandler.java:25)
    com.tutorgroup.base.logserver.server.UDPInboundHandler.channelRead0(UDPInboundHandler.java:13)
    io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
    io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
    io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
    io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
    io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1434)
    io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
    io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
    io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965)
    io.netty.channel.nio.AbstractNioMessageChannel$NioMessageUnsafe.read(AbstractNioMessageChannel.java:93)
    io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:644)
    io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:579)
    io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:496)
    io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:458)
    io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:897)
    io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    java.lang.Thread.run(Thread.java:748)
Created at:
    io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:331)
    io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:185)
    io.netty.buffer.UnsafeByteBufUtil.copy(UnsafeByteBufUtil.java:436)
    io.netty.buffer.UnpooledUnsafeDirectByteBuf.copy(UnpooledUnsafeDirectByteBuf.java:463)
    io.netty.buffer.AbstractByteBuf.copy(AbstractByteBuf.java:1190)
    io.netty.channel.socket.DatagramPacket.copy(DatagramPacket.java:47)
    com.tutorgroup.base.logserver.server.UDPInboundHandler.channelRead0(UDPInboundHandler.java:24)
    com.tutorgroup.base.logserver.server.UDPInboundHandler.channelRead0(UDPInboundHandler.java:13)
    io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
    io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
    io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
    io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
    io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1434)
    io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
    io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
    io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965)
    io.netty.channel.nio.AbstractNioMessageChannel$NioMessageUnsafe.read(AbstractNioMessageChannel.java:93)
    io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:644)
    io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:579)
    io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:496)
    io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:458)
    io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:897)
    io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    java.lang.Thread.run(Thread.java:748)

 

After reading up on how to release a ByteBuf, I changed the code as follows:

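import com.tutorgroup.base.logserver.service.LogService;
import io.netty.buffer.ByteBuf;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.SimpleChannelInboundHandler;
import io.netty.channel.socket.DatagramPacket;
import io.netty.util.ReferenceCountUtil;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;

@Component
public class UDPInboundHandler extends SimpleChannelInboundHandler<DatagramPacket> {

    private Logger logger = LoggerFactory.getLogger(UDPInboundHandler.class);

    @Autowired
    LogService logService;

    @Override
    protected void channelRead0(ChannelHandlerContext ctx, DatagramPacket packet) {
        String remoteAddr = packet.sender().getAddress().getHostAddress();
        // copy() allocates a new buffer; this handler owns it and must release it
        ByteBuf buf = packet.copy().content();
        try {
            logService.process(buf, remoteAddr);
            // Only resets the indices; harmless, but not what frees the memory
            buf.clear();
        } catch (Exception e) {
            logger.error(e.getMessage(), e);
        } finally {
            // Runs on success and on failure, so the copy is always released
            ReferenceCountUtil.release(buf);
        }
    }
}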

 

 

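ReferenceCountUtil.release() is how Netty releases a reference-counted buffer: it decrements the reference count, and once the count reaches zero the off-heap memory is freed or returned to the pool. After adding this line, the problem was completely solved.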
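
For readers who have not met reference counting before, here is a toy model of the idea, written with only the JDK. CountedBuffer and RefCountSketch are hypothetical names invented for this sketch; they are not Netty classes, and Netty's real implementation is more involved. The sketch shows the two points that matter for this bug: the memory is freed only when the count drops to zero, and a finally block guarantees the release even when processing throws.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Toy model for illustration only; not Netty's implementation.
final class RefCountSketch {

    /** Hypothetical stand-in for a reference-counted buffer. */
    static final class CountedBuffer {
        private final AtomicInteger refCnt = new AtomicInteger(1); // creator holds one reference
        private volatile boolean freed;

        int refCnt() { return refCnt.get(); }

        boolean isFreed() { return freed; }

        /** Adds a reference, for example before handing the buffer to another owner. */
        CountedBuffer retain() {
            refCnt.incrementAndGet();
            return this;
        }

        /** Drops one reference; returns true if this call freed the buffer. */
        boolean release() {
            int now = refCnt.decrementAndGet();
            if (now < 0) {
                throw new IllegalStateException("released more often than retained");
            }
            if (now == 0) {
                freed = true; // a real buffer would return its memory here
                return true;
            }
            return false;
        }
    }

    /** Processes a buffer the caller owns and releases it even if processing throws. */
    static void handle(CountedBuffer buf, Runnable process) {
        try {
            process.run();
        } finally {
            buf.release();
        }
    }

    public static void main(String[] args) {
        CountedBuffer buf = new CountedBuffer();
        try {
            handle(buf, () -> { throw new IllegalStateException("bad log line"); });
        } catch (IllegalStateException expected) {
            // processing failed, but the finally block still ran
        }
        System.out.println("freed=" + buf.isFreed()); // prints "freed=true"
    }
}
```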

 

References:

http://static.muyus.com/html/3.html

https://www.jianshu.com/p/17e72bb01bf1

 
