A Netty Send Queue Backlog Case Study
Environment Configuration
// JVM parameter settings
-Xmx1000m -XX:+PrintGC -XX:+PrintGCDetails
During a performance load test, N clients accessed the server concurrently, with the clients using the Netty framework for network communication. After the test had run for a while, response times grew longer and the failure rate climbed. Monitoring the client's memory showed usage rising steadily while throughput gradually dropped to zero; eventually an OOM error occurred, CPU usage stayed high, and GC consumed nearly all of the CPU.
Simulating the High-Concurrency Failure Scenario
The client creates an internal thread that sends requests to the server in a loop, simulating a high-concurrency client workload.
The server-side code is as follows:
/**
 * Created by lijianzhen1 on 2019/1/24.
 */
public final class LoadRunnerServer {

    static final int PORT = Integer.parseInt(System.getProperty("port", "8080"));

    public static void main(String[] args) throws Exception {
        EventLoopGroup bossGroup = new NioEventLoopGroup(1);
        EventLoopGroup workerGroup = new NioEventLoopGroup();
        try {
            ServerBootstrap b = new ServerBootstrap();
            b.group(bossGroup, workerGroup)
             .channel(NioServerSocketChannel.class)
             .option(ChannelOption.SO_BACKLOG, 100)
             .handler(new LoggingHandler(LogLevel.INFO))
             .childHandler(new ChannelInitializer<SocketChannel>() {
                 @Override
                 public void initChannel(SocketChannel ch) throws Exception {
                     ChannelPipeline p = ch.pipeline();
                     p.addLast(new EchoServerHandler());
                 }
             });
            ChannelFuture f = b.bind(PORT).sync();
            f.channel().closeFuture().sync();
        } finally {
            bossGroup.shutdownGracefully();
            workerGroup.shutdownGracefully();
        }
    }
}

class EchoServerHandler extends ChannelInboundHandlerAdapter {

    @Override
    public void channelRead(ChannelHandlerContext ctx, Object msg) {
        ctx.write(msg);
    }

    @Override
    public void channelReadComplete(ChannelHandlerContext ctx) {
        ctx.flush();
    }

    @Override
    public void exceptionCaught(ChannelHandlerContext ctx, Throwable cause) {
        // Close the connection when an exception occurs
        cause.printStackTrace();
        ctx.close();
    }
}
The client code is as follows:
public class LoadRunnerClient {

    static final String HOST = System.getProperty("host", "127.0.0.1");
    static final int PORT = Integer.parseInt(System.getProperty("port", "8080"));

    @SuppressWarnings({"unchecked", "deprecation"})
    public static void main(String[] args) throws Exception {
        EventLoopGroup group = new NioEventLoopGroup();
        try {
            Bootstrap b = new Bootstrap();
            b.group(group)
             .channel(NioSocketChannel.class)
             .option(ChannelOption.TCP_NODELAY, true)
             // Set the write-buffer high water mark for outgoing requests
             .option(ChannelOption.WRITE_BUFFER_HIGH_WATER_MARK, 10 * 1024 * 1024)
             .handler(new ChannelInitializer<SocketChannel>() {
                 @Override
                 public void initChannel(SocketChannel ch) throws Exception {
                     ChannelPipeline p = ch.pipeline();
                     p.addLast(new LoadRunnerClientHandler());
                 }
             });
            ChannelFuture f = b.connect(HOST, PORT).sync();
            f.channel().closeFuture().sync();
        } finally {
            group.shutdownGracefully();
        }
    }
}
public class LoadRunnerClientHandler extends ChannelInboundHandlerAdapter {

    private final ByteBuf firstMessage;
    Runnable loadRunner;
    AtomicLong sendSum = new AtomicLong(0);
    Runnable profileMonitor;

    static final int SIZE = Integer.parseInt(System.getProperty("size", "256"));

    /**
     * Creates the client-side handler.
     */
    public LoadRunnerClientHandler() {
        firstMessage = Unpooled.buffer(SIZE);
        for (int i = 0; i < firstMessage.capacity(); i++) {
            firstMessage.writeByte((byte) i);
        }
    }

    @Override
    public void channelActive(final ChannelHandlerContext ctx) {
        loadRunner = new Runnable() {
            @Override
            public void run() {
                try {
                    TimeUnit.SECONDS.sleep(30);
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
                ByteBuf msg = null;
                while (true) {
                    msg = Unpooled.wrappedBuffer("Netty OOM Example".getBytes());
                    ctx.writeAndFlush(msg);
                }
            }
        };
        new Thread(loadRunner, "LoadRunner-Thread").start();
    }

    @Override
    public void channelRead(ChannelHandlerContext ctx, Object msg) {
        ReferenceCountUtil.release(msg);
    }

    @Override
    public void exceptionCaught(ChannelHandlerContext ctx, Throwable cause) {
        cause.printStackTrace();
        ctx.close();
    }
}
After the link is established, the client starts a thread that sends request messages to the server in a loop, simulating the load-test scenario. After the system runs for a while, memory usage spikes.
The old generation is full, and the system can no longer make progress.
The number of Full GCs increases, and each GC takes longer.
CPU monitoring shows that GC consumes most of the CPU; finally, the OOM occurs:
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "LoadRunner-Thread"
[PSYoungGen: 134602K->134602K(227328K)] [ParOldGen: 682881K->682880K(683008K)] 817483K->817483K(910336K), [Metaspace: 12452K->12452K(1060864K)], 1.8986145 secs] [Times: user=11.42 sys=0.06, real=1.90 secs]
Analyzing the Memory Leak Dump with MAT
Analyzing the client heap dump heapdump-1548325358543.hprof shows that Netty's NioEventLoop holds 99.75% of the memory, confirming that the leak is in NioEventLoop.
Analyzing the reference chain reveals that the objects actually leaking are WriteAndFlushTasks, each holding a pending client request message (msg) and its promise object, as shown in the figure below:
So Netty's message send queue is backing up. Looking at the source code: when Channel.write is called from a business thread (rather than the EventLoop thread), the write operation is wrapped into a WriteAndFlushTask and submitted to the NioEventLoop's task queue for execution.
// AbstractChannelHandlerContext.write(Object msg, boolean flush, ChannelPromise promise)
private void write(Object msg, boolean flush, ChannelPromise promise) {
    AbstractChannelHandlerContext next = findContextOutbound();
    final Object m = pipeline.touch(msg, next);
    EventExecutor executor = next.executor();
    if (executor.inEventLoop()) { // Is the current thread the EventLoop thread?
        if (flush) {
            next.invokeWriteAndFlush(m, promise);
        } else {
            next.invokeWrite(m, promise);
        }
    } else {
        // This branch is the key one
        AbstractWriteTask task;
        // flush was passed in as true
        if (flush) {
            // Build a WriteAndFlushTask
            task = WriteAndFlushTask.newInstance(next, m, promise);
        } else {
            task = WriteTask.newInstance(next, m, promise);
        }
        safeExecute(executor, task, promise, m);
    }
}
When messages are produced faster than the I/O thread can handle, the NioEventLoop cannot keep up: WriteAndFlushTasks accumulate in its task queue, and this ever-growing backlog is what leaks the memory.
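The mechanism can be sketched without Netty at all: a producer enqueues pending writes faster than a slow consumer drains them, so the queue, and the heap holding the queued payloads, grows without bound. Below is a minimal plain-Java simulation (the class name, drain ratio, and iteration counts are illustrative, not taken from the original code):

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class BacklogDemo {

    // Simulate a business thread enqueueing one pending write per iteration
    // while the "I/O thread" drains only one task every 10 iterations.
    static int simulate(int iterations) {
        Deque<byte[]> taskQueue = new ArrayDeque<>();
        for (int i = 0; i < iterations; i++) {
            taskQueue.addLast("Netty OOM Example".getBytes()); // a pending WriteAndFlushTask
            if (i % 10 == 0) {
                taskQueue.pollFirst(); // slow consumer: drains 10x slower than the producer
            }
        }
        return taskQueue.size(); // tasks (and their payloads) still held in memory
    }

    public static void main(String[] args) {
        System.out.println("pending tasks = " + simulate(100_000));
    }
}
```

With a 10:1 produce/drain ratio the queue retains 90% of everything ever written, which mirrors why the heap dump was dominated by WriteAndFlushTask instances.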
How to Prevent Send Queue Backlog
To prevent client-side message backlog caused by a slow server in high-concurrency scenarios, the client needs its own concurrency protection in addition to server-side flow control.
Netty's high/low water mark mechanism enables more precise client-side flow control. Adjust the client code as follows:
public class LoadRunnerClientHandler extends ChannelInboundHandlerAdapter {

    private final ByteBuf firstMessage;
    Runnable loadRunner;
    AtomicLong sendSum = new AtomicLong(0);
    Runnable profileMonitor;

    static final int SIZE = Integer.parseInt(System.getProperty("size", "256"));

    /**
     * Creates a client-side handler.
     */
    public LoadRunnerClientHandler() {
        firstMessage = Unpooled.buffer(SIZE);
        for (int i = 0; i < firstMessage.capacity(); i++) {
            firstMessage.writeByte((byte) i);
        }
    }

    @Override
    public void channelActive(final ChannelHandlerContext ctx) {
        // Set the high water mark on the channel
        ctx.channel().config().setWriteBufferHighWaterMark(10 * 1024 * 1024);
        loadRunner = new Runnable() {
            @Override
            public void run() {
                try {
                    TimeUnit.SECONDS.sleep(30);
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
                ByteBuf msg = null;
                while (true) {
                    // Only send if the high water mark has not been crossed
                    if (ctx.channel().isWritable()) {
                        msg = Unpooled.wrappedBuffer("Netty OOM Example".getBytes());
                        ctx.writeAndFlush(msg);
                    } else {
                        System.out.println("The send queue is full; outbound buffer nioBufferCount: "
                                + ctx.channel().unsafe().outboundBuffer().nioBufferCount());
                    }
                }
            }
        };
        new Thread(loadRunner, "LoadRunner-Thread").start();
    }

    @Override
    public void channelRead(ChannelHandlerContext ctx, Object msg) {
        ReferenceCountUtil.release(msg);
    }

    @Override
    public void exceptionCaught(ChannelHandlerContext ctx, Throwable cause) {
        cause.printStackTrace();
        ctx.close();
    }
}
After this change the system runs normally: memory and CPU usage stay stable, and the memory leak is gone.
With the modified code, once the number of bytes pending in the send queue reaches the high water mark, the Channel becomes unwritable. The high water mark does not by itself stop business threads from calling write and adding messages to the pending queue, so the sender must check the Channel's state before each send: when the high water mark is reached, the Channel is marked unwritable, and the writability check decides whether the message is actually sent.
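The semantics of that writability flag can be illustrated with a small stand-alone sketch (plain Java, not Netty's actual implementation; the class name and thresholds are invented for illustration): the gate turns unwritable once the buffered byte count exceeds the high water mark, and only turns writable again after draining below the low water mark, which is how Channel.isWritable() behaves under Netty's WriteBufferWaterMark.

```java
public class WaterMarkGate {

    private final int low;
    private final int high;
    private long buffered;           // bytes queued but not yet flushed
    private boolean writable = true;

    public WaterMarkGate(int low, int high) {
        this.low = low;
        this.high = high;
    }

    public boolean isWritable() {
        return writable;
    }

    // Business thread queues bytes; crossing the high mark flips writability off.
    public void write(int bytes) {
        buffered += bytes;
        if (buffered > high) {
            writable = false;
        }
    }

    // I/O thread flushes bytes; falling below the low mark flips writability on.
    public void drain(int bytes) {
        buffered = Math.max(0, buffered - bytes);
        if (buffered < low) {
            writable = true;
        }
    }
}
```

The gap between the low and high marks provides hysteresis, so the channel does not flap between writable and unwritable on every small write or flush.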
Summary
In real projects, calculate and set the high water mark from a combination of factors: planned business QPS, client processing capacity, network bandwidth, number of links, and average message size. Using the high water mark to throttle the message send rate both protects the client itself and relieves pressure on the server, preventing the server from being overwhelmed and crashing.
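As a configuration sketch, Netty 4.1 also lets you set both marks together on the Bootstrap via ChannelOption.WRITE_BUFFER_WATER_MARK, instead of the deprecated per-mark option used earlier (the concrete numbers below are illustrative, not recommendations; derive real values from the factors above):

```java
// Illustrative values only: derive real ones from QPS, message size, and link count.
Bootstrap b = new Bootstrap();
b.option(ChannelOption.WRITE_BUFFER_WATER_MARK,
         new WriteBufferWaterMark(32 * 1024,          // low water mark
                                  10 * 1024 * 1024)); // high water mark
```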