Background: I have recently been working on log collection. For various reasons we dropped logstash and wrote our own consumer instead. The consumer's structure is simple, three stages, each passing data to the next through a LinkedBlockingQueue:
log-receiving thread —> log-processing thread pool —> log-persisting thread pool
In the first version, the log-processing pool was just a plain ExecutorService: each thread looped endlessly, pulling data from the queue, running the business logic on it, and handing the result to the persisting pool.
// in main.java
// initialize the thread pool
ExecutorService executorCon = Executors.newFixedThreadPool(5);
// the data queue
LinkedBlockingQueue<String> queue = new LinkedBlockingQueue<String>(20000);
List<Consumer> consumers = Lists.newArrayList();
// spawn 5 consumer threads and put them to work
for (int i = 0; i < 5; ++i) {
    Consumer c = new Consumer(queue);
    executorCon.submit(c);
    consumers.add(c);
}
// in Consumer.java
// the loop each consumer thread runs
public void run() {
    while (running) {
        try {
            // pull one entry from the queue
            String data = queue.poll(POLL_TIME, TimeUnit.MILLISECONDS);
            if (data != null) {
                // process it
                dealData(data);
            } else {
                LOGGER.warn("not polled data");
            }
        } catch (Exception e) {
            LOGGER.error("poll data exception! ", e);
        }
    }
    // once told to stop, drain whatever is left in the queue
    String data = null;
    try {
        while ((data = queue.poll(POLL_TIME, TimeUnit.MILLISECONDS)) != null) {
            dealData(data);
        }
    } catch (InterruptedException e) {
        LOGGER.error("drain interrupted! ", e);
    }
    LOGGER.info("end consumer");
}
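The stop-then-drain sequence above (flip `running`, then poll until the queue is empty) can be seen end to end in a minimal self-contained sketch. The class and method names here are illustrative, not from the original project; an `AtomicBoolean` stands in for the `running` flag:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Minimal model of scheme one: a bounded queue, a pool of consumers
// that poll while running, then drain the leftovers on shutdown.
public class PollDemo {
    public static int run(int items, int threads) throws Exception {
        LinkedBlockingQueue<String> queue = new LinkedBlockingQueue<>(1000);
        AtomicInteger processed = new AtomicInteger();
        AtomicBoolean running = new AtomicBoolean(true);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; ++i) {
            pool.submit(() -> {
                try {
                    // main loop: poll with a timeout while running
                    while (running.get()) {
                        String data = queue.poll(10, TimeUnit.MILLISECONDS);
                        if (data != null) processed.incrementAndGet();
                    }
                    // drain phase: consume whatever is left after stop
                    String data;
                    while ((data = queue.poll(10, TimeUnit.MILLISECONDS)) != null) {
                        processed.incrementAndGet();
                    }
                } catch (InterruptedException ignored) { }
            });
        }
        // producer side: block on put if the queue is full
        for (int i = 0; i < items; ++i) {
            queue.put("log-" + i);
        }
        running.set(false);   // signal stop; consumers switch to draining
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return processed.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(PollDemo.run(10_000, 5)); // expected: 10000
    }
}
```

Because the flag is only flipped after every item has been put, the drain phase guarantees nothing in the queue is lost on shutdown.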
In production, however, throughput barely grew as we added more processing threads, and the data queue occasionally filled up completely. A colleague then proposed a revision: a single manager thread pulls from the data queue and packs multiple log lines into one task; the worker threads then run those tasks with the same processing logic as before.
// in DealManager.java
private ExecutorService service = Executors.newFixedThreadPool(1);
private ExecutorService executor = Executors.newFixedThreadPool(4);
private static final int POLL_TIME = 50;
private static final int BATCH_DATA_PER_TASK = 100;
private Task currentTask;

private void patchTask() {
    try {
        if (currentTask == null) {
            currentTask = new Task();
        }
        // pull logs in a batch and add them to the current task
        for (int i = 0; i < BATCH_DATA_PER_TASK; ++i) {
            String data = queue.poll(POLL_TIME, TimeUnit.MILLISECONDS);
            if (data != null) {
                currentTask.addData(data);
            } else {
                LOGGER.warn("not polled data");
                break;
            }
        }
        // once a full batch has been collected, the task is considered
        // complete and is handed to the worker pool
        if (currentTask.getDataCount() >= BATCH_DATA_PER_TASK) {
            executor.submit(currentTask);
            currentTask = null;
        }
    } catch (InterruptedException e) {
        LOGGER.error("poll data interrupted! ", e);
    }
}

public void run() {
    while (running) {
        patchTask();
    }
}
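One detail worth noting about the batch loop: each `poll` call acquires the queue's take-lock once per element. `LinkedBlockingQueue.drainTo(collection, maxElements)` moves an entire batch under a single lock acquisition, which fits this manager's job well. A sketch, under the assumption of the same `String` queue as above (the class and method names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class BatchPoll {
    // Drain up to 'batch' elements in one call; if the queue happens
    // to be empty, fall back to one blocking poll so the caller's
    // loop does not spin, then drain whatever arrived in the meantime.
    public static List<String> nextBatch(LinkedBlockingQueue<String> queue,
                                         int batch) throws InterruptedException {
        List<String> buf = new ArrayList<>(batch);
        int n = queue.drainTo(buf, batch);
        if (n == 0) {
            String data = queue.poll(50, TimeUnit.MILLISECONDS);
            if (data != null) {
                buf.add(data);
                queue.drainTo(buf, batch - 1);
            }
        }
        return buf;
    }

    public static void main(String[] args) throws Exception {
        LinkedBlockingQueue<String> q = new LinkedBlockingQueue<>(200);
        for (int i = 0; i < 150; ++i) q.put("line-" + i);
        System.out.println(nextBatch(q, 100).size()); // 100
        System.out.println(nextBatch(q, 100).size()); // 50
    }
}
```

`patchTask` could call something like `nextBatch` instead of its per-element loop; the batch-full check afterwards stays the same.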
Each Task is a Runnable, executed by the worker pool inside DealManager.
// in Task.java
public static class Task implements Runnable {
    private List<String> datas = Lists.newArrayList();
    private StringBuffer stringBuffer = new StringBuffer(512);

    public void addData(String data) {
        datas.add(data);
    }

    public int getDataCount() {
        return datas.size();
    }

    public void run() {
        for (String data : datas) {
            // Meaningless busy work that simulates the business-side
            // processing of a log line; it is identical to the dealData
            // function used by the processing threads earlier.
            Random ran = new Random(System.currentTimeMillis());
            for (int count = 0; count < 50; ++count) {
                for (int i = 0; i < 10; ++i) {
                    String part0 = data.substring(i * 10, i * 10 + 10);
                    for (int j = 0; j < part0.length(); ++j) {
                        int seed = 26 - (part0.charAt(j) - '0');
                        stringBuffer.append(ran.nextInt(seed));
                    }
                    stringBuffer.delete(0, stringBuffer.length());
                }
            }
        }
        LOGGER.info("end deal task");
    }
}
To simulate the real environment, I wrote a producer that generates random text.
// in Productor.java
public class Productor implements Callable<Boolean> {
    private static final Logger LOGGER = LoggerFactory.getLogger(Productor.class);
    private static final int STRING_SIZE = 512;
    private static final int SLEEP_TIME = 15;
    private StringBuilder stringBuilder;
    private LinkedBlockingQueue<String> queue;

    public Productor(LinkedBlockingQueue<String> queue) {
        this.queue = queue;
        this.stringBuilder = new StringBuilder(STRING_SIZE);
    }

    public Boolean call() throws Exception {
        LOGGER.info("start productor");
        for (int count = 0; count < 100000; ++count) {
            // build one random line of text to stand in for a log line
            Random ran = new Random(System.currentTimeMillis());
            for (int i = 0; i < STRING_SIZE; ++i) {
                stringBuilder.append(ran.nextInt());
            }
            try {
                // try to push it into the data queue; note that if the
                // queue stays full, the line is dropped after the sleep
                if (!queue.offer(stringBuilder.toString(), 20, TimeUnit.MILLISECONDS)) {
                    LOGGER.warn("queue is full!");
                    Thread.sleep(SLEEP_TIME);
                }
            } catch (Exception e) {
                LOGGER.error("insert string failed! ", e);
            }
            stringBuilder.delete(0, stringBuilder.length());
        }
        LOGGER.info("end productor");
        // return to the caller so it can tell the consumers to stop
        return true;
    }
}
In an experiment with the same producer, the same queue size (20k), and the same consumer thread count (5), with the producer emitting 100k random lines in a row, the second scheme's throughput beat the first. The numbers:
One producer, one manager thread, 4 workers
18:35:00.258 [pool-1-thread-1] INFO com.baidu.xyb.Producer - start productor
18:35:19.634 [pool-1-thread-1] INFO com.baidu.xyb.Producer - end productor
18:35:19.635 [main] INFO com.baidu.xyb.DealManager - stop deal-manager
18:35:25.204 [main] INFO com.baidu.xyb.DealManager - stop poll
18:35:28.191 [main] INFO com.baidu.xyb.DealManager - end deal-manager
cost:28207
One producer, 5 consumers
18:33:49.017 [pool-1-thread-1] INFO com.baidu.xyb.Producer - start productor
18:34:03.507 [pool-1-thread-1] WARN com.baidu.xyb.Producer - queue is full!
18:34:12.398 [pool-1-thread-1] WARN com.baidu.xyb.Producer - queue is full!
18:34:14.123 [pool-1-thread-1] WARN com.baidu.xyb.Producer - queue is full!
18:34:20.447 [pool-1-thread-1] WARN com.baidu.xyb.Producer - queue is full!
18:34:23.318 [pool-1-thread-1] INFO com.baidu.xyb.Producer - end productor
18:34:23.318 [main] INFO com.baidu.xyb.Consumer - stop consumer
18:34:23.319 [main] INFO com.baidu.xyb.Consumer - stop consumer
18:34:23.319 [main] INFO com.baidu.xyb.Consumer - stop consumer
18:34:23.319 [main] INFO com.baidu.xyb.Consumer - stop consumer
18:34:23.319 [main] INFO com.baidu.xyb.Consumer - stop consumer
18:34:29.283 [pool-2-thread-3] INFO com.baidu.xyb.Consumer - end consumer
18:34:29.290 [pool-2-thread-5] INFO com.baidu.xyb.Consumer - end consumer
18:34:29.307 [pool-2-thread-4] INFO com.baidu.xyb.Consumer - end consumer
18:34:29.292 [pool-2-thread-1] INFO com.baidu.xyb.Consumer - end consumer
18:34:29.317 [pool-2-thread-2] INFO com.baidu.xyb.Consumer - end consumer
cost:40559
The second scheme can be optimized further by preparing tasks in advance and reusing them instead of allocating a new one per batch.
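That reuse idea can be sketched with a simple recycling pool: a finished task clears its data and returns itself to a free list instead of becoming garbage, so the internal StringBuffer is reused across batches too. The `TaskPool`/`RecyclingTask` names and structure below are illustrative, not from the original code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

public class TaskPool {
    // Finished tasks return here, so allocation only happens
    // when the free list is empty.
    private final ConcurrentLinkedQueue<RecyclingTask> free =
            new ConcurrentLinkedQueue<>();

    public class RecyclingTask implements Runnable {
        private final List<String> datas = new ArrayList<>();

        public void addData(String data) { datas.add(data); }

        public int getDataCount() { return datas.size(); }

        public void run() {
            // ... process datas here, as in Task.run() above ...
            datas.clear();     // drop references before recycling
            free.offer(this);  // hand the task back to the pool
        }
    }

    // Hand out a recycled task if one is available, else a fresh one.
    public RecyclingTask borrow() {
        RecyclingTask t = free.poll();
        return t != null ? t : new RecyclingTask();
    }

    public int freeCount() { return free.size(); }

    public static void main(String[] args) {
        TaskPool pool = new TaskPool();
        RecyclingTask t = pool.borrow();
        t.addData("log line");
        t.run();
        System.out.println(pool.freeCount());   // 1
        System.out.println(pool.borrow() == t); // true
    }
}
```

With this in place, DealManager's `patchTask` would call `borrow()` where it currently does `new Task()`; everything else stays unchanged.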