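1-Preface
This article grew out of a real performance problem caused by how a thread pool was used. It analyzes, at the code level, the problems that a poorly configured core pool size, maximum pool size, and work queue tend to cause, and gives recommendations for each.
For a broader analysis of thread pools, the article "Alibaba Coding Guidelines: Why you should create custom thread pools with ThreadPoolExecutor" covers the topic in detail and is worth reading carefully (for certain reasons the link is not included here; please search for it yourself).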
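2-Overview
2.1 Quick conclusions
- corePoolSize sets the number of core threads. Core threads are created on demand as tasks are submitted (not when the pool is constructed), and once created they are not destroyed (unless allowCoreThreadTimeOut(true) has been set)
- maxPoolSize is the upper limit on the number of threads. Threads beyond the core threads are created only when the work queue is full and the pool has not yet reached this limit, and they are destroyed after being idle for keepAliveTime
- Key corollary: as long as the work queue is not full, no extra threads are created. If the work queue is too large, the number of core threads alone determines the system's throughput!!!
Negative example: "Performance problems caused by a single request launching too many asynchronous tasks" (for certain reasons, the link to the original is not included here)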
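A minimal sketch of the corollary, assuming an unbounded LinkedBlockingQueue as the extreme case of a "too large" work queue; the class and method names are hypothetical. It submits 100 short tasks to a pool with corePoolSize=2 and maxPoolSize=10 and reports getLargestPoolSize(), the peak number of threads the pool ever had.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class UnboundedQueueDemo {
    // Submits `tasks` short tasks and returns the largest number of threads the pool ever had
    static int largestPoolSize(int core, int max, int tasks) {
        ThreadPoolExecutor executor = new ThreadPoolExecutor(core, max,
                60, TimeUnit.SECONDS, new LinkedBlockingQueue<>()); // unbounded queue
        for (int i = 0; i < tasks; i++) {
            executor.execute(() -> {
                try {
                    Thread.sleep(10); // simulate a little work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        executor.shutdown();
        try {
            executor.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return executor.getLargestPoolSize();
    }

    public static void main(String[] args) {
        // The queue never fills, so maxPoolSize=10 is never used
        System.out.println(largestPoolSize(2, 10, 100)); // 2
    }
}
```

Because offer on an unbounded queue never fails while the pool is running, the "create a non-core thread" branch of execute is never reached, so the result is 2 no matter how fast tasks arrive.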
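2.2 Recommendations
- For IO-intensive, high-concurrency scenarios, make the core thread count configurable, and set it high enough to meet the performance requirements;
As fallbacks, maxPoolSize (the maximum pool size) and workQueue (the work queue) should follow one of these two strategies:
- Prompt-processing strategy: when this service has ample CPU and memory and the downstream services are not a bottleneck, make maxPoolSize large enough and keep workQueue small to increase this service's throughput, but guard against OOM and against overloading downstream services into a cascading failure (avalanche);
- Peak-shaving strategy: when this service is constrained on CPU or memory (for example, compute-intensive or memory-intensive workloads), or a downstream service is the bottleneck, make workQueue large enough and set maxPoolSize=corePoolSize, reducing the risk of OOM in this service and of cascading failures downstream.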
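The two strategies might be configured as in the sketch below. The factory names promptProcessingPool and peakShavingPool, the concrete sizes, and the use of a bounded ArrayBlockingQueue in both cases are illustrative assumptions, not values from the original incident. The hypothetical resize method shows one way to make the core size configurable at runtime with the standard setCorePoolSize/setMaximumPoolSize methods: since JDK 9, setCorePoolSize throws IllegalArgumentException if the new core size exceeds the current maximum (and setMaximumPoolSize rejects a maximum below the current core size), so the order of the two calls matters.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoolStrategies {
    // Prompt processing: many threads, tiny queue. Extra threads are created almost as
    // soon as the core threads are busy; watch memory usage and downstream load.
    static ThreadPoolExecutor promptProcessingPool(int core, int max) {
        return new ThreadPoolExecutor(core, max, 60, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(1));
    }

    // Peak shaving: core == max and a large (but still bounded) queue.
    // Bursts wait in the queue instead of spawning more threads.
    static ThreadPoolExecutor peakShavingPool(int threads, int queueCapacity) {
        return new ThreadPoolExecutor(threads, threads, 0, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(queueCapacity));
    }

    // Applies new sizes at runtime (e.g. from a config-change callback); assumes newCore <= newMax.
    // Core must never exceed max between the two calls.
    static void resize(ThreadPoolExecutor executor, int newCore, int newMax) {
        if (newMax >= executor.getMaximumPoolSize()) {
            executor.setMaximumPoolSize(newMax); // growing: raise the ceiling first
            executor.setCorePoolSize(newCore);
        } else {
            executor.setCorePoolSize(newCore);   // shrinking: lower the floor first
            executor.setMaximumPoolSize(newMax);
        }
    }

    public static void main(String[] args) {
        ThreadPoolExecutor pool = peakShavingPool(8, 1000);
        resize(pool, 16, 16); // e.g. after the configured core size was raised
        System.out.println(pool.getCorePoolSize() + "/" + pool.getMaximumPoolSize()); // 16/16
        pool.shutdown();
    }
}
```

Calling setCorePoolSize(16) directly on a pool whose maximum is 8 would fail on JDK 9 and later, which is why resize raises the maximum first when growing.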
3-How it works
3.1 Code analysis
Source of java.util.concurrent.ThreadPoolExecutor.execute (JDK):
```java
public void execute(Runnable command) {
    if (command == null)
        throw new NullPointerException();
    /*
     * Proceed in 3 steps:
     *
     * 1. If fewer than corePoolSize threads are running, try to
     * start a new thread with the given command as its first
     * task. The call to addWorker atomically checks runState and
     * workerCount, and so prevents false alarms that would add
     * threads when it shouldn't, by returning false.
     *
     * 2. If a task can be successfully queued, then we still need
     * to double-check whether we should have added a thread
     * (because existing ones died since last checking) or that
     * the pool shut down since entry into this method. So we
     * recheck state and if necessary roll back the enqueuing if
     * stopped, or start a new thread if there are none.
     *
     * 3. If we cannot queue task, then we try to add a new
     * thread. If it fails, we know we are shut down or saturated
     * and so reject the task.
     */
    int c = ctl.get();
    if (workerCountOf(c) < corePoolSize) {
        if (addWorker(command, true))
            return;
        c = ctl.get();
    }
    if (isRunning(c) && workQueue.offer(command)) {
        int recheck = ctl.get();
        if (! isRunning(recheck) && remove(command))
            reject(command);
        else if (workerCountOf(recheck) == 0)
            addWorker(null, false);
    }
    else if (!addWorker(command, false))
        reject(command);
}
```
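Interpretation:
- When a task is submitted, if the number of worker threads in the pool (workerCountOf(c)) is less than corePoolSize, a new thread is started with the task as its first task (addWorker(command, true)), even if existing threads are idle; if that succeeds, execute returns;
- Otherwise (the thread count has already reached corePoolSize, or addWorker failed), if the pool is running, the task is added to the work queue (workQueue.offer). After a successful offer the state is checked again: if the pool has meanwhile been shut down, the task is removed and rejected; if no worker threads remain, a thread is started to drain the queue (addWorker(null, false));
- Otherwise (the pool is not running, or the queue is full), it tries to run the task on a non-core thread (addWorker(command, false)); if that also fails because the thread count has reached maximumPoolSize or the pool has been shut down, the task is rejected (reject(command)).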
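The three steps can be condensed into a small decision function. This is a simplified, single-threaded model written for illustration (ExecuteDecisionModel and Decision are hypothetical names); it deliberately ignores the run state, the post-enqueue recheck, and the concurrency handling that the real execute performs.

```java
public class ExecuteDecisionModel {
    enum Decision { START_CORE_THREAD, ENQUEUE, START_NON_CORE_THREAD, REJECT }

    // Simplified model of ThreadPoolExecutor.execute for a running pool.
    static Decision decide(int workerCount, int corePoolSize, int maxPoolSize, boolean queueHasRoom) {
        if (workerCount < corePoolSize) {
            return Decision.START_CORE_THREAD;      // step 1: even if existing threads are idle
        }
        if (queueHasRoom) {
            return Decision.ENQUEUE;                // step 2: no new thread while the queue has room
        }
        if (workerCount < maxPoolSize) {
            return Decision.START_NON_CORE_THREAD;  // step 3: queue full, below maxPoolSize
        }
        return Decision.REJECT;                     // queue full and maxPoolSize reached
    }

    public static void main(String[] args) {
        System.out.println(decide(3, 4, 10, true));   // START_CORE_THREAD
        System.out.println(decide(4, 4, 10, true));   // ENQUEUE
        System.out.println(decide(4, 4, 10, false));  // START_NON_CORE_THREAD
        System.out.println(decide(10, 4, 10, false)); // REJECT
    }
}
```

Note that maxPoolSize is only consulted once the queue has no room, which is the root of the "large queue means core threads set the throughput" corollary.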
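Additional notes:
The core logic of addWorker is: if the number of threads has not reached the limit (corePoolSize when its core argument is true, maximumPoolSize when it is false), create a thread to run the task. Because it has to be thread-safe, the code is fairly complex and is not reproduced here; if you are interested, see the JDK source.
Tasks in workQueue are repeatedly pulled and executed by idle threads (ThreadPoolExecutor.runWorker(), which fetches each next task via getTask()), which is how threads get reused.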
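A much-simplified sketch of that worker loop follows. It is not the JDK code: the real getTask() also handles shutdown and worker-count bookkeeping. It only models the key behavior that a thread keeps pulling from the shared queue, and that in the JDK a thread waits with a timeout (poll with keepAliveTime) when the pool has more than corePoolSize threads or allowCoreThreadTimeOut is enabled, and otherwise blocks in take(). WorkerLoopSketch and its methods are hypothetical names.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class WorkerLoopSketch {
    // Simplified pool thread: keep pulling tasks from the shared queue and run them.
    // timed == true mimics a reclaimable thread (poll with a keep-alive timeout);
    // timed == false mimics a core thread that blocks in take() until work arrives.
    static Runnable workerLoop(BlockingQueue<Runnable> queue, boolean timed, long keepAliveMillis) {
        return () -> {
            try {
                while (true) {
                    Runnable task = timed
                            ? queue.poll(keepAliveMillis, TimeUnit.MILLISECONDS)
                            : queue.take();
                    if (task == null) {
                        return; // idle for keepAliveMillis: the thread ends
                    }
                    task.run(); // the same thread is reused for the next task
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
    }

    // Queues taskCount tasks, runs them on one timed worker thread and waits for that
    // thread to exit; returns how many tasks it executed.
    static int runTimedWorker(int taskCount, long keepAliveMillis) {
        BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
        AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < taskCount; i++) {
            queue.add(done::incrementAndGet);
        }
        Thread worker = new Thread(workerLoop(queue, true, keepAliveMillis));
        worker.start();
        try {
            worker.join(); // returns once the worker has been idle for keepAliveMillis
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return done.get();
    }

    public static void main(String[] args) {
        System.out.println("tasks run by one thread: " + runTimedWorker(5, 100)); // 5
    }
}
```

One thread runs all five queued tasks and then exits on its own after the keep-alive timeout, which is the reuse-then-reclaim behavior described above for non-core threads.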
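Summary:
1. Core threads are created on first use (unless prestartAllCoreThreads has been called)
2. Once the thread count has reached corePoolSize, further tasks are put into the work queue
3. When the work queue is full, non-core threads are created
4. When both the work queue and the maximum thread count are at their limits, tasks are rejected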
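Point 1, and the "even if existing threads are idle" detail of step 1, can be observed with getPoolSize(). In this sketch (CoreThreadCreationDemo is a hypothetical name) the pool starts with no threads, the second task gets a new thread even though the first task has already finished, and prestartAllCoreThreads() starts the remaining core threads and returns how many it started.

```java
import java.util.Arrays;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class CoreThreadCreationDemo {
    // Returns getPoolSize() at each step, plus the value returned by prestartAllCoreThreads()
    static int[] observe() {
        ThreadPoolExecutor executor = new ThreadPoolExecutor(4, 4, 0, TimeUnit.SECONDS,
                new LinkedBlockingQueue<>());
        int afterConstruct = executor.getPoolSize(); // 0: no thread is created up front
        try {
            executor.submit(() -> { }).get(); // task 1 runs on a new core thread and finishes
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
        int afterTask1 = executor.getPoolSize(); // 1
        executor.execute(() -> { });
        // 2: fewer than corePoolSize threads exist, so a new thread is started
        // even though the first thread has already finished its task
        int afterTask2 = executor.getPoolSize();
        int prestarted = executor.prestartAllCoreThreads(); // starts the remaining 2
        int afterPrestart = executor.getPoolSize(); // 4
        executor.shutdown();
        return new int[] {afterConstruct, afterTask1, afterTask2, prestarted, afterPrestart};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(observe())); // [0, 1, 2, 2, 4]
    }
}
```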
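3.2 Experimental verification
3.2.1 Even if the task submission rate exceeds the total processing capacity of the core threads, no additional threads are created as long as the work queue has not filled up:
```java
import java.time.LocalDateTime;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class Main {
    public static void main(String[] args) throws InterruptedException {
        // Load parameters: total number of tasks, task submission rate (tasks submitted per second)
        final int totalTaskCount = 1000, submitQPS = 100;
        // Capacity parameters: per-thread processing rate (tasks per second per thread),
        // work queue size, core pool size, maximum pool size
        final int threadQPS = 20, workQueueSize = 200, corePoolSize = 4, maxPoolSize = 10;
        // Initialize the thread pool (non-core threads are kept alive for 100 s when idle)
        ThreadPoolExecutor executor = new ThreadPoolExecutor(corePoolSize, maxPoolSize,
                100, TimeUnit.SECONDS, new ArrayBlockingQueue<>(workQueueSize));
        for (int i = 0; i < totalTaskCount; i++) {
            final int j = i;
            executor.execute(() -> {
                // Print the time, the task number and the thread that runs it
                System.out.printf("[%s]%03d:%s\n", LocalDateTime.now(), j, Thread.currentThread().getName());
                try {
                    // Duration of each task = unit time / per-thread processing rate
                    Thread.sleep(1000 / threadQPS);
                } catch (InterruptedException e) {
                    throw new RuntimeException(e);
                }
            });
            // Interval between submissions = unit time / submission rate
            Thread.sleep(1000 / submitQPS);
        }
        // Let the queued tasks finish, then let the pool threads exit so the JVM can terminate
        executor.shutdown();
    }
}
```
Output: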
[2024-04-10T10:15:50.061206]000:pool-1-thread-1
[2024-04-10T10:15:50.064610]003:pool-1-thread-4
[2024-04-10T10:15:50.064628]001:pool-1-thread-2
[2024-04-10T10:15:50.061321]002:pool-1-thread-3
[2024-04-10T10:15:50.217581]004:pool-1-thread-1
[2024-04-10T10:15:50.219693]005:pool-1-thread-4
[2024-04-10T10:15:50.219703]006:pool-1-thread-2
[2024-04-10T10:15:50.220365]007:pool-1-thread-3
...
[2024-04-10T10:16:03.040089]994:pool-1-thread-2
[2024-04-10T10:16:03.040089]993:pool-1-thread-1
[2024-04-10T10:16:03.040089]992:pool-1-thread-4
[2024-04-10T10:16:03.040619]995:pool-1-thread-3
[2024-04-10T10:16:03.091222]996:pool-1-thread-4
[2024-04-10T10:16:03.091223]997:pool-1-thread-2
[2024-04-10T10:16:03.091386]998:pool-1-thread-1
[2024-04-10T10:16:03.091745]999:pool-1-thread-3
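Interpretation of the results:
Although the task submission rate exceeds the throughput of the core threads, the work queue never filled up during the test, so only 4 threads processed tasks the whole time.
Tasks added to the work queue per second (fill rate) = submission rate - per-thread processing rate * number of threads = 100-20*4 = 20
Time to fill the work queue = work queue size / fill rate = 200/20 = 10 seconds
Tasks submitted in 10 seconds = submission rate * duration = 100*10 = 1000 <= 1000 = total number of tasks
In other words, all 1000 tasks have been submitted by the time the work queue would fill up, so the test sits exactly at the overflow threshold and no new threads are created.
Because Thread.sleep is not precise, your results may differ: the work queue may overflow near the end, in which case the last few tasks may cause new threads to be created.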
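The arithmetic above can be wrapped in a small helper for trying other parameter combinations. This is an idealized model (constant rates, no Thread.sleep drift) with hypothetical names; it only predicts whether the queue overflows before all tasks have been submitted, i.e. whether extra threads should be expected.

```java
public class QueueFillMath {
    // Tasks that pile up in the queue per second while only the core threads are running
    static double fillRate(double submitQPS, double threadQPS, int corePoolSize) {
        return submitQPS - threadQPS * corePoolSize;
    }

    // Seconds until the work queue is full (infinite if the core threads keep up)
    static double secondsToFill(double submitQPS, double threadQPS, int corePoolSize, int queueSize) {
        double rate = fillRate(submitQPS, threadQPS, corePoolSize);
        return rate <= 0 ? Double.POSITIVE_INFINITY : queueSize / rate;
    }

    // True if every task is submitted before the queue overflows, so no extra threads are expected
    static boolean staysOnCoreThreads(int totalTasks, double submitQPS, double threadQPS,
                                      int corePoolSize, int queueSize) {
        double submitSeconds = totalTasks / submitQPS;
        return submitSeconds <= secondsToFill(submitQPS, threadQPS, corePoolSize, queueSize);
    }

    public static void main(String[] args) {
        System.out.println(secondsToFill(100, 20, 4, 200));            // 10.0
        System.out.println(staysOnCoreThreads(1000, 100, 20, 4, 200)); // true: exactly at the threshold
        System.out.println(staysOnCoreThreads(2000, 100, 20, 4, 200)); // false: the 3.2.2 scenario
    }
}
```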
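3.2.2 If the task submission rate exceeds the processing capacity of the core threads for long enough, the work queue will eventually fill up, which triggers the creation of non-core threads:
```java
// Load parameters: total number of tasks, task submission rate (tasks submitted per second)
final int totalTaskCount = 2000, submitQPS = 100;
// Capacity parameters: per-thread processing rate (tasks per second per thread),
// work queue size, core pool size, maximum pool size
final int threadQPS = 20, workQueueSize = 200, corePoolSize = 4, maxPoolSize = 10;
```
Output: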
[2024-04-10T10:30:32.684693]002:pool-1-thread-3
[2024-04-10T10:30:32.685077]003:pool-1-thread-4
[2024-04-10T10:30:32.699185]000:pool-1-thread-1
[2024-04-10T10:30:32.696473]001:pool-1-thread-2
[2024-04-10T10:30:32.815452]005:pool-1-thread-2
[2024-04-10T10:30:32.815300]004:pool-1-thread-3
[2024-04-10T10:30:32.815625]006:pool-1-thread-1
[2024-04-10T10:30:32.816781]007:pool-1-thread-4
...
[2024-04-10T10:30:44.772992]1124:pool-1-thread-5
...
[2024-04-10T10:30:55.698278]1992:pool-1-thread-2
[2024-04-10T10:30:55.698279]1993:pool-1-thread-4
[2024-04-10T10:30:55.749245]1994:pool-1-thread-3
[2024-04-10T10:30:55.749408]1998:pool-1-thread-2
[2024-04-10T10:30:55.749388]1997:pool-1-thread-1
[2024-04-10T10:30:55.749347]1996:pool-1-thread-4
[2024-04-10T10:30:55.749270]1995:pool-1-thread-5
[2024-04-10T10:30:55.799817]1999:pool-1-thread-5
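Interpretation of the results:
At (roughly) the 10-second mark the work queue overflowed, so one new thread (pool-1-thread-5) was created to process the extra tasks.
Because the processing rate of 5 threads (5*20=100) equals the submission rate, no further threads were created until the end of the test.
Because Thread.sleep is not precise, your results may differ, and a 6th thread may appear.
Reducing the work queue size or increasing the submission rate produces similar results; try it yourself if you are interested.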
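Since the timing-based test depends on Thread.sleep, a deterministic variant can make the same point without any timing noise. In this sketch (QueueOverflowDemo is a hypothetical name) every task blocks on a CountDownLatch, so no task finishes during submission, and getPoolSize() is recorded after each execute. With corePoolSize=1, maxPoolSize=3 and a queue of 2, a new thread only appears once the queue is full.

```java
import java.util.Arrays;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class QueueOverflowDemo {
    // Submits `tasks` tasks that all block until released and records the pool size after each submit
    static int[] poolSizesWhileBlocked(int core, int max, int queueSize, int tasks) {
        if (tasks > max + queueSize) {
            // More tasks would be rejected while the others are still blocked
            throw new IllegalArgumentException("tasks must be <= max + queueSize");
        }
        CountDownLatch release = new CountDownLatch(1);
        ThreadPoolExecutor executor = new ThreadPoolExecutor(core, max, 60, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(queueSize));
        int[] sizes = new int[tasks];
        for (int i = 0; i < tasks; i++) {
            executor.execute(() -> {
                try {
                    release.await(); // hold the thread so no task completes during submission
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            sizes[i] = executor.getPoolSize();
        }
        release.countDown(); // let all tasks finish
        executor.shutdown();
        return sizes;
    }

    public static void main(String[] args) {
        // Task 1 starts a core thread, tasks 2-3 wait in the queue,
        // tasks 4-5 find the queue full and each start a non-core thread
        System.out.println(Arrays.toString(poolSizesWhileBlocked(1, 3, 2, 5))); // [1, 1, 1, 2, 3]
    }
}
```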
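3.2.3 If the task submission rate exceeds the processing capacity of the maximum number of threads, then once the queue has filled up, the thread pool rejects new task submissions:
```java
// Load parameters: total number of tasks, task submission rate (tasks submitted per second)
final int totalTaskCount = 1000, submitQPS = 125;
// Capacity parameters: per-thread processing rate (tasks per second per thread),
// work queue size, core pool size, maximum pool size
final int threadQPS = 10, workQueueSize = 20, corePoolSize = 4, maxPoolSize = 10;
```
Output: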
...
[2024-04-10T10:45:29.800465]026:pool-1-thread-7
[2024-04-10T10:45:29.784969]025:pool-1-thread-6
[2024-04-10T10:45:29.768852]024:pool-1-thread-5
[2024-04-10T10:45:29.718754]002:pool-1-thread-3
[2024-04-10T10:45:29.720490]001:pool-1-thread-2
[2024-04-10T10:45:29.726577]003:pool-1-thread-4
[2024-04-10T10:45:29.823012]028:pool-1-thread-9
[2024-04-10T10:45:29.836052]029:pool-1-thread-10
Exception in thread "main" java.util.concurrent.RejectedExecutionException: Task org.ctstudio.Main$$Lambda$15/0x0000000800c01208@6b884d57 rejected from java.util.concurrent.ThreadPoolExecutor@38af3868[Running, pool size = 10, active threads = 10, queued tasks = 20, completed tasks = 0]
at java.base/java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2065)
at java.base/java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:833)
at java.base/java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1365)
at org.ctstudio.Main.main(Main.java:21)
[2024-04-10T10:45:29.914904]005:pool-1-thread-6
[2024-04-10T10:45:29.915111]007:pool-1-thread-8
[2024-04-10T10:45:29.915061]006:pool-1-thread-7
[2024-04-10T10:45:29.915907]008:pool-1-thread-5
[2024-04-10T10:45:29.915613]004:pool-1-thread-1
...
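Interpretation: when the submission rate (125) exceeds the pool's maximum throughput (10*10=100), the work queue is bound to fill up, and from that moment further tasks are rejected. In this run the queue holds only 20 tasks, so the rejection happens almost immediately (completed tasks = 0 in the exception message). With the default AbortPolicy, execute throws RejectedExecutionException; here it is not caught, so it terminates the main thread and stops further submissions, while the tasks already accepted keep running, as the lines after the stack trace show.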
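The rejection point can also be reproduced deterministically. In the sketch below (RejectionDemo is a hypothetical name) all tasks block on a latch, so the pool can accept at most maxPoolSize running tasks plus queueSize queued ones; the next submission is rejected by the default AbortPolicy, and the sketch catches RejectedExecutionException instead of letting it kill the caller. Other JDK handlers, such as ThreadPoolExecutor.CallerRunsPolicy, can be passed to the constructor instead if a different overload behavior is wanted.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class RejectionDemo {
    // Submits blocking tasks; returns the 1-based index of the first rejected submission,
    // or -1 if every task was accepted
    static int firstRejected(int core, int max, int queueSize, int tasks) {
        CountDownLatch release = new CountDownLatch(1);
        ThreadPoolExecutor executor = new ThreadPoolExecutor(core, max, 60, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(queueSize)); // default handler: AbortPolicy
        int rejectedAt = -1;
        try {
            for (int i = 1; i <= tasks; i++) {
                try {
                    executor.execute(() -> {
                        try {
                            release.await(); // keep every thread busy
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                } catch (RejectedExecutionException e) {
                    rejectedAt = i; // max threads running and the queue is full
                    break;
                }
            }
        } finally {
            release.countDown(); // always unblock the workers so the pool can shut down
            executor.shutdown();
        }
        return rejectedAt;
    }

    public static void main(String[] args) {
        System.out.println(firstRejected(1, 3, 2, 10));    // 6: 3 running + 2 queued accepted
        System.out.println(firstRejected(4, 10, 20, 100)); // 31: same sizes as the 3.2.3 test
    }
}
```

With the 3.2.3 sizes, the 31st submission (task 030 in the log's 0-based numbering) is the first one rejected, consistent with the exception above reporting pool size = 10 and queued tasks = 20 at the moment of rejection.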
With the code above we have verified the three states of the thread pool; if you are interested, you can build on it to verify more scenarios.