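Yesterday, while looking through a project, I happened to see a thread pool implemented as a singleton through a static initializer block, as follows: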
public static final ThreadPoolExecutor threadPool;
static {
int nCpu = Runtime.getRuntime().availableProcessors();
int maxPoolSize = (2 * nCpu) + 1;
threadPool = new ThreadPoolExecutor(3, maxPoolSize, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>());
}
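Note: because it is implemented through a static initializer block, this is really a variant of the eager (hungry-style) singleton pattern, not the lazy one.
Seeing the parameter configuration above reminded me of an OOM problem I solved last year.
Before getting into the specific cause, let's go over the thread pool's parameters.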
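Basic parameters
- corePoolSize: number of core threads
- maximumPoolSize: maximum number of threads (maxPoolSize in the snippet above)
- keepAliveTime: how long an idle thread is kept alive
- unit: time unit of keepAliveTime
- workQueue: task queue
- threadFactory: thread factory
  Alibaba Java Development Manual: [Mandatory] When creating a thread or a thread pool, specify a meaningful thread name so that problems can be traced back when something goes wrong.
- handler: rejection policy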
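As an illustration of how the seven parameters line up in the constructor, here is a minimal sketch. The class name `NamedPoolExample`, the `namedFactory` helper, the "order-worker" prefix, and all sizes are hypothetical choices for the example, not values taken from the project discussed here.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class NamedPoolExample {

    /** Hypothetical factory that names threads "<prefix>-1", "<prefix>-2", ... */
    static ThreadFactory namedFactory(String prefix) {
        AtomicInteger seq = new AtomicInteger(1);
        return r -> new Thread(r, prefix + "-" + seq.getAndIncrement());
    }

    /** Builds a pool with every constructor parameter spelled out (sizes are assumptions). */
    static ThreadPoolExecutor newPool() {
        return new ThreadPoolExecutor(
                2,                                      // corePoolSize
                4,                                      // maximumPoolSize
                60, TimeUnit.SECONDS,                   // keepAliveTime and its unit
                new ArrayBlockingQueue<>(100),          // workQueue (bounded)
                namedFactory("order-worker"),           // threadFactory
                new ThreadPoolExecutor.AbortPolicy());  // handler (rejection policy)
    }

    /** Runs one task and returns the name of the worker thread that ran it. */
    static String firstWorkerName() {
        ThreadPoolExecutor pool = newPool();
        try {
            Callable<String> task = () -> Thread.currentThread().getName();
            return pool.submit(task).get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(firstWorkerName()); // prints "order-worker-1"
    }
}
```

With names like these, a stack trace or thread dump shows at a glance which pool a thread belongs to.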
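Execution order
After a thread pool is created, by default it contains no threads; threads are created only when tasks arrive. The exception is when prestartAllCoreThreads() or prestartCoreThread() is called to create core threads in advance.
When a thread has been idle for keepAliveTime, it exits, until the number of threads drops to corePoolSize. If allowCoreThreadTimeOut(true) has been called, core threads time out as well, so all threads can exit until the count reaches 0.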
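- When the number of threads is less than corePoolSize, a new thread is created.
- When the number of threads is greater than or equal to corePoolSize and the task queue is not full, the task is put into the queue.
- When the number of threads is greater than or equal to corePoolSize and the task queue is full:
  - If the number of threads is less than maximumPoolSize, a new thread is created.
  - If the number of threads equals maximumPoolSize, the task is handled by the rejection policy. The four built-in policies are:
    - ThreadPoolExecutor.AbortPolicy(); // default policy, throws the runtime exception RejectedExecutionException
    - ThreadPoolExecutor.CallerRunsPolicy(); // the task is run by the thread that submitted it (the main thread, for example)
    - ThreadPoolExecutor.DiscardPolicy(); // silently discards the task when the queue is full; no exception is thrown
    - ThreadPoolExecutor.DiscardOldestPolicy(); // removes the oldest task in the queue, then tries to submit the new one again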
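The order described above can be observed with a deliberately tiny pool. The following is a sketch under assumed values (one core thread, at most two threads, a queue of capacity one, the default AbortPolicy); the class name `PoolOrderDemo` is hypothetical. Every task blocks on a latch so that the pool state is stable while it is inspected.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class PoolOrderDemo {

    /** Submits four blocking tasks and records what the pool did with each one. */
    static List<String> run() {
        List<String> log = new ArrayList<>();
        CountDownLatch release = new CountDownLatch(1);
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 2, 60, TimeUnit.SECONDS, new ArrayBlockingQueue<>(1));
        // Each task waits on the latch, so no worker becomes free during the loop.
        Runnable blocker = () -> {
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        try {
            for (int i = 1; i <= 4; i++) {
                try {
                    pool.execute(blocker);
                    log.add("task" + i + ": threads=" + pool.getPoolSize()
                            + " queued=" + pool.getQueue().size());
                } catch (RejectedExecutionException e) {
                    log.add("task" + i + ": rejected");
                }
            }
        } finally {
            release.countDown(); // let the blocked tasks finish
            pool.shutdown();
        }
        return log;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
        // task1: threads=1 queued=0   (core thread created)
        // task2: threads=1 queued=1   (queued)
        // task3: threads=2 queued=1   (queue full, extra thread created)
        // task4: rejected             (queue full, pool at maximum)
    }
}
```

One consequence worth noticing: task3 starts running before task2, because task2 is still waiting in the queue when the extra thread is created for task3.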
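Setting the number of threads
For a CPU-bound application, set the pool size to the number of CPU cores, or that number ±1.
For an IO-bound application, set the pool size to ((thread wait time + thread CPU time) / thread CPU time) * number of CPUs.
After setting it, verify with a load test whether the number is appropriate.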
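The two rules of thumb above translate directly into arithmetic. This is a small sketch; the class `PoolSizing`, its method names, and the sample timings (90 ms waiting, 10 ms on the CPU) are assumptions for illustration, and "cores + 1" is just one of the CPU-bound variants mentioned above.

```java
class PoolSizing {

    /** CPU-bound rule of thumb: one thread per core, plus one spare. */
    static int cpuBound(int nCpu) {
        return nCpu + 1;
    }

    /** IO-bound rule of thumb: ((wait time + CPU time) / CPU time) * number of CPUs. */
    static int ioBound(int nCpu, double waitMillis, double cpuMillis) {
        return (int) Math.ceil((waitMillis + cpuMillis) / cpuMillis * nCpu);
    }

    public static void main(String[] args) {
        int nCpu = Runtime.getRuntime().availableProcessors();
        System.out.println("CPU-bound: " + cpuBound(nCpu));
        // Assumed measurement: each task waits 90 ms on IO and computes for 10 ms.
        System.out.println("IO-bound:  " + ioBound(nCpu, 90, 10));
    }
}
```

For example, with 8 cores and the assumed 90 ms / 10 ms split, the IO-bound formula gives (100 / 10) * 8 = 80 threads. The timings have to be measured for the real workload, which is why the load test remains the final check.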
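The OOM problem
Alibaba Java Development Manual
[Mandatory] Thread pools must not be created with Executors; create them through ThreadPoolExecutor instead. This makes the pool's operating rules explicit to whoever writes the code and avoids the risk of resource exhaustion.
Explanation: the thread pool objects returned by Executors have the following drawbacks:
1) FixedThreadPool and SingleThreadPool: the allowed request queue length is Integer.MAX_VALUE, so a large number of requests may pile up and cause an OOM.
2) CachedThreadPool: the allowed number of threads is Integer.MAX_VALUE, so a large number of threads may be created and cause an OOM.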
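The first problem I fixed was caused by a colleague creating the thread pool as a local variable, so every request entering the method created a new pool. Under heavy concurrent traffic a huge number of threads were created, the system's virtual memory was exhausted, and the JVM threw java.lang.OutOfMemoryError: unable to create new native thread.
After that fix was deployed, the JVM still crashed. Analysis with MAT and the hs_err[pid].log file showed that the cause was again an OOM, only the message had changed to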
java.lang.OutOfMemoryError: Java heap space
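Analysis of the log files showed that the cause was the unbounded queue: the third-party interface responded very slowly, the core threads could not keep up with the requests, and tasks kept being added to the queue. Because the queue was unbounded, this ended in an OOM.
The logs also showed the OOM exception many times before the JVM finally crashed and exited. This suggests that when a thread hits an OOM, the memory held by that thread is released without affecting the thread that submitted the work! New tasks are then added, the OOM happens again, resources are released again, and so on until the JVM crashes and exits...
The solution is to make the queue bounded and to choose an appropriate rejection policy.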
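One possible shape of that fix is sketched below: a bounded ArrayBlockingQueue combined with CallerRunsPolicy, so that overflow slows the submitter down instead of growing the heap. The class name `BoundedPoolFix` and the tiny sizes (one thread, queue capacity one) are assumptions chosen only to make the overflow easy to trigger; the actual values used in the original fix are not stated here.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

class BoundedPoolFix {

    /** Returns the name of the thread that ran the task that did not fit into the pool. */
    static String overflowTaskThread() {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 60, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(1),                 // bounded queue
                new ThreadPoolExecutor.CallerRunsPolicy());  // overflow runs in the caller
        CountDownLatch release = new CountDownLatch(1);
        AtomicReference<String> ranOn = new AtomicReference<>();
        // Stand-in for a slow third-party call: blocks until released.
        Runnable slowCall = () -> {
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        try {
            pool.execute(slowCall); // occupies the only worker thread
            pool.execute(slowCall); // fills the queue
            // Pool and queue are full: CallerRunsPolicy runs this task on the submitting thread.
            pool.execute(() -> ranOn.set(Thread.currentThread().getName()));
        } finally {
            release.countDown();
            pool.shutdown();
        }
        return ranOn.get();
    }

    public static void main(String[] args) {
        System.out.println(overflowTaskThread()); // prints the submitting thread's name, "main" here
    }
}
```

Whether CallerRunsPolicy, AbortPolicy, or a discarding policy is appropriate depends on whether the caller can afford to block, retry, or lose the task.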
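Note:
An OOM does not necessarily cause a JVM crash.
A JVM crash and a JVM that stops after an OutOfMemoryError (leaving a heap dump if -XX:+HeapDumpOnOutOfMemoryError is set) are not the same thing. A JVM crash happens when some code, under particular conditions, triggers a fault in the JVM's native layer (a JVM bug, for example) and the process is killed outright. An OutOfMemoryError (OOM in our jargon) is an error thrown to application code; it only terminates the thread that fails to handle it, and the process ends only if no non-daemon threads are left or the JVM was started with an option such as -XX:+ExitOnOutOfMemoryError. In short, the former is a process exit caused by a low-level fault, while the latter is an application-level error that may or may not lead to an exit.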
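Catching exceptions inside the thread pool
Submission methods
There are usually two ways to submit a task to a thread pool:
- execute: submits a task with no return value
- submit: submits a task with a return value
In fact, submit ends up calling execute as well. Before calling execute, submit wraps our Runnable in a RunnableFuture, which is actually a FutureTask instance, and then hands that FutureTask to execute.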
/**
* @throws RejectedExecutionException {@inheritDoc}
* @throws NullPointerException {@inheritDoc}
*/
public Future<?> submit(Runnable task) {
if (task == null) throw new NullPointerException();
RunnableFuture<Void> ftask = newTaskFor(task, null);
execute(ftask);
return ftask;
}
In the FutureTask constructor, the Runnable is wrapped in a Callable.
/**
* Creates a {@code FutureTask} that will, upon running, execute the
* given {@code Runnable}, and arrange that {@code get} will return the
* given result on successful completion.
*
* @param runnable the runnable task
* @param result the result to return on successful completion. If
* you don't need a particular result, consider using
* constructions of the form:
* {@code Future<?> f = new FutureTask<Void>(runnable, null)}
* @throws NullPointerException if the runnable is null
*/
public FutureTask(Runnable runnable, V result) {
this.callable = Executors.callable(runnable, result);
this.state = NEW; // ensure visibility of callable
}
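How submit captures exceptions
FutureTask's run method calls the Callable's call method, which in turn calls the Runnable's run method. If the task code (the Runnable) throws, the exception is caught and stored.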
public void run() {
if (state != NEW ||
!UNSAFE.compareAndSwapObject(this, runnerOffset,
null, Thread.currentThread()))
return;
try {
Callable<V> c = callable;
if (c != null && state == NEW) {
V result;
boolean ran;
try {
result = c.call();
ran = true;
} catch (Throwable ex) {
result = null;
ran = false;
setException(ex);
}
if (ran)
set(result);
}
} finally {
// runner must be non-null until state is settled to
// prevent concurrent calls to run()
runner = null;
// state must be re-read after nulling runner to prevent
// leaked interrupts
int s = state;
if (s >= INTERRUPTING)
handlePossibleCancellationInterrupt(s);
}
}
protected void setException(Throwable t) {
if (UNSAFE.compareAndSwapInt(this, stateOffset, NEW, COMPLETING)) {
outcome = t;
UNSAFE.putOrderedInt(this, stateOffset, EXCEPTIONAL); // final state
finishCompletion();
}
}
When the Future's get method is called, the stored exception is rethrown, wrapped in an ExecutionException, and the caller can then handle it.
public V get(long timeout, TimeUnit unit)
throws InterruptedException, ExecutionException, TimeoutException {
if (unit == null)
throw new NullPointerException();
int s = state;
if (s <= COMPLETING &&
(s = awaitDone(true, unit.toNanos(timeout))) <= COMPLETING)
throw new TimeoutException();
return report(s);
}
/**
* Returns result or throws exception for completed task.
* @param s completed state value
*/
@SuppressWarnings("unchecked")
private V report(int s) throws ExecutionException {
Object x = outcome;
if (s == NORMAL)
return (V)x;
if (s >= CANCELLED)
throw new CancellationException();
throw new ExecutionException((Throwable)x);
}
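From the caller's side, the behaviour traced through the JDK source above looks like the following sketch. The class name `SubmitExceptionDemo`, the pool sizes, and the "boom" message are assumptions for illustration.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class SubmitExceptionDemo {

    /** Submits a failing task and describes the exception that get() reports. */
    static String causeOfFailure() {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0, TimeUnit.SECONDS, new ArrayBlockingQueue<>(10));
        try {
            Runnable failing = () -> {
                throw new IllegalStateException("boom");
            };
            // Nothing is thrown or printed here: FutureTask stores the exception.
            Future<?> future = pool.submit(failing);
            try {
                future.get();
                return "no exception";
            } catch (ExecutionException e) {
                // The original exception is available as the cause.
                Throwable cause = e.getCause();
                return cause.getClass().getSimpleName() + ": " + cause.getMessage();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return "interrupted";
            }
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(causeOfFailure()); // prints "IllegalStateException: boom"
    }
}
```

If get is never called, the failure stays inside the Future and goes unnoticed, which is the usual way exceptions get lost with submit.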
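How execute handles exceptions
If we don't care about the task's result, we can use the execute method of ExecutorService (inherited from the Executor interface) to run the task directly. The exception is then not stored; it propagates out of the worker thread, so it surfaces without a call to get.
When a task is submitted with execute, it is eventually run by a Worker object. The Worker wraps a Thread, which is the pool's worker thread. The worker thread calls runWorker to execute the tasks we submit: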
/**
* Main worker run loop. Repeatedly gets tasks from queue and
* executes them, while coping with a number of issues:
*
* 1. We may start out with an initial task, in which case we
* don't need to get the first one. Otherwise, as long as pool is
* running, we get tasks from getTask. If it returns null then the
* worker exits due to changed pool state or configuration
* parameters. Other exits result from exception throws in
* external code, in which case completedAbruptly holds, which
* usually leads processWorkerExit to replace this thread.
*
* 2. Before running any task, the lock is acquired to prevent
* other pool interrupts while the task is executing, and then we
* ensure that unless pool is stopping, this thread does not have
* its interrupt set.
*
* 3. Each task run is preceded by a call to beforeExecute, which
* might throw an exception, in which case we cause thread to die
* (breaking loop with completedAbruptly true) without processing
* the task.
*
* 4. Assuming beforeExecute completes normally, we run the task,
* gathering any of its thrown exceptions to send to afterExecute.
* We separately handle RuntimeException, Error (both of which the
* specs guarantee that we trap) and arbitrary Throwables.
* Because we cannot rethrow Throwables within Runnable.run, we
* wrap them within Errors on the way out (to the thread's
* UncaughtExceptionHandler). Any thrown exception also
* conservatively causes thread to die.
*
* 5. After task.run completes, we call afterExecute, which may
* also throw an exception, which will also cause thread to
* die. According to JLS Sec 14.20, this exception is the one that
* will be in effect even if task.run throws.
*
* The net effect of the exception mechanics is that afterExecute
* and the thread's UncaughtExceptionHandler have as accurate
* information as we can provide about any problems encountered by
* user code.
*
* @param w the worker
*/
final void runWorker(Worker w) {
Thread wt = Thread.currentThread();
Runnable task = w.firstTask;
w.firstTask = null;
w.unlock(); // allow interrupts
boolean completedAbruptly = true;
try {
while (task != null || (task = getTask()) != null) {
w.lock();
// If pool is stopping, ensure thread is interrupted;
// if not, ensure thread is not interrupted. This
// requires a recheck in second case to deal with
// shutdownNow race while clearing interrupt
if ((runStateAtLeast(ctl.get(), STOP) ||
(Thread.interrupted() &&
runStateAtLeast(ctl.get(), STOP))) &&
!wt.isInterrupted())
wt.interrupt();
try {
// beforeExecute (or terminated) can be overridden to write logs, etc.
beforeExecute(wt, task);
Throwable thrown = null;
try {
// run the task, either the worker's first task or one taken from the task queue
task.run();
} catch (RuntimeException x) {
thrown = x; throw x;
} catch (Error x) {
thrown = x; throw x;
} catch (Throwable x) {
thrown = x; throw new Error(x);
} finally {
afterExecute(task, thrown);
}
} finally {
task = null;
w.completedTasks++;
w.unlock();
}
}
completedAbruptly = false;
} finally {
processWorkerExit(w, completedAbruptly);
}
}
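If the task code (task.run()) throws, the exception is caught by the innermost try-catch block and rethrown. Note the innermost finally block: before the exception propagates further, afterExecute is called. Its default implementation is empty, i.e. it does nothing.
ThreadPoolExecutor.afterExecute can therefore be overridden to handle the exception passed to it.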
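A minimal sketch of such an override follows. The subclass name `LoggingPool`, its `failures` list, and the pool sizes are hypothetical; a real implementation would more likely write to a logger than collect into a list.

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class LoggingPool extends ThreadPoolExecutor {

    /** Exceptions seen by afterExecute (a stand-in for real logging). */
    final List<Throwable> failures = new CopyOnWriteArrayList<>();

    LoggingPool() {
        super(1, 1, 0, TimeUnit.SECONDS, new ArrayBlockingQueue<>(10));
    }

    @Override
    protected void afterExecute(Runnable r, Throwable t) {
        super.afterExecute(r, t);
        // t is non-null only for tasks submitted with execute that threw.
        if (t != null) {
            failures.add(t);
        }
    }

    /** Runs one failing task and returns the message recorded by afterExecute. */
    static String firstFailure() {
        LoggingPool pool = new LoggingPool();
        pool.execute(() -> {
            throw new IllegalArgumentException("bad input");
        });
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return pool.failures.isEmpty() ? "none" : pool.failures.get(0).getMessage();
    }

    public static void main(String[] args) {
        System.out.println(firstFailure()); // prints "bad input"
    }
}
```

For tasks submitted with submit, the Throwable argument is null because FutureTask has already stored the exception; in that case the Runnable passed in would have to be inspected as a Future instead.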
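afterExecute only observes the exception. Whether or not it is overridden, the exception is still rethrown, escapes the worker thread's run method, and the JVM then calls Thread#dispatchUncaughtException: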
/**
* Dispatch an uncaught exception to the handler. This method is
* intended to be called only by the JVM.
*/
private void dispatchUncaughtException(Throwable e) {
getUncaughtExceptionHandler().uncaughtException(this, e);
}
public void uncaughtException(Thread t, Throwable e) {
if (parent != null) {
parent.uncaughtException(t, e);
} else {
Thread.UncaughtExceptionHandler ueh =
Thread.getDefaultUncaughtExceptionHandler();
if (ueh != null) {
ueh.uncaughtException(t, e);
} else if (!(e instanceof ThreadDeath)) {
System.err.print("Exception in thread \""
+ t.getName() + "\" ");
e.printStackTrace(System.err);
}
}
}
At this point a custom UncaughtExceptionHandler can be supplied through the ThreadPoolExecutor's ThreadFactory, and the exception handled in its uncaughtException method.
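The ThreadFactory approach might look like the sketch below. The class name `HandlerFactoryDemo`, the thread name "handled-worker", and the pool sizes are assumptions; the handler here just records a string where real code would log.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

class HandlerFactoryDemo {

    /** Runs one failing task and returns what the custom handler observed. */
    static String handledMessage() {
        AtomicReference<String> seen = new AtomicReference<>("none");
        CountDownLatch handled = new CountDownLatch(1);

        // Every thread created for the pool gets its own UncaughtExceptionHandler.
        ThreadFactory factory = r -> {
            Thread t = new Thread(r, "handled-worker");
            t.setUncaughtExceptionHandler((thread, error) -> {
                seen.set(thread.getName() + " -> " + error.getMessage());
                handled.countDown();
            });
            return t;
        };

        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0, TimeUnit.SECONDS, new ArrayBlockingQueue<>(10), factory);
        try {
            pool.execute(() -> {
                throw new IllegalStateException("task failed");
            });
            handled.await(5, TimeUnit.SECONDS); // wait for the handler to run
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdown();
        }
        return seen.get();
    }

    public static void main(String[] args) {
        System.out.println(handledMessage()); // prints "handled-worker -> task failed"
    }
}
```

The handler is only reached for tasks submitted with execute; with submit the exception never leaves the FutureTask.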
Besides these three approaches, the business code itself can use try/catch to catch everything the task may throw, including unchecked exceptions.
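Shutting down the thread pool
- shutdown()
  shutdown does not close the pool immediately; it stops accepting new tasks. Tasks already in the pool are run to completion, and then the pool is closed.
- shutdownNow()
  shutdownNow stops accepting new tasks, removes the tasks still waiting in the queue, and attempts to stop the tasks that are currently running.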
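The difference between the two calls can be seen in a small sketch. The class name `ShutdownDemo`, the single-thread pool, and the 10-second sleep standing in for a long task are assumptions for the example.

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class ShutdownDemo {

    private static ThreadPoolExecutor singleThreadPool() {
        return new ThreadPoolExecutor(1, 1, 0, TimeUnit.SECONDS, new ArrayBlockingQueue<>(10));
    }

    /** shutdown(): tasks already submitted still run. Returns how many completed. */
    static int gracefulCompleted() {
        AtomicInteger done = new AtomicInteger();
        ThreadPoolExecutor pool = singleThreadPool();
        for (int i = 0; i < 3; i++) {
            pool.execute(done::incrementAndGet);
        }
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return done.get();
    }

    /** shutdownNow(): returns the queued tasks that never started. */
    static int abruptPending() {
        CountDownLatch started = new CountDownLatch(1);
        ThreadPoolExecutor pool = singleThreadPool();
        pool.execute(() -> {
            started.countDown();
            try {
                Thread.sleep(10_000); // long task; cut short by the interrupt from shutdownNow
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        pool.execute(() -> { });
        pool.execute(() -> { });
        try {
            started.await(); // make sure the first task is running
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        List<Runnable> pending = pool.shutdownNow();
        return pending.size();
    }

    public static void main(String[] args) {
        System.out.println(gracefulCompleted()); // prints 3
        System.out.println(abruptPending());     // prints 2
    }
}
```

Note that shutdownNow can only interrupt running tasks; a task that ignores interruption keeps running until it finishes on its own.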
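Aside
Each thread has its own JVM Stack. A JVM Stack holds a set of Stack Frames. Each method call made by the thread pushes a Stack Frame onto the JVM Stack; when the method returns or terminates with an exception, the frame is popped (destroyed).
When the JVM invokes a Java method, it reads the sizes of the method's local variable area and operand stack from the class's type information, allocates the stack frame accordingly, and pushes it onto the JVM stack.
In an active thread, only the frame at the top of the stack is in effect. It is called the current frame, and the method associated with it is the current method.
What we usually mean by stack memory is the local variable table part of the virtual machine stack. The Local Variable Table is a set of slots that stores the method's parameters and the local variables defined inside the method. Its maximum capacity for a method is already fixed when the Java source is compiled to a class file.
The more the local variable table holds, the larger the stack frame and the smaller the achievable stack depth.
-Xss sets the stack size. If a thread needs a larger stack than that fixed size, a StackOverflowError occurs; if the system does not have enough memory to create the stack for a new thread, an OutOfMemoryError occurs.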
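The StackOverflowError case is easy to reproduce with unbounded recursion. This is only a sketch; the class name `StackDepthDemo` is hypothetical, and the depth it reports depends on the JVM, the -Xss value, and JIT compilation, so no specific number should be expected.

```java
class StackDepthDemo {

    private static int depth;

    /** Recurses without a base case; every call pushes one more stack frame. */
    private static void recurse() {
        depth++;
        recurse();
    }

    /** Returns how many frames were pushed before the stack ran out. */
    static int maxDepth() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError expected) {
            // The thread's stack is exhausted; unwinding frees it again.
        }
        return depth;
    }

    public static void main(String[] args) {
        // Try running with different -Xss values (for example -Xss256k and -Xss2m) and compare.
        System.out.println("frames before StackOverflowError: " + maxDepth());
    }
}
```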