In this post, we look at the main flow that AMS (ActivityManagerService) goes through when an app crashes.
I. Setting up the uncaught exception handler
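On Android, after an app process is forked, an uncaught exception handler is installed for its virtual machine.
In other words, if any thread throws an exception at runtime that nobody catches,
that exception is eventually handed to the uncaught exception handler.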
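To make the mechanism concrete, here is a minimal plain-Java sketch (not Android code) of a VM-wide default handler receiving an exception that a worker thread never catches. The class and method names are made up for illustration; only `Thread.setDefaultUncaughtExceptionHandler` and its getter are real JDK APIs.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical demo class: shows that the default handler sees exceptions no thread catches.
class DefaultHandlerDemo {
    /** Starts a thread that throws, and returns what the default handler observed. */
    static String runAndCapture() {
        AtomicReference<String> seen = new AtomicReference<>();
        Thread.UncaughtExceptionHandler previous = Thread.getDefaultUncaughtExceptionHandler();
        // The default handler applies to every thread in the VM that has no handler of its own.
        Thread.setDefaultUncaughtExceptionHandler(
                (t, e) -> seen.set(t.getName() + ": " + e.getMessage()));
        try {
            Thread worker = new Thread(() -> {
                throw new IllegalStateException("boom"); // nobody catches this
            }, "worker-1");
            worker.start();
            worker.join(); // the handler runs on the dying thread before join() returns
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        } finally {
            Thread.setDefaultUncaughtExceptionHandler(previous); // restore the previous handler
        }
        return seen.get();
    }

    public static void main(String[] args) {
        System.out.println(runAndCapture()); // prints "worker-1: boom"
    }
}
```

On Android, the same JDK hook is what `commonInit` uses, as the code below shows.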
Let's first look at the Android N code that installs the handler,
starting from runSelectLoop in ZygoteInit.java:
```java
private static void runSelectLoop(String abiList) throws MethodAndArgsCaller {
    // ...
    while (true) {
        // ...
        for (int i = pollFds.length - 1; i >= 0; --i) {
            // ...
            if (i == 0) {
                // zygote's server socket received a connection: set up a ZygoteConnection
                ZygoteConnection newPeer = acceptCommandPeer(abiList);
                peers.add(newPeer);
                fds.add(newPeer.getFileDesciptor());
            } else {
                // Once the ZygoteConnection exists, each incoming message calls its runOnce()
                boolean done = peers.get(i).runOnce();
                // ...
            }
        }
    }
}
```
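As we know, once zygote starts, it creates a server socket in its own process that is dedicated to receiving requests to create new processes.
As the code above shows, when such a request arrives, zygote creates a ZygoteConnection and calls its runOnce function: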
```java
boolean runOnce() throws ZygoteInit.MethodAndArgsCaller {
    // ...
    try {
        // ...
        // Fork the child process
        pid = Zygote.forkAndSpecialize(/* ... */);
    } catch (ErrnoException ex) {
        // ...
    } catch (IllegalArgumentException ex) {
        // ...
    } catch (ZygoteSecurityException ex) {
        // ...
    }
    try {
        if (pid == 0) {
            // ...
            // In the child: handle the freshly forked process
            handleChildProc(parsedArgs, descriptors, childPipeFd, newStderr);
            // ...
        } else {
            // ...
        }
    } finally {
        // ...
    }
}
```
Let's follow handleChildProc:
```java
private void handleChildProc(/* ... */) {
    // ...
    if (parsedArgs.invokeWith != null) {
        // ...
    } else {
        // Enter RuntimeInit.zygoteInit
        RuntimeInit.zygoteInit(parsedArgs.targetSdkVersion,
                parsedArgs.remainingArgs, null /* classLoader */);
    }
}
```
Following the flow, here is zygoteInit in RuntimeInit:
```java
public static final void zygoteInit(int targetSdkVersion, String[] argv, ClassLoader classLoader)
        throws ZygoteInit.MethodAndArgsCaller {
    // ...
    // Follow commonInit
    commonInit();
    // ...
}

private static final void commonInit() {
    // ...
    /* set default handler; this applies to all threads in the VM */
    // Destination reached!
    Thread.setDefaultUncaughtExceptionHandler(new UncaughtHandler());
    // ...
}
```
As the code above shows, once a process has been forked, the uncaught exception handler UncaughtHandler is installed during the process's commonInit stage.
II. How the exception is handled
1. Exception handling in UncaughtHandler
Next, let's see how UncaughtHandler deals with uncaught exceptions.
```java
private static class UncaughtHandler implements Thread.UncaughtExceptionHandler {
    public void uncaughtException(Thread t, Throwable e) {
        try {
            // Don't re-enter -- avoid infinite loops if crash-reporting crashes.
            if (mCrashing) return;
            mCrashing = true;

            if (mApplicationObject == null) {
                Clog_e(TAG, "*** FATAL EXCEPTION IN SYSTEM PROCESS: " + t.getName(), e);
            } else {
                // Print the process's crash information
                // ...
            }
            // ...
            // Bring up crash dialog, wait for it to be dismissed
            // Call into AMS to handle the crash
            ActivityManagerNative.getDefault().handleApplicationCrash(
                    mApplicationObject, new ApplicationErrorReport.CrashInfo(e));
        } catch (Throwable t2) {
            if (t2 instanceof DeadObjectException) {
                // System process is dead; ignore
            } else {
                try {
                    Clog_e(TAG, "Error reporting crash", t2);
                } catch (Throwable t3) {
                    // Even Clog_e() fails! Oh well.
                }
            }
        } finally {
            // Try everything to make sure this process goes away.
            // The crash ends with the process being killed...
            Process.killProcess(Process.myPid());
            // ...and exiting
            System.exit(10);
        }
    }
}
```
Judging from the code, UncaughtHandler's handling is straightforward. It basically:
1. logs the crash;
2. calls an AMS interface to do further processing;
3. kills the crashed process.
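These three steps can be mimicked in plain Java. The sketch below is hypothetical: `CrashReporter` stands in for the AMS Binder call, and a `Runnable` stands in for `Process.killProcess()` plus `System.exit(10)`, so the flow can run without killing the JVM. It keeps the two properties of the original worth copying: a re-entrancy guard, and a `finally` block that terminates the process even when the guard returns early or reporting fails.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch modeled on the shape of UncaughtHandler; not Android code.
class CrashHandlerSketch implements Thread.UncaughtExceptionHandler {
    /** Hypothetical stand-in for the AMS handleApplicationCrash() Binder call. */
    interface CrashReporter {
        void report(String threadName, Throwable e) throws Exception;
    }

    private final CrashReporter reporter;
    private final Runnable terminate;           // stands in for killProcess() + System.exit(10)
    final List<String> log = new ArrayList<>();
    private volatile boolean crashing;          // plays the role of mCrashing

    CrashHandlerSketch(CrashReporter reporter, Runnable terminate) {
        this.reporter = reporter;
        this.terminate = terminate;
    }

    @Override
    public void uncaughtException(Thread t, Throwable e) {
        try {
            // Don't re-enter: if reporting itself crashes, go straight to termination.
            if (crashing) return;
            crashing = true;
            log.add("FATAL EXCEPTION: " + t.getName());   // 1. log
            reporter.report(t.getName(), e);               // 2. hand over to the "AMS"
        } catch (Throwable reportFailure) {
            log.add("Error reporting crash: " + reportFailure);
        } finally {
            terminate.run();                               // 3. always runs, even after the early return
        }
    }

    public static void main(String[] args) {
        int[] terminations = {0};
        CrashHandlerSketch h = new CrashHandlerSketch(
                (name, e) -> { throw new RuntimeException("reporter down"); },
                () -> terminations[0]++);
        h.uncaughtException(Thread.currentThread(), new IllegalStateException("boom"));
        h.uncaughtException(Thread.currentThread(), new IllegalStateException("again"));
        System.out.println(h.log.size() + " log lines, " + terminations[0] + " terminations");
        // prints "2 log lines, 2 terminations"
    }
}
```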
The most important part is how AMS handles the crash, so let's follow that code next.
2. Exception handling in AMS
```java
public void handleApplicationCrash(IBinder app, ApplicationErrorReport.CrashInfo crashInfo) {
    // Look up the ProcessRecord of the crashing app
    ProcessRecord r = findAppProcess(app, "Crash");
    final String processName = app == null ? "system_server"
            : (r == null ? "unknown" : r.processName);

    // Delegate to handleApplicationCrashInner
    handleApplicationCrashInner("crash", r, processName, crashInfo);
}

void handleApplicationCrashInner(String eventType, ProcessRecord r, String processName,
        ApplicationErrorReport.CrashInfo crashInfo) {
    // ...
    // Write a description of an error (crash, WTF, ANR) to the drop box.
    // Record the information in the DropBox
    addErrorToDropBox(eventType, r, processName, null, null, null, null, null, crashInfo);

    // Call crashApplication on the AppErrors instance (mAppErrors)
    mAppErrors.crashApplication(r, crashInfo);
}
```
Let's follow crashApplication in the AppErrors class:
```java
/**
 * Bring up the "unexpected error" dialog box for a crashing app.
 * Deal with edge cases (intercepts from instrumented applications,
 * ActivityController, error intent receivers, that sort of thing).
 */
void crashApplication(ProcessRecord r, ApplicationErrorReport.CrashInfo crashInfo) {
    final long origId = Binder.clearCallingIdentity();
    try {
        // The actual work is done in crashApplicationInner
        crashApplicationInner(r, crashInfo);
    } finally {
        Binder.restoreCallingIdentity(origId);
    }
}
```
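`crashApplication` wraps the real work in `Binder.clearCallingIdentity()` / `Binder.restoreCallingIdentity()`. While AMS serves a Binder call, the calling identity on that thread is the crashing app's; clearing it lets the work inside run with system_server's own identity, and the `finally` block puts the caller's identity back afterwards. The sketch below models this pattern with a `ThreadLocal` in plain Java. It is hypothetical: the real token packs the calling uid and pid together, and `1000` is assumed here as the system uid.

```java
// Hypothetical model of Binder's per-thread calling identity; not the android.os.Binder API.
class CallingIdentitySketch {
    static final int SYSTEM_UID = 1000; // assumed uid of the process doing the work

    // Per-thread "who is calling me", as tracked for each incoming IPC.
    private static final ThreadLocal<Integer> callingUid =
            ThreadLocal.withInitial(() -> SYSTEM_UID);

    static int getCallingUid() { return callingUid.get(); }

    /** Test helper: pretend an incoming IPC from {@code uid} is being served on this thread. */
    static void simulateIncomingCall(int uid) { callingUid.set(uid); }

    /** Like clearCallingIdentity(): act as ourselves, return a token for restoring later. */
    static long clearCallingIdentity() {
        long token = callingUid.get();
        callingUid.set(SYSTEM_UID);
        return token;
    }

    static void restoreCallingIdentity(long token) { callingUid.set((int) token); }

    /** Same shape as AppErrors.crashApplication(); returns the uid seen by the inner work. */
    static int crashApplication() {
        final long origId = clearCallingIdentity();
        try {
            return getCallingUid(); // crashApplicationInner would run here as "system"
        } finally {
            restoreCallingIdentity(origId); // always restore, even if the inner work throws
        }
    }

    public static void main(String[] args) {
        simulateIncomingCall(10057); // a made-up app uid
        System.out.println(crashApplication() + " inside, " + getCallingUid() + " after");
        // prints "1000 inside, 10057 after"
    }
}
```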
The actual work is done in crashApplicationInner:
```java
void crashApplicationInner(ProcessRecord r, ApplicationErrorReport.CrashInfo crashInfo) {
    long timeMillis = System.currentTimeMillis();
    // Extract details from the crashInfo sent over by the app process
    String shortMsg = crashInfo.exceptionClassName;
    String longMsg = crashInfo.exceptionMessage;
    String stackTrace = crashInfo.stackTrace;
    // ...

    AppErrorResult result = new AppErrorResult();
    TaskRecord task;
    synchronized (mService) {
        /**
         * If crash is handled by instance of {@link android.app.IActivityController},
         * finish now and don't show the app error dialog.
         */
        // Notify the watcher of the crash.
        // If a watcher exists and handles the crash, no error dialog is shown;
        // e.g. during a Monkey test it can be set up to stop testing once a crash is detected.
        if (handleAppCrashInActivityController(r, crashInfo, shortMsg, longMsg, stackTrace,
                timeMillis)) {
            return;
        }

        /**
         * If this process was running instrumentation, finish now - it will be handled in
         * {@link ActivityManagerService#handleAppDiedLocked}.
         */
        if (r != null && r.instrumentationClass != null) {
            return;
        }

        // Log crash in battery stats.
        if (r != null) {
            mService.mBatteryStatsService.noteProcessCrash(r.processName, r.uid);
        }

        AppErrorDialog.Data data = new AppErrorDialog.Data();
        data.result = result;
        data.proc = r;

        // If we can't identify the process or it's already exceeded its crash quota,
        // quit right away without showing a crash dialog.
        // If makeAppCrashingLocked returns false, nothing more needs to be done.
        if (r == null || !makeAppCrashingLocked(r, shortMsg, longMsg, stackTrace, data)) {
            return;
        }

        // Send SHOW_ERROR_UI_MSG to mUiHandler, which pops up a dialog telling the user
        // that the process crashed; the user can choose e.g. "Close" or "Close and report".
        // Most vendors have probably customized this dialog.
        Message msg = Message.obtain();
        msg.what = ActivityManagerService.SHOW_ERROR_UI_MSG;
        task = data.task;
        msg.obj = data;
        mService.mUiHandler.sendMessage(msg);
    }

    // AppErrorResult.get() blocks until the user has dealt with the dialog.
    // Note that two threads are involved here:
    // crashApplicationInner runs on the Binder thread that received the call,
    // while the dialog runs on the AMS UI thread.
    int res = result.get();

    Intent appErrorIntent = null;
    // From here on, act on the choice the user made in the dialog
    // ...
    // No click for a long time, or Cancel: treated as force-stopping the crashed process
    if (res == AppErrorDialog.TIMEOUT || res == AppErrorDialog.CANCEL) {
        res = AppErrorDialog.FORCE_QUIT;
    }

    // Act on the value of res
    synchronized (mService) {
        // "Don't report again"
        if (res == AppErrorDialog.MUTE) {
            // Add the process name to AMS's mAppsNotReportingCrashes table
            stopReportingCrashesLocked(r);
        }
        // "Restart"
        if (res == AppErrorDialog.RESTART) {
            mService.removeProcessLocked(r, false, true, "crash");
            if (task != null) {
                try {
                    // Try to restart the app from its recents task
                    mService.startActivityFromRecents(task.taskId,
                            ActivityOptions.makeBasic().toBundle());
                } catch (IllegalArgumentException e) {
                    // Hmm, that didn't work, app might have crashed before creating a
                    // recents entry. Let's see if we have a safe-to-restart intent.
                    if (task.intent.getCategories().contains(
                            Intent.CATEGORY_LAUNCHER)) {
                        // Fall back to another way of restarting
                        mService.startActivityInPackage(/* ... */);
                    }
                }
            }
        }
        // "Force quit"
        if (res == AppErrorDialog.FORCE_QUIT) {
            long orig = Binder.clearCallingIdentity();
            try {
                // Kill it with fire!
                // handleAppCrashLocked mainly finishes the activities and updates oom_adj
                mService.mStackSupervisor.handleAppCrashLocked(r);
                if (!r.persistent) {
                    // Non-persistent apps are killed here
                    mService.removeProcessLocked(r, false, false, "crash");
                    mService.mStackSupervisor.resumeFocusedStackTopActivityLocked();
                }
            } finally {
                Binder.restoreCallingIdentity(orig);
            }
        }
        // "Force quit and report"
        if (res == AppErrorDialog.FORCE_QUIT_AND_REPORT) {
            // Builds the error report and an Intent that launches the report UI
            appErrorIntent = createAppErrorIntentLocked(r, timeMillis, crashInfo);
        }
        if (r != null && !r.isolated && res != AppErrorDialog.RESTART) {
            // XXX Can't keep track of crash time for isolated processes,
            // since they don't have a persistent identity.
            // Record the crash time
            mProcessCrashTimes.put(r.info.processName, r.uid,
                    SystemClock.uptimeMillis());
        }
    }

    if (appErrorIntent != null) {
        try {
            // If the user chose "Force quit and report", the report UI is launched now
            mContext.startActivityAsUser(appErrorIntent, new UserHandle(r.userId));
        } catch (ActivityNotFoundException e) {
            // ...
        }
    }
}
```
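The handoff between the Binder thread blocked in `result.get()` and the UI thread that shows the dialog is a classic one-shot result. A minimal sketch of such a class built on `wait()`/`notifyAll()` follows. It is written in the spirit of `AppErrorResult`, not copied from it, and the delayed "UI thread" merely simulates a user click.

```java
// Hypothetical one-shot result shared by two threads; not the AOSP AppErrorResult source.
class AppErrorResultSketch {
    private boolean hasResult;
    private int result;

    /** Called on the UI thread when the user dismisses the dialog. */
    synchronized void set(int res) {
        result = res;
        hasResult = true;
        notifyAll(); // wake the Binder thread blocked in get()
    }

    /** Called on the Binder thread; blocks until set() has been called. */
    synchronized int get() {
        boolean interrupted = false;
        while (!hasResult) { // loop guards against spurious wakeups
            try {
                wait();
            } catch (InterruptedException e) {
                interrupted = true; // keep waiting, but remember the interrupt
            }
        }
        if (interrupted) Thread.currentThread().interrupt();
        return result;
    }

    /** Simulates the two-thread handoff: a "UI thread" answers after a short delay. */
    static int simulateDialog(int userChoice) {
        AppErrorResultSketch r = new AppErrorResultSketch();
        Thread ui = new Thread(() -> {
            try {
                Thread.sleep(50); // the user takes a moment to click
            } catch (InterruptedException ignored) {
                Thread.currentThread().interrupt();
            }
            r.set(userChoice);
        }, "ui");
        ui.start();
        return r.get(); // the calling ("Binder") thread blocks here
    }

    public static void main(String[] args) {
        System.out.println(simulateDialog(3)); // prints 3
    }
}
```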
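Overall, the way AMS handles a crash is quite clear:
1. It first records the crash information in the DropBox;
2. If there is an ActivityController that can handle the app crash, the crash is handed over to it; otherwise, a crash dialog is shown and the user chooses what happens next.
3. Based on the user's choice, AMS can restart the app, force-stop it, or launch the report UI.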
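The branching on `res` in `crashApplicationInner` can be condensed into a pure function, which makes it easier to see which actions each dialog choice triggers. This is a hypothetical sketch: the enum and the action names are invented labels for the calls shown above (the real code uses `int` constants on `AppErrorDialog`), and building and launching the report Intent is reduced to a single action.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical condensation of the dialog-result handling; labels are invented.
class CrashChoiceSketch {
    enum Choice { FORCE_QUIT, FORCE_QUIT_AND_REPORT, RESTART, MUTE, TIMEOUT, CANCEL }

    /** Returns the follow-up actions, in order, for a dialog result. */
    static List<String> plan(Choice res, boolean persistent, boolean isolated) {
        // A timeout or Cancel is treated as Force quit.
        Choice effective = (res == Choice.TIMEOUT || res == Choice.CANCEL) ? Choice.FORCE_QUIT : res;
        List<String> actions = new ArrayList<>();
        switch (effective) {
            case MUTE:
                actions.add("stop-reporting-crashes");
                break;
            case RESTART:
                actions.add("remove-process");
                actions.add("restart-from-recents");
                break;
            case FORCE_QUIT:
                actions.add("finish-activities");               // handleAppCrashLocked
                if (!persistent) actions.add("remove-process"); // persistent apps are not removed here
                break;
            default:
                break; // FORCE_QUIT_AND_REPORT is handled below
        }
        // As in the original: no crash time for isolated processes or after RESTART.
        if (!isolated && effective != Choice.RESTART) actions.add("record-crash-time");
        if (effective == Choice.FORCE_QUIT_AND_REPORT) actions.add("launch-report-ui");
        return actions;
    }

    public static void main(String[] args) {
        System.out.println(plan(Choice.TIMEOUT, false, false));
        // prints [finish-activities, remove-process, record-crash-time]
    }
}
```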
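However, before showing the dialog, the flow above first calls makeAppCrashingLocked.
If this function returns false, the rest of the flow does not run.
Let's look at what this function does.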
```java
private boolean makeAppCrashingLocked(ProcessRecord app,
        String shortMsg, String longMsg, String stackTrace, AppErrorDialog.Data data) {
    app.crashing = true;
    // Simply builds an object that holds all of the error information
    app.crashingReport = generateProcessError(app,
            ActivityManager.ProcessErrorStateInfo.CRASHED, null, shortMsg, longMsg, stackTrace);
    // As mentioned above, the system can launch a crash report UI via an Intent;
    // startAppProblemLocked looks up the ComponentName of that report UI.
    // In addition, if the crashing app is in the middle of handling an ordered broadcast,
    // startAppProblemLocked ends its part of the broadcast so the remaining receivers are not held up,
    // i.e. the broadcast skips the crashed app and moves on to the next receivers.
    startAppProblemLocked(app);
    // Stop "freezing" the screen
    app.stopFreezingAllLocked();
    // Further handling.
    // Judging from the code, this returns true unless the app crashed repeatedly within one minute.
    return handleAppCrashLocked(app, "force-crash" /*reason*/, shortMsg, longMsg, stackTrace,
            data);
}
```
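From the code above, makeAppCrashingLocked has two main jobs:
1. finding the ComponentName of the crash report UI;
2. preventing a process that crashes repeatedly in a short time from bringing up the dialog over and over.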
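The second job boils down to remembering when each process last crashed. Below is a simplified sketch under that assumption: it keys the last crash time by process name and uid (as `mProcessCrashTimes` is keyed in `crashApplicationInner`) and uses the one-minute window mentioned in the code comment. The class is hypothetical, and the real `handleAppCrashLocked` does more than this.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical crash throttle; a simplification of the "crashed again within a minute" check.
class CrashThrottleSketch {
    static final long MIN_CRASH_INTERVAL_MS = 60_000; // the one-minute window from the comment

    // Last crash uptime per (processName, uid)
    private final Map<String, Long> lastCrashUptime = new HashMap<>();

    /**
     * Records a crash at uptime {@code nowMs}. Returns true if the dialog flow should
     * continue, or false if the same process already crashed less than a minute ago.
     */
    boolean onCrash(String processName, int uid, long nowMs) {
        Long previous = lastCrashUptime.put(processName + "/" + uid, nowMs);
        return previous == null || nowMs >= previous + MIN_CRASH_INTERVAL_MS;
    }

    public static void main(String[] args) {
        CrashThrottleSketch t = new CrashThrottleSketch();
        System.out.println(t.onCrash("com.example.app", 10057, 1_000));   // true: first crash
        System.out.println(t.onCrash("com.example.app", 10057, 30_000));  // false: again within 60 s
        System.out.println(t.onCrash("com.example.app", 10057, 100_000)); // true: 70 s since last crash
    }
}
```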
III. Follow-up cleanup
From the flow above, we know that a crashed process is eventually killed,
after which AMS still has to do the follow-up cleanup.
First, let's recall part of the flow in which a newly started process registers with AMS:
```java
// After a process starts, its ActivityThread attaches to AMS
private final boolean attachApplicationLocked(IApplicationThread thread,
        int pid) {
    // ...
    ProcessRecord app;
    if (pid != MY_PID && pid >= 0) {
        synchronized (mPidsSelfLocked) {
            app = mPidsSelfLocked.get(pid);
        }
    } else {
        app = null;
    }
    // ...
    final String processName = app.processName;
    try {
        // Create an "obituary" (death) recipient
        AppDeathRecipient adr = new AppDeathRecipient(
                app, pid, thread);
        thread.asBinder().linkToDeath(adr, 0);
        app.deathRecipient = adr;
    } catch (RemoteException e) {
        app.resetPackageList(mProcessStats);
        startProcessLocked(app, "link fail", processName);
        return false;
    }
    // ...
}
```
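As the code above shows, when a process registers with AMS, AMS creates an "obituary" receiver (a death recipient) and links it to the process's binder.
Therefore, when the crashed process is killed, the binderDied function of AppDeathRecipient is called back: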
```java
@Override
public void binderDied() {
    // ...
    synchronized(ActivityManagerService.this) {
        appDiedLocked(mApp, mPid, mAppThread, true);
    }
}
```
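The "obituary" mechanism is an observer registered on a remote object and called back once when the object's process dies. The sketch below simulates that inside a single JVM; `FakeBinder` is a hypothetical stand-in, not the Android `Binder` class, and "killing" it simply triggers the callbacks. The recipient records the pid at the point where AMS would call `appDiedLocked`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical single-JVM simulation of binder death notification; not the Android API.
class DeathNotificationSketch {
    /** Same shape as IBinder.DeathRecipient. */
    interface DeathRecipient { void binderDied(); }

    /** Hypothetical stand-in for a remote binder whose process can die. */
    static final class FakeBinder {
        private final List<DeathRecipient> recipients = new ArrayList<>();
        private boolean dead;

        synchronized void linkToDeath(DeathRecipient r) {
            // The real linkToDeath throws RemoteException if the process is already gone.
            if (dead) throw new IllegalStateException("binder already dead");
            recipients.add(r);
        }

        /** Simulates the hosting process being killed: each recipient is notified once. */
        void kill() {
            List<DeathRecipient> toNotify;
            synchronized (this) {
                if (dead) return;
                dead = true;
                toNotify = new ArrayList<>(recipients);
                recipients.clear();
            }
            for (DeathRecipient r : toNotify) r.binderDied(); // call back outside the lock
        }
    }

    /** Plays the role of AMS's AppDeathRecipient: remembers which pid died. */
    static final class AppDeathRecipient implements DeathRecipient {
        private final int pid;
        private final List<Integer> diedPids;

        AppDeathRecipient(int pid, List<Integer> diedPids) {
            this.pid = pid;
            this.diedPids = diedPids;
        }

        @Override
        public void binderDied() {
            diedPids.add(pid); // AMS would call appDiedLocked(...) here
        }
    }

    public static void main(String[] args) {
        List<Integer> died = new ArrayList<>();
        FakeBinder appThread = new FakeBinder();
        appThread.linkToDeath(new AppDeathRecipient(4321, died));
        appThread.kill();
        System.out.println(died); // prints [4321]
    }
}
```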
So, on receiving the process's "death" notice, AMS ultimately calls its appDiedLocked function to handle it:
```java
final void appDiedLocked(ProcessRecord app, int pid, IApplicationThread thread,
        boolean fromBinderDied) {
    // First check if this ProcessRecord is actually active for the pid.
    synchronized (mPidsSelfLocked) {
        ProcessRecord curProc = mPidsSelfLocked.get(pid);
        if (curProc != app) {
            // ...
            return;
        }
    }
    // ...
    if (!app.killed) {
        if (!fromBinderDied) {
            Process.killProcessQuiet(pid);
        }
        killProcessGroup(app.uid, pid);
        app.killed = true;
    }

    // Everything above is defensive code for robustness
    if (app.pid == pid && app.thread != null &&
            app.thread.asBinder() == thread.asBinder()) {
        // A normally started process (not started for instrumentation) needs memory adjustment
        boolean doLowMem = app.instrumentationClass == null;
        boolean doOomAdj = doLowMem;
        if (!app.killedByAm) {
            // ...
            mAllowLowerMemLevel = true;
        } else {
            mAllowLowerMemLevel = false;
            doLowMem = false;
        }
        // ...
        // handleAppDiedLocked does the actual work
        handleAppDiedLocked(app, false, true);

        if (doOomAdj) {
            // Recompute the processes' oom_adj values
            updateOomAdjLocked();
        }
        if (doLowMem) {
            // When needed, ask processes in the system to reclaim memory
            doLowMemReportIfNeededLocked(app);
        }
    } // ...
    // ...
}
```
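The first check in `appDiedLocked` guards against stale notifications: pids can be reused, so a late death notice for an old `ProcessRecord` must not tear down whichever process now owns that pid. Below is a simplified plain-Java sketch of that identity check; `ProcessRecord` here is a hypothetical two-field class, and for brevity the sketch drops the map entry as soon as the notice is accepted.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the "is this record still active for the pid?" check.
class StaleDeathCheckSketch {
    /** Hypothetical, trimmed-down ProcessRecord. */
    static final class ProcessRecord {
        final String processName;
        final int pid;

        ProcessRecord(String processName, int pid) {
            this.processName = processName;
            this.pid = pid;
        }
    }

    // pid -> record currently running under that pid (cf. mPidsSelfLocked)
    private final Map<Integer, ProcessRecord> pidsSelfLocked = new HashMap<>();

    void attach(ProcessRecord app) {
        synchronized (pidsSelfLocked) {
            pidsSelfLocked.put(app.pid, app); // a reused pid overwrites the old entry
        }
    }

    /** Returns true if the death notice was acted on, false if it was stale. */
    boolean appDied(ProcessRecord app, int pid) {
        synchronized (pidsSelfLocked) {
            if (pidsSelfLocked.get(pid) != app) {
                return false; // this pid does not (or no longer) host this record
            }
            pidsSelfLocked.remove(pid); // simplification: drop the entry immediately
        }
        return true;
    }

    public static void main(String[] args) {
        StaleDeathCheckSketch ams = new StaleDeathCheckSketch();
        ProcessRecord oldApp = new ProcessRecord("com.example.app", 1234);
        ProcessRecord newApp = new ProcessRecord("com.example.other", 1234); // same pid, reused
        ams.attach(oldApp);
        ams.attach(newApp);
        System.out.println(ams.appDied(oldApp, 1234)); // false: late notice for the old record
        System.out.println(ams.appDied(newApp, 1234)); // true
    }
}
```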
The most important call in appDiedLocked is handleAppDiedLocked:
```java
private final void handleAppDiedLocked(ProcessRecord app,
        boolean restarting, boolean allowRestart) {
    int pid = app.pid;
    // Wrap up the process's services, ContentProviders, BroadcastReceivers, etc.
    // The function is long but what it does is clear, so we won't go into it here.
    // The key points are:
    // 1. For bound services in the crashed process, the links between each service and its
    //    clients are cleared; in addition, clients of low enough importance may be killed outright.
    // 2. When cleaning up ContentProviders, removeDyingProviderLocked may kill their client
    //    processes (for stable ContentProvider connections).
    boolean kept = cleanUpApplicationRecordLocked(app, restarting, allowRestart, -1);
    if (!kept && !restarting) {
        // Neither kept nor restarting: remove from the LRU list
        removeLruProcessLocked(app);
        if (pid > 0) {
            ProcessList.remove(pid);
        }
    }
    // ...
    // Remove this application's activities from active lists.
    // Activity-related cleanup
    boolean hasVisibleActivities = mStackSupervisor.handleAppDiedLocked(app);

    app.activities.clear();

    if (app.instrumentationClass != null) {
        // ...
    }

    if (!restarting && hasVisibleActivities
            && !mStackSupervisor.resumeFocusedStackTopActivityLocked()) {
        // If there was nothing to resume, and we are not already restarting this process, but
        // there is a visible activity that is hosted by the process... then make sure all
        // visible activities are running, taking care of restarting this process.
        // Per this comment: if nothing else can be resumed but the crashed process hosts
        // a visible activity, AMS will still try to restart that process.
        mStackSupervisor.ensureActivitiesVisibleLocked(null, 0, !PRESERVE_WINDOWS);
    }
}
```
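We won't analyze cleanUpApplicationRecordLocked in depth here.
The only fiddly part is cleaning up bound services and ContentProviders,
because for both kinds of components the client processes have to be taken into account.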
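For ContentProviders, the deciding factor noted in the code comments above is whether a client holds a stable connection. The following hypothetical sketch captures just that rule, loosely modeled on `removeDyingProviderLocked`: clients with stable references die along with the provider's process unless the client is persistent. The `Connection` class is invented for illustration, and the real function checks more conditions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of choosing which provider clients to kill; not AOSP code.
class ProviderDeathSketch {
    /** Hypothetical view of one client's connection to a provider in the dying process. */
    static final class Connection {
        final String clientProcess;
        final int stableCount;          // references acquired as "stable"
        final boolean clientPersistent;

        Connection(String clientProcess, int stableCount, boolean clientPersistent) {
            this.clientProcess = clientProcess;
            this.stableCount = stableCount;
            this.clientPersistent = clientPersistent;
        }
    }

    /** Returns the client processes that would be killed along with the provider's process. */
    static List<String> clientsToKill(List<Connection> connections) {
        List<String> victims = new ArrayList<>();
        for (Connection c : connections) {
            // A stable reference means the client depends on the provider staying alive.
            if (c.stableCount > 0 && !c.clientPersistent) {
                victims.add(c.clientProcess);
            }
            // Clients holding only unstable references are not killed by this step.
        }
        return victims;
    }

    public static void main(String[] args) {
        List<Connection> conns = new ArrayList<>();
        conns.add(new Connection("com.example.reader", 1, false));
        conns.add(new Connection("com.example.browser", 0, false));
        conns.add(new Connection("com.android.phone", 1, true));
        System.out.println(clientsToKill(conns)); // prints [com.example.reader]
    }
}
```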
IV. Summary
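Overall, that is roughly how Android handles a process after it crashes, as described above.
The flow is fairly simple; the only somewhat involved part is probably the cleanup
AMS performs after the process has died.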