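Background
While running Monkey on our project and analyzing the ANR logs, we found that the process was killed outright as soon as it hit an ANR. We need to capture memory information at ANR time. Because the process had already restarted, the memory information we captured did not reflect the state at the moment of the problem. So I dug into how Android handles an ANR. This analysis is based on the MTK platform, so parts of the source may differ from other platforms.
Android ANR trigger flow
1. Code that triggers dumps and other actions after an ANR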
frameworks/base/services/core/java/com/android/server/am/AppErrors.java
The method is appNotResponding():
final void appNotResponding(ProcessRecord app, ActivityRecord activity,
        ActivityRecord parent, boolean aboveSystem, final String annotation) {
    ArrayList<Integer> firstPids = new ArrayList<Integer>(5);
    SparseArray<Boolean> lastPids = new SparseArray<Boolean>(20);
    Slog.d("testview", "=================================== appNotResponding " + (mService.mController == null ? "controller is null" : "controller not null"));
    if (mService.mController != null) {
        try {
            // 0 == continue, -1 = kill process immediately
            int res = mService.mController.appEarlyNotResponding(
                    app.processName, app.pid, annotation);
            Slog.d("testview", "=================================== appNotResponding res is " + res);
            if (res < 0 && app.pid != MY_PID) {
                app.kill("anr", true);
                Slog.d("testview", "=================================== app.kill");
            }
        } catch (RemoteException e) {
            mService.mController = null;
            Watchdog.getInstance().setActivityController(null);
        }
    }
    ......
    Slog.d("testview", "=================================== startAnrDump");
    /// M: ANR Debug Mechanism
    if (mService.mAnrManager.startAnrDump(mService, app, activity, parent, aboveSystem,
            annotation, showBackground)) {
        Slog.d("testview", "=================================== startAnrDump finished");
        return;
    }
    Slog.d("testview", "=================================== next");
    if (mService.mController != null) {
        try {
            // 0 == show dialog, 1 = keep waiting, -1 = kill process immediately
            int res = mService.mController.appNotResponding(
                    app.processName, app.pid, info.toString());
            if (res != 0) {
                if (res < 0 && app.pid != MY_PID) {
                    app.kill("anr", true);
                    Slog.d("testview", "=================================== res < 0 && app.kill");
                } else {
                    synchronized (mService) {
                        mService.mServices.scheduleServiceTimeoutLocked(app);
                    }
                }
                return;
            }
        } catch (RemoteException e) {
            mService.mController = null;
            Watchdog.getInstance().setActivityController(null);
        }
    }
    synchronized (mService) {
        ......
        // Bring up the infamous App Not Responding dialog
        Message msg = Message.obtain();
        msg.what = ActivityManagerService.SHOW_NOT_RESPONDING_UI_MSG;
        msg.obj = new AppNotRespondingDialog.Data(app, activity, aboveSystem);
        mService.mUiHandler.sendMessage(msg);
    }
}
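This code contains four key pieces of logic:
1. The value returned by mService.mController.appEarlyNotResponding decides whether the process is killed immediately.
2. mService.mAnrManager.startAnrDump dumps the related information (MTK's ANR debug mechanism). If it returns true, the method returns at this point.
3. The value returned by mService.mController.appNotResponding decides whether the process is killed (or left waiting) instead of showing a dialog.
4. mService.mUiHandler.sendMessage(msg) posts a message to the UI handler, which shows the ANR dialog.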
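The following hypothetical Java sketch makes the order of these four steps concrete without any framework code. The Controller interface is a local stand-in for the real controller's two ANR callbacks. The boolean anrDumpHandled stands for the result of MTK's startAnrDump(), and the process name and pid are dummy values. The sketch leaves out the app.pid != MY_PID guard and everything hidden behind "......". As a simplifying assumption, it also stops right after an early kill.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of AppErrors.appNotResponding(); no Android APIs involved.
class AnrFlowSketch {

    // Local stand-in for the two ANR callbacks of the real controller (IActivityController).
    interface Controller {
        int appEarlyNotResponding(String processName, int pid, String annotation);
        int appNotResponding(String processName, int pid, String processStats);
    }

    // Returns the actions the simplified flow takes, in order.
    static List<String> appNotResponding(Controller controller, boolean anrDumpHandled) {
        List<String> actions = new ArrayList<>();
        // Step 1: early callback; 0 == continue, -1 == kill immediately.
        // Assumption: the sketch stops after an early kill (the elided part is not modeled).
        if (controller != null && controller.appEarlyNotResponding("app", 1234, "anr") < 0) {
            actions.add("kill(early)");
            return actions;
        }
        // Step 2: MTK startAnrDump(); when it returns true, appNotResponding() returns here.
        if (anrDumpHandled) {
            actions.add("startAnrDump");
            return actions;
        }
        // Step 3: 0 == show dialog, 1 == keep waiting, -1 == kill immediately.
        if (controller != null) {
            int res = controller.appNotResponding("app", 1234, "stats");
            if (res != 0) {
                actions.add(res < 0 ? "kill" : "keepWaiting");
                return actions;
            }
        }
        // Step 4: post SHOW_NOT_RESPONDING_UI_MSG so the UI thread shows the ANR dialog.
        actions.add("showDialog");
        return actions;
    }

    public static void main(String[] args) {
        Controller killOnAnr = new Controller() {
            public int appEarlyNotResponding(String p, int pid, String a) { return 0; }
            public int appNotResponding(String p, int pid, String s) { return -1; }
        };
        System.out.println(appNotResponding(null, false));      // [showDialog]
        System.out.println(appNotResponding(killOnAnr, false)); // [kill]
    }
}
```

With no controller installed, the flow always reaches step 4. Once a controller returns a non-zero value from appNotResponding(), the dialog step is skipped entirely.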
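2. Where AppErrors.appNotResponding() is called
1. BroadcastQueue.java: broadcast timeout
2. ContentProviderClient.java: content provider timeout
3. ActiveServices.java: service timeout
4. ActivityManagerService.java (inputDispatchingTimedOut()): input dispatch timeout
If you are interested, you can follow these paths and read the source yourself.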
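Adding logs to find out why the process was killed
1. With the logs added to the code above, we found that during the Monkey run the log for showing the ANR dialog was never printed.
Code path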
frameworks/base/services/core/java/com/android/server/am/AppErrors.java
The method is handleShowAnrUi(). AMS calls it when it handles ActivityManagerService.SHOW_NOT_RESPONDING_UI_MSG.
void handleShowAnrUi(Message msg) {
    Slog.d("testview", "========================= appErrors handleShowAnrUi");
    Dialog dialogToShow = null;
    synchronized (mService) {
        AppNotRespondingDialog.Data data = (AppNotRespondingDialog.Data) msg.obj;
        final ProcessRecord proc = data.proc;
        if (proc == null) {
            Slog.e(TAG, "handleShowAnrUi: proc is null");
            return;
        }
        if (proc.anrDialog != null) {
            Slog.e(TAG, "App already has anr dialog: " + proc);
            MetricsLogger.action(mContext, MetricsProto.MetricsEvent.ACTION_APP_ANR,
                    AppNotRespondingDialog.ALREADY_SHOWING);
            return;
        }
        Intent intent = new Intent("android.intent.action.ANR");
        if (!mService.mProcessesReady) {
            intent.addFlags(Intent.FLAG_RECEIVER_REGISTERED_ONLY
                    | Intent.FLAG_RECEIVER_FOREGROUND);
        }
        mService.broadcastIntentLocked(null, null, intent,
                null, null, 0, null, null, null, AppOpsManager.OP_NONE,
                null, false, false, MY_PID, Process.SYSTEM_UID, 0 /* TODO: Verify */);
        boolean showBackground = Settings.Secure.getInt(mContext.getContentResolver(),
                Settings.Secure.ANR_SHOW_BACKGROUND, 0) != 0;
        if (mService.canShowErrorDialogs() || showBackground) {
            dialogToShow = new AppNotRespondingDialog(mService, mContext, data);
            proc.anrDialog = dialogToShow;
        } else {
            MetricsLogger.action(mContext, MetricsProto.MetricsEvent.ACTION_APP_ANR,
                    AppNotRespondingDialog.CANT_SHOW);
            // Just kill the app if there is no dialog to be shown.
            mService.killAppAtUsersRequest(proc, null);
        }
    }
    // If we've created a crash dialog, show it without the lock held
    if (dialogToShow != null) {
        dialogToShow.show();
    }
}
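handleShowAnrUi() can itself kill the app, which is why it had to be ruled out first. The decision at the end of the method reduces to the hypothetical sketch below. The enum and method names are made up for illustration. The three booleans stand for proc.anrDialog != null, mService.canShowErrorDialogs() and the Settings.Secure.ANR_SHOW_BACKGROUND value. The log at the top of handleShowAnrUi() never printed, so the KILL_APP branch could not be the culprit here.

```java
// Hypothetical sketch of the dialog-or-kill branch in handleShowAnrUi().
class AnrUiDecisionSketch {
    enum Outcome { ALREADY_SHOWING, SHOW_DIALOG, KILL_APP }

    static Outcome decide(boolean hasAnrDialog, boolean canShowErrorDialogs, boolean showBackground) {
        // An ANR dialog is already showing for this process: nothing more to do.
        if (hasAnrDialog) {
            return Outcome.ALREADY_SHOWING;
        }
        // canShowErrorDialogs() or ANR_SHOW_BACKGROUND != 0: create and show the dialog.
        if (canShowErrorDialogs || showBackground) {
            return Outcome.SHOW_DIALOG;
        }
        // "Just kill the app if there is no dialog to be shown."
        return Outcome.KILL_APP;
    }

    public static void main(String[] args) {
        System.out.println(decide(false, false, false)); // KILL_APP
        System.out.println(decide(false, true, false));  // SHOW_DIALOG
    }
}
```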
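2. Since handleShowAnrUi() was never reached, appNotResponding() must have returned before posting SHOW_NOT_RESPONDING_UI_MSG. That leaves mService.mController.appNotResponding as the only suspect. If you want to learn more about what mController is, start from the code paths above.
Checking whether the Monkey source defines its own controller
1. Monkey.java in the Monkey source does define its own controller: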
private class ActivityController extends IActivityController.Stub {
    public boolean activityStarting(Intent intent, String pkg) {
        boolean allow = checkEnteringPackage(pkg) || (DEBUG_ALLOW_ANY_STARTS != 0);
        if (mVerbose > 0) {
            // StrictMode's disk checks end up catching this on
            // userdebug/eng builds due to PrintStream going to a
            // FileOutputStream in the end (perhaps only when
            // redirected to a file?) So we allow disk writes
            // around this region for the monkey to minimize
            // harmless dropbox uploads from monkeys.
            StrictMode.ThreadPolicy savedPolicy = StrictMode.allowThreadDiskWrites();
            System.out.println("    // " + (allow ? "Allowing" : "Rejecting") + " start of "
                    + intent + " in package " + pkg);
            StrictMode.setThreadPolicy(savedPolicy);
        }
        currentPackage = pkg;
        currentIntent = intent;
        return allow;
    }

    public boolean activityResuming(String pkg) {
        StrictMode.ThreadPolicy savedPolicy = StrictMode.allowThreadDiskWrites();
        System.out.println("    // activityResuming(" + pkg + ")");
        boolean allow = checkEnteringPackage(pkg) || (DEBUG_ALLOW_ANY_RESTARTS != 0);
        if (!allow) {
            if (mVerbose > 0) {
                System.out.println("    // " + (allow ? "Allowing" : "Rejecting")
                        + " resume of package " + pkg);
            }
        }
        currentPackage = pkg;
        StrictMode.setThreadPolicy(savedPolicy);
        return allow;
    }

    public boolean appCrashed(String processName, int pid,
            String shortMsg, String longMsg,
            long timeMillis, String stackTrace) {
        StrictMode.ThreadPolicy savedPolicy = StrictMode.allowThreadDiskWrites();
        System.err.println("// CRASH: " + processName + " (pid " + pid + ")");
        System.err.println("// Short Msg: " + shortMsg);
        System.err.println("// Long Msg: " + longMsg);
        System.err.println("// Build Label: " + Build.FINGERPRINT);
        System.err.println("// Build Changelist: " + Build.VERSION.INCREMENTAL);
        System.err.println("// Build Time: " + Build.TIME);
        System.err.println("// " + stackTrace.replace("\n", "\n// "));
        StrictMode.setThreadPolicy(savedPolicy);
        if (!mIgnoreCrashes || mRequestBugreport) {
            synchronized (Monkey.this) {
                if (!mIgnoreCrashes) {
                    mAbort = true;
                }
                if (mRequestBugreport) {
                    mRequestAppCrashBugreport = true;
                    mReportProcessName = processName;
                }
            }
            return !mKillProcessAfterError;
        }
        return false;
    }

    public int appEarlyNotResponding(String processName, int pid, String annotation) {
        return 0;
    }

    public int appNotResponding(String processName, int pid, String processStats) {
        StrictMode.ThreadPolicy savedPolicy = StrictMode.allowThreadDiskWrites();
        System.err.println("// NOT RESPONDING: " + processName + " (pid " + pid + ")");
        System.err.println(processStats);
        StrictMode.setThreadPolicy(savedPolicy);
        synchronized (Monkey.this) {
            mRequestAnrTraces = true;
            mRequestDumpsysMemInfo = true;
            mRequestProcRank = true;
            if (mRequestBugreport) {
                mRequestAnrBugreport = true;
                mReportProcessName = processName;
            }
        }
        if (!mIgnoreTimeouts) {
            synchronized (Monkey.this) {
                mAbort = true;
            }
        }
        return (mKillProcessAfterError) ? -1 : 1;
    }
}
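The controller customizes several callbacks. The one that matters here is appNotResponding.
2. appNotResponding returns -1 when mKillProcessAfterError is true, and 1 otherwise. In AppErrors.java, if mService.mController.appNotResponding returns -1, the process is killed and no ANR dialog is shown. That is the cause of the problem.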
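The chain from the Monkey flag to the final action reduces to two small functions. This is a simplified sketch, not Monkey or framework code. The first function copies the return expression of Monkey's appNotResponding(). The second condenses the res handling in AppErrors.appNotResponding(), omitting the app.pid != MY_PID guard. The function names are illustrative.

```java
// Hypothetical sketch: Monkey's return value and how AppErrors reacts to it.
class MonkeyAnrReturnSketch {
    // Same expression as the last line of Monkey's ActivityController.appNotResponding().
    static int monkeyAppNotResponding(boolean killProcessAfterError) {
        return killProcessAfterError ? -1 : 1;
    }

    // Simplified reaction of AppErrors.appNotResponding() to the controller's result.
    static String appErrorsReaction(int res) {
        if (res == 0) {
            return "show ANR dialog";
        }
        return res < 0 ? "kill process" : "keep waiting, no dialog";
    }

    public static void main(String[] args) {
        for (boolean flag : new boolean[] {false, true}) {
            int res = monkeyAppNotResponding(flag);
            System.out.println("--kill-process-after-error=" + flag + " -> " + res
                    + " -> " + appErrorsReaction(res));
        }
        // --kill-process-after-error=false -> 1 -> keep waiting, no dialog
        // --kill-process-after-error=true -> -1 -> kill process
    }
}
```

The Monkey code also shows that its appNotResponding() never returns 0. Under Monkey, therefore, the ANR dialog is not shown whether or not the flag is set. Without the flag, the process is simply left alone. Unless --ignore-timeouts is given, Monkey also sets mAbort and stops.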
3. How mKillProcessAfterError gets set
private boolean processOptions() {
    // quick (throwaway) check for unadorned command
    if (mArgs.length < 1) {
        showUsage();
        return false;
    }
    try {
        String opt;
        while ((opt = nextOption()) != null) {
            if (opt.equals("-s")) {
                mSeed = nextOptionLong("Seed");
            } else if (opt.equals("--kill-process-after-error")) {
                mKillProcessAfterError = true;
            ......
The --kill-process-after-error option decides whether the failing process is killed. I later confirmed with the test team that their Monkey runs did include this option.
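The relevant part of processOptions() is a flag that defaults to false and flips to true when the option appears on the command line. The following hypothetical sketch shows that behavior. It only handles flag options and does not consume option values. The class and field names are made up. Only the two option strings come from Monkey.

```java
// Hypothetical, heavily simplified version of Monkey.processOptions():
// boolean flags only, no option values, no usage checks.
class MonkeyOptionsSketch {
    boolean killProcessAfterError; // mirrors mKillProcessAfterError (defaults to false)
    boolean ignoreTimeouts;        // mirrors mIgnoreTimeouts (defaults to false)

    static MonkeyOptionsSketch parse(String... args) {
        MonkeyOptionsSketch o = new MonkeyOptionsSketch();
        for (String opt : args) {
            if (opt.equals("--kill-process-after-error")) {
                o.killProcessAfterError = true;
            } else if (opt.equals("--ignore-timeouts")) {
                o.ignoreTimeouts = true;
            }
            // Any other argument (including option values) is ignored in this sketch.
        }
        return o;
    }

    public static void main(String[] args) {
        MonkeyOptionsSketch withFlag = parse("-p", "com.example.app", "--ignore-timeouts",
                "--kill-process-after-error", "500");
        MonkeyOptionsSketch withoutFlag = parse("-p", "com.example.app", "--ignore-timeouts", "500");
        System.out.println(withFlag.killProcessAfterError);    // true
        System.out.println(withoutFlag.killProcessAfterError); // false
    }
}
```

As an example, a run such as adb shell monkey -p com.example.app --ignore-timeouts -v 500 (the package name is illustrative) leaves mKillProcessAfterError at false, so Monkey's controller does not ask for the ANR process to be killed. Adding --kill-process-after-error reproduces the behavior described above.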
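Conclusion
In the end, the cause was an option the testers had added by mistake: --kill-process-after-error decides whether the failing process is killed.
If the system itself needs to customize ANR handling, for example to suppress the ANR dialog, implementing a custom Controller is enough.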
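As a rough illustration, the following hypothetical controller suppresses the dialog while keeping the process alive. That is what this investigation needed in order to capture memory at ANR time. IActivityController is a hidden framework interface, so the sketch uses a local AnrCallbacks interface with the same two ANR methods. In a real build, the class would instead extend IActivityController.Stub, as Monkey's ActivityController does. It would also have to implement the other callbacks and be registered with the system, which is not shown. Returning 1 follows the "1 = keep waiting" comment in AppErrors.appNotResponding().

```java
import java.util.ArrayList;
import java.util.List;

// Local stand-in for the two ANR callbacks of android.app.IActivityController (hidden API).
interface AnrCallbacks {
    int appEarlyNotResponding(String processName, int pid, String annotation);
    int appNotResponding(String processName, int pid, String processStats);
}

// Hypothetical controller: suppress the ANR dialog but keep the process alive
// so that its memory can still be inspected after the ANR.
class KeepAliveAnrController implements AnrCallbacks {
    private final List<Integer> anrPids = new ArrayList<>();

    @Override
    public int appEarlyNotResponding(String processName, int pid, String annotation) {
        return 0; // 0 == continue: do not kill early
    }

    @Override
    public synchronized int appNotResponding(String processName, int pid, String processStats) {
        anrPids.add(pid); // hook point: start a memory dump for this pid here (not shown)
        return 1; // 1 == keep waiting: per the excerpt, AppErrors returns without killing or showing a dialog
    }

    synchronized List<Integer> anrPids() {
        return new ArrayList<>(anrPids);
    }

    public static void main(String[] args) {
        KeepAliveAnrController controller = new KeepAliveAnrController();
        int res = controller.appNotResponding("com.example.app", 4321, "cpu stats");
        System.out.println(res + " " + controller.anrPids()); // 1 [4321]
    }
}
```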