How the Android Handler Message Queue Is Implemented

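When writing Android applications, we often use Handler to communicate with worker threads or to manage the timing of application state. Handler is the API Android provides for working with a message queue. It can run on the main thread or on a worker thread; the only difference is the Looper object it is bound to. This article summarizes how the Handler message queue is implemented in Android, both to understand the underlying principles and to reinforce how it should be used.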

This series covers the following topics:

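1. Overall architecture and the main data structures
2. How Messages are enqueued and dequeued, and how delayed handling is implemented
3. Underlying mechanism: the basics of epoll and how Handler uses it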

Overall architecture

The source files that implement the Handler message queue are:

frameworks/base/core/java/android/os/Handler.java
frameworks/base/core/java/android/os/Looper.java
frameworks/base/core/java/android/os/Message.java
frameworks/base/core/java/android/os/MessageQueue.java
frameworks/base/core/jni/android_os_MessageQueue.cpp
system/core/libutils/Looper.cpp
system/core/include/utils/Looper.h

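Handler: provides the APIs for adding Message objects to the queue. When a Message is taken out of the queue, it triggers the handleMessage callback or runs the run method of the Runnable object carried by the Message.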

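Looper: the Java-layer Looper class is the engine that drives the message queue. It runs an endless loop that keeps taking Messages from the MessageQueue; each Message it takes out is handled in the handleMessage of the Handler it belongs to.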

Message: the data type of the messages passed through the queue. It implements the Parcelable interface. It also contains a Message object pool used to reuse Message objects.
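
To make the object pool idea concrete, here is a minimal plain-Java sketch of a recycling pool built as a singly linked free list. It is an illustration under assumptions, not the real android.os.Message: the class name PooledMessage and the pool cap used here are hypothetical choices for this example.

```java
// Simplified model of an object pool kept as a singly linked free list.
// PooledMessage and MAX_POOL_SIZE are assumptions made for this sketch.
final class PooledMessage {
    private static final int MAX_POOL_SIZE = 50;   // assumed cap for this sketch
    private static final Object POOL_LOCK = new Object();
    private static PooledMessage pool;             // head of the free list
    private static int poolSize;

    int what;
    Object obj;
    PooledMessage next;                            // links free objects together

    private PooledMessage() {}

    /** Returns a recycled instance if one is available, otherwise a new one. */
    static PooledMessage obtain() {
        synchronized (POOL_LOCK) {
            if (pool != null) {
                PooledMessage m = pool;
                pool = m.next;
                m.next = null;
                poolSize--;
                return m;
            }
        }
        return new PooledMessage();
    }

    /** Clears the fields and pushes this instance onto the free list. */
    void recycle() {
        what = 0;
        obj = null;
        synchronized (POOL_LOCK) {
            if (poolSize < MAX_POOL_SIZE) {
                next = pool;
                pool = this;
                poolSize++;
            }
        }
    }
}
```

The point of the design is that obtaining and recycling are both constant-time operations on the head of the list, so reusing an object is cheaper than allocating a new one for every message.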

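MessageQueue: the message queue. It implements the enqueue and dequeue logic, maintains the singly linked list of Messages, clears the queue, and communicates with the JNI layer.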

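android_os_MessageQueue: the JNI part of the message queue. Its main job is to initialize the native Looper object and, when the Java side calls a JNI method, to invoke the corresponding method of the native Looper.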

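native Looper: the native-layer Looper provides two things. It uses epoll to implement blocking and waking of the queue, and it implements a separate queue mechanism for use in the native layer.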

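The layered architecture is as follows: Handler, Message, and Looper all work around MessageQueue, and MessageQueue talks to Looper.cpp through JNI to block when the queue is empty and to wake up when a message is enqueued.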

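The overall architecture is as follows: MessageQueue maintains a singly linked list of Messages. Handler's enqueueMessage adds a message to the queue, and Looper.java takes Messages out of the queue and hands them to Handler's dispatchMessage for dispatch. android_os_MessageQueue.cpp is the JNI implementation that connects MessageQueue to Looper.cpp, and Looper.cpp uses epoll to block and wake the queue.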

Data structures

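The data type managed by the Android queue is Message, and the data structure behind the queue is a singly linked list of Messages. In the ordinary case its structure is as follows: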

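The Looper takes the data out in order, from Message 0 to Message n, and hands it to the Handler. When a new Message is added and its when field is 0, it is placed at the head of the list. This usually happens when we call Handler's sendMessageAtFrontOfQueue.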

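In all other cases, each insertion compares the new Message's when with the elements of the list, starting from the head, and inserts the new Message in front of the first element whose when is greater. Delayed messages are produced by calls such as sendEmptyMessageAtTime, sendEmptyMessageDelayed, sendMessageDelayed, and sendMessageAtTime. If the queue already holds delayed messages, a message sent for immediate handling (for example with sendMessage) is inserted ahead of them rather than appended at the tail.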
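
The ordering rule described above can be sketched in plain Java as a sorted insert into a singly linked list. This is a simplified model, not the framework code: TimedNode and TimedList are hypothetical names, and the when values are plain numbers rather than real uptime timestamps.

```java
// Hypothetical model of a list kept ordered by a 'when' timestamp.
final class TimedNode {
    final String name;
    final long when;
    TimedNode next;

    TimedNode(String name, long when) {
        this.name = name;
        this.when = when;
    }
}

final class TimedList {
    private TimedNode head;

    /** Inserts node so the list stays ordered by 'when'; equal values keep arrival order. */
    void insert(TimedNode node) {
        TimedNode p = head;
        if (p == null || node.when == 0 || node.when < p.when) {
            // Empty list, "front of queue" request, or earlier than the head: new head.
            node.next = p;
            head = node;
            return;
        }
        TimedNode prev;
        for (;;) {
            prev = p;
            p = p.next;
            if (p == null || node.when < p.when) {
                break;              // insert between prev and p
            }
        }
        node.next = p;
        prev.next = node;
    }

    /** Returns the names from head to tail, separated by spaces. */
    String order() {
        StringBuilder sb = new StringBuilder();
        for (TimedNode n = head; n != null; n = n.next) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(n.name);
        }
        return sb.toString();
    }
}
```

For example, inserting nodes with when values 500, 100, 900, 100, and 0 in that order leaves the node with 0 at the head, followed by the two nodes with 100 in arrival order, then 500 and 900.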

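As summarized above, the queue's data structure is a singly linked list, so the questions below all reduce to how a singly linked list is manipulated. An ordinary singly linked list looks like this:

How exactly is a Message added to the queue, and how is it taken out and finally delivered to us in handleMessage? Why can the queue handle some messages ahead of others, why can some be delayed, and what algorithm does the queue management use?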

First, let's look at how Handler takes messages from the queue. The implementation is in the next() function of MessageQueue:

// Get the current timestamp.
final long now = SystemClock.uptimeMillis();
Message prevMsg = null;
// mMessages is a field of MessageQueue that holds the Message at the head of the list; it is null when the queue is empty.
Message msg = mMessages;
if (msg != null && msg.target == null) {
    // A Message without a target is a synchronization barrier: skip the synchronous
    // messages behind it and look for the next asynchronous one.
    // Stalled by a barrier.  Find the next asynchronous message in the queue.
    do {
        prevMsg = msg;
        msg = msg.next;
    } while (msg != null && !msg.isAsynchronous());
}
if (msg != null) {
    // There is a message to consider.
    if (now < msg.when) {
        // The message's execution time has not arrived yet, so keep waiting.
        // Next message is not ready.  Set a timeout to wake up when it is ready.
        nextPollTimeoutMillis = (int) Math.min(msg.when - now, Integer.MAX_VALUE);
    } else {
        // Got a message.
        mBlocked = false;
        // Unlink msg from the list. In the common case it is the head, so mMessages
        // moves to msg.next, which takes the head element off the queue.
        if (prevMsg != null) {
            prevMsg.next = msg.next;
        } else {
            mMessages = msg.next;
        }
        msg.next = null;
        if (DEBUG) Log.v(TAG, "Returning message: " + msg);
        // Mark the Message as in use.
        msg.markInUse();
        // Return the Message.
        return msg;
    }
} else {
    // No more messages.
    nextPollTimeoutMillis = -1;
}

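Only the part that takes a message from the queue is listed here. The rest of next(), including the IdleHandler handling for an idle queue and the blocking logic, is omitted.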
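
The selection logic above can be exercised outside Android with a small plain-Java model. The sketch below is a hedged illustration: QNode and NextSketch are hypothetical names, the barrier flag stands in for msg.target == null, and the current time is passed in as a parameter instead of being read from SystemClock.

```java
// Hypothetical stand-in for Message: 'barrier' models target == null.
final class QNode {
    final String name;
    final long when;
    final boolean barrier;
    final boolean asynchronous;
    QNode next;

    QNode(String name, long when, boolean barrier, boolean asynchronous) {
        this.name = name;
        this.when = when;
        this.barrier = barrier;
        this.asynchronous = asynchronous;
    }
}

final class NextSketch {
    QNode head;
    int nextPollTimeoutMillis;

    /** Returns the next due node, or null after setting nextPollTimeoutMillis. */
    QNode poll(long now) {
        QNode prev = null;
        QNode msg = head;
        if (msg != null && msg.barrier) {
            // Skip synchronous nodes behind the barrier.
            do {
                prev = msg;
                msg = msg.next;
            } while (msg != null && !msg.asynchronous);
        }
        if (msg == null) {
            nextPollTimeoutMillis = -1;   // nothing to run: wait indefinitely
            return null;
        }
        if (now < msg.when) {
            // Not due yet: wait only as long as needed.
            nextPollTimeoutMillis = (int) Math.min(msg.when - now, Integer.MAX_VALUE);
            return null;
        }
        // Unlink the chosen node from the list.
        if (prev != null) {
            prev.next = msg.next;
        } else {
            head = msg.next;
        }
        msg.next = null;
        return msg;
    }
}
```

With a barrier at the head followed by a synchronous node due at 10 and an asynchronous node due at 20, polling at time 15 returns nothing and sets a timeout of 5, while polling at time 25 returns the asynchronous node and leaves the barrier and the synchronous node in the list.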

After Looper calls next() on the MessageQueue and gets a Message back, it hands the Message to the Handler, which is how we end up in the familiar handleMessage(Message msg) callback. The logic is as follows:

Message msg = queue.next(); // might block
if (msg == null) {
       // No message indicates that the message queue is quitting.
       return;
 }
 
......
......

try {
     msg.target.dispatchMessage(msg);
} finally {
     if (traceTag != 0) {
          Trace.traceEnd(traceTag);
     }
}

Every Message has a target field of type Handler. Whenever a Message is taken out of the queue, it is handed to the Handler it belongs to.
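
What the target does with the Message can be sketched as follows. This is a plain-Java model of the dispatch order (a Runnable attached to the message first, then an optional handler-level callback, then handleMessage), written under assumptions: DispatchSketch, Msg, and the log field are hypothetical names, and the message is reduced to an int code plus an optional Runnable.

```java
// Hypothetical model of the dispatch order inside a handler.
final class DispatchSketch {
    /** Optional handler-level callback; returning true consumes the message. */
    interface Callback {
        boolean handleMessage(int what);
    }

    static final class Msg {
        int what;
        Runnable callback;   // set when the message was created by a post(...) style call
    }

    final StringBuilder log = new StringBuilder();
    private final Callback handlerCallback;

    DispatchSketch(Callback handlerCallback) {
        this.handlerCallback = handlerCallback;
    }

    /** Default handling; a real handler subclass would override this. */
    void handleMessage(int what) {
        log.append("handleMessage:").append(what).append(';');
    }

    void dispatchMessage(Msg msg) {
        if (msg.callback != null) {
            msg.callback.run();          // posted Runnable takes priority
            return;
        }
        if (handlerCallback != null && handlerCallback.handleMessage(msg.what)) {
            return;                      // consumed by the handler-level callback
        }
        handleMessage(msg.what);
    }
}
```

A message carrying a Runnable never reaches handleMessage in this model, which matches the split between the post and send families of APIs.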

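The sequence for taking a message is shown in the diagram above. Looper creates the MessageQueue, and once initialization is done, calling loop() starts the queue working. loop() repeatedly calls MessageQueue's next() to fetch a Message. next() can block: when the queue has no message it blocks in nativePollOnce(), because the lower layer uses epoll to implement blocking and waking. When new data is added to the queue, epoll_wait returns, so next() returns the Message to Looper, which hands it to the Handler.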

Now that we know how messages are taken out, let's look at how they are added. Handler provides the following public APIs for adding messages to the queue:

// Send an empty message
public final boolean sendEmptyMessage(int what)
// Send an empty message to be handled after a delay
public final boolean sendEmptyMessageDelayed(int what, long delayMillis)
// Send an empty message to be handled at a given timestamp
public final boolean sendEmptyMessageAtTime(int what, long uptimeMillis)
// Send a delayed message
public final boolean sendMessageDelayed(Message msg, long delayMillis)
// Send a message to be handled at a given time
public boolean sendMessageAtTime(Message msg, long uptimeMillis)
// Send a message to the head of the queue
public final boolean sendMessageAtFrontOfQueue(Message msg)
// Post a task
public final boolean post(Runnable r)
// Post a task to run at a given time
public final boolean postAtTime(Runnable r, long uptimeMillis)
// Post a task to run at a given time, tagged with a token
public final boolean postAtTime(Runnable r, Object token, long uptimeMillis)
// Post a delayed task
public final boolean postDelayed(Runnable r, long delayMillis)
// Post a task to the head of the queue
public final boolean postAtFrontOfQueue(Runnable r)

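All of these methods end up calling a private Handler method that submits the Message to the queue. uptimeMillis is the timestamp at which the msg should run, and it is also used to order the elements in the list.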

private boolean enqueueMessage(MessageQueue queue, Message msg, long uptimeMillis) {
        msg.target = this;
        if (mAsynchronous) {
            msg.setAsynchronous(true);
        }
        return queue.enqueueMessage(msg, uptimeMillis);
 }
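
How the different send variants arrive at the uptimeMillis value can be sketched as plain arithmetic. The helper below is a hypothetical model (SendTimeSketch and its method names are not framework APIs); it assumes that a delayed send adds the delay to the current uptime, that negative delays are treated as zero, and that a front-of-queue send uses 0.

```java
// Hypothetical helpers that model how a 'when' timestamp is derived.
final class SendTimeSketch {
    private SendTimeSketch() {}

    /** Absolute 'when' for a delayed send; negative delays are treated as zero. */
    static long whenForDelay(long nowUptimeMillis, long delayMillis) {
        if (delayMillis < 0) {
            delayMillis = 0;
        }
        return nowUptimeMillis + delayMillis;
    }

    /** An immediate send is modeled as a delayed send with a delay of zero. */
    static long whenForImmediate(long nowUptimeMillis) {
        return whenForDelay(nowUptimeMillis, 0);
    }

    /** A front-of-queue send uses 0, which the queue treats as "insert at the head". */
    static long whenForFrontOfQueue() {
        return 0;
    }
}
```

Because every variant reduces to one absolute timestamp, the queue only ever needs to compare when values and never has to know which API produced the message.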

Next, let's look at how the insertion itself is implemented in MessageQueue's enqueueMessage method.

boolean enqueueMessage(Message msg, long when) {
      ...
      ...
            // Mark the Message as in use.
            msg.markInUse();
            // Store the target execution timestamp in the Message's when field.
            msg.when = when;
            // mMessages is the head of the list; it is null when the queue is empty.
            Message p = mMessages;
            boolean needWake;
            if (p == null || when == 0 || when < p.when) {
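                // Three cases lead here: the head is null (the queue is empty); when is 0
                // (the caller wants the message at the front of the queue); or when is
                // earlier than the head's when. In each case the message becomes the new head.
                // New head, wake up the event queue if blocked.
                // To insert at the head, point msg.next at the current head, then make mMessages refer to msg.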
                msg.next = p;
                mMessages = msg;
                needWake = mBlocked;
            } else {
                // Inserted within the middle of the queue.  Usually we don't have to wake
                // up the event queue unless there is a barrier at the head of the queue
                // and the message is the earliest asynchronous message in the queue.
                needWake = mBlocked && p.target == null && msg.isAsynchronous();
                // Walk the list to find the insertion point, comparing when values.
                // On reaching the first element whose when is greater than the new
                // message's when, insert the new message just before that element.
                Message prev;
                for (;;) {
                    prev = p;
                    p = p.next;
                    if (p == null || when < p.when) {
                        break;
                    }
                    if (needWake && p.isAsynchronous()) {
                        needWake = false;
                    }
                }
                msg.next = p; // invariant: p == prev.next
                prev.next = msg;
            }

            // We can assume mPtr != 0 because mQuitting is false.
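            // If the loop is blocked in epoll_wait (for example because the queue was empty),
            // wake it so that loop() can resume taking messages from the queue and dispatching them.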
            if (needWake) {
                nativeWake(mPtr);
            }
        }
        return true;
    }

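The runtime sequence is shown below. The diagram describes the flow from a Message being added to an empty queue to that Message being taken out and handled.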

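Summary: enqueuing and dequeuing in the Android message queue are both built on a singly linked list. The key to ordering the queue is the Message's when field, a timestamp assigned by Handler; delayed and timed messages are both implemented through it. While analyzing the queue logic we also came across calls into the JNI layer, mainly nativePollOnce and nativeWake. These two methods implement blocking on an empty queue and waking it up, using epoll underneath.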
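
The whole cycle summarized above (sorted enqueue, a loop that blocks, and a wake-up on insert) can be modeled in plain Java. The sketch below is an analogy, not the Android implementation: MiniQueue is a hypothetical class, Object.wait stands in for nativePollOnce, and notifyAll stands in for nativeWake.

```java
// Hypothetical single-threaded message loop built on Java monitors.
final class MiniQueue {
    private static final class Node {
        final Runnable task;
        final long when;
        Node next;

        Node(Runnable task, long when) {
            this.task = task;
            this.when = when;
        }
    }

    private Node head;
    private boolean quitting;

    /** Monotonic clock in milliseconds, standing in for an uptime clock. */
    static long now() {
        return System.nanoTime() / 1_000_000L;
    }

    /** Inserts the task ordered by 'when' and wakes the loop if the head changed. */
    synchronized void enqueueAt(Runnable task, long when) {
        Node n = new Node(task, when);
        if (head == null || when < head.when) {
            n.next = head;
            head = n;
            notifyAll();                       // plays the role of nativeWake
        } else {
            Node prev = head;
            while (prev.next != null && prev.next.when <= when) {
                prev = prev.next;
            }
            n.next = prev.next;
            prev.next = n;
        }
    }

    synchronized void quit() {
        quitting = true;
        notifyAll();
    }

    /** Blocks until a task is due; returns null after quit() or an interrupt. */
    private synchronized Runnable next() {
        for (;;) {
            if (quitting) {
                return null;
            }
            try {
                long now = now();
                if (head == null) {
                    wait();                    // like a timeout of -1
                } else if (now < head.when) {
                    wait(head.when - now);     // like a positive timeout
                } else {
                    Node n = head;
                    head = n.next;
                    return n.task;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return null;
            }
        }
    }

    /** Runs tasks in 'when' order until quit() is called. */
    void loop() {
        for (Runnable r; (r = next()) != null; ) {
            r.run();
        }
    }

    /** Small demonstration: the task due first runs first, whatever the enqueue order. */
    static String demo() {
        MiniQueue q = new MiniQueue();
        StringBuilder out = new StringBuilder();
        Thread worker = new Thread(q::loop);
        worker.start();
        long base = now();
        q.enqueueAt(() -> out.append('b'), base + 60);
        q.enqueueAt(() -> out.append('a'), base);
        q.enqueueAt(q::quit, base + 120);
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return out.toString();
    }
}
```

As in the real queue, the loop is only woken when the new task becomes the head, because only then can the time it needs to wait have changed.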

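The analysis above of how Handler enqueues and dequeues messages ran into two JNI functions used by MessageQueue: nativePollOnce and nativeWake. nativePollOnce makes the Looper loop block when the queue is empty or when the head Message's when is later than the current time, which keeps CPU usage down. nativeWake wakes the blocked loop when a new message that needs handling is added to the queue, so that it is processed promptly. So how do these two functions implement blocking and waking? First, here are their native declarations in MessageQueue: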

 private native void nativePollOnce(long ptr, int timeoutMillis); /*non-static for callbacks*/
 private native static void nativeWake(long ptr);

The corresponding JNI functions are in android_os_MessageQueue.cpp:

static void android_os_MessageQueue_nativePollOnce(JNIEnv* env, jobject obj,
        jlong ptr, jint timeoutMillis) {
    NativeMessageQueue* nativeMessageQueue = reinterpret_cast<NativeMessageQueue*>(ptr);
    nativeMessageQueue->pollOnce(env, obj, timeoutMillis);
}

static void android_os_MessageQueue_nativeWake(JNIEnv* env, jclass clazz, jlong ptr) {
    NativeMessageQueue* nativeMessageQueue = reinterpret_cast<NativeMessageQueue*>(ptr);
    nativeMessageQueue->wake();
}

The JNI functions do no work of their own; they just call into NativeMessageQueue. Following the two calls into NativeMessageQueue:

void NativeMessageQueue::pollOnce(JNIEnv* env, jobject pollObj, int timeoutMillis) {
    mPollEnv = env;
    mPollObj = pollObj;
    mLooper->pollOnce(timeoutMillis);
    mPollObj = NULL;
    mPollEnv = NULL;

    if (mExceptionObj) {
        env->Throw(mExceptionObj);
        env->DeleteLocalRef(mExceptionObj);
        mExceptionObj = NULL;
    }
}

void NativeMessageQueue::wake() {
    mLooper->wake();
}

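Both calls end up in Looper.cpp. Before reading the Looper.cpp source, it helps to first understand the basic concepts of epoll and how it is used.
The native Looper object is initialized in the NativeMessageQueue constructor in android_os_MessageQueue.cpp: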

NativeMessageQueue::NativeMessageQueue() :
        mPollEnv(NULL), mPollObj(NULL), mExceptionObj(NULL) {
    // Check whether a Looper already exists in the current thread's thread-local storage.
    mLooper = Looper::getForThread();
    if (mLooper == NULL) {
        // No Looper exists yet, so create a new one.
        mLooper = new Looper(false);
        Looper::setForThread(mLooper);
    }
}

Next, the Looper constructor:

Looper::Looper(bool allowNonCallbacks) :
        mAllowNonCallbacks(allowNonCallbacks), mSendingMessage(false),
        mPolling(false), mEpollFd(-1), mEpollRebuildRequired(false),
        mNextRequestSeq(0), mResponseIndex(0), mNextMessageUptime(LLONG_MAX) {
    // eventfd is a Linux system call that creates a file descriptor used for event notification.
    // For its parameters and usage, see http://man7.org/linux/man-pages/man2/eventfd2.2.html
    mWakeEventFd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
    LOG_ALWAYS_FATAL_IF(mWakeEventFd < 0, "Could not make wake event fd: %s", strerror(errno));
    AutoMutex _l(mLock);
    rebuildEpollLocked();
}

void Looper::rebuildEpollLocked() {
    // Close old epoll instance if we have one.
    if (mEpollFd >= 0) {
#if DEBUG_CALLBACKS
        ALOGD("%p ~ rebuildEpollLocked - rebuilding epoll set", this);
#endif
        close(mEpollFd);
    }

    // Allocate the new epoll instance and register the wake pipe.
    // mEpollFd is the file descriptor of the epoll instance.
    mEpollFd = epoll_create(EPOLL_SIZE_HINT);
    LOG_ALWAYS_FATAL_IF(mEpollFd < 0, "Could not create epoll instance: %s", strerror(errno));

    struct epoll_event eventItem;
    memset(& eventItem, 0, sizeof(epoll_event)); // zero out unused members of data field union
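    // Watch mWakeEventFd for readability (EPOLLIN).
    eventItem.events = EPOLLIN;
    eventItem.data.fd = mWakeEventFd;
    // EPOLL_CTL_ADD registers mWakeEventFd with the epoll instance. Once data is written to
    // mWakeEventFd it becomes readable, and a thread blocked in epoll_wait on mEpollFd returns.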
    int result = epoll_ctl(mEpollFd, EPOLL_CTL_ADD, mWakeEventFd, & eventItem);
    LOG_ALWAYS_FATAL_IF(result != 0, "Could not add wake event fd to epoll instance: %s",
                        strerror(errno));
    ......
    ......

In its constructor, Looper obtains an epoll file descriptor through epoll_create and then uses epoll_ctl to add mWakeEventFd to the set that epoll monitors. As traced above, a call to nativePollOnce() ends up in Looper's pollOnce, shown below:

int Looper::pollOnce(int timeoutMillis, int* outFd, int* outEvents, void** outData) {
    int result = 0;
    for (;;) {
        ......
        ......
        if (result != 0) {
#if DEBUG_POLL_AND_WAKE
            ALOGD("%p ~ pollOnce - returning result %d", this, result);
#endif
            if (outFd != NULL) *outFd = 0;
            if (outEvents != NULL) *outEvents = 0;
            if (outData != NULL) *outData = NULL;
            return result;
        }
        result = pollInner(timeoutMillis);
    }
}

int Looper::pollInner(int timeoutMillis) {
#if DEBUG_POLL_AND_WAKE
    ALOGD("%p ~ pollOnce - waiting: timeoutMillis=%d", this, timeoutMillis);
#endif

    // Adjust the timeout based on when the next message is due.
    if (timeoutMillis != 0 && mNextMessageUptime != LLONG_MAX) {
        nsecs_t now = systemTime(SYSTEM_TIME_MONOTONIC);
        int messageTimeoutMillis = toMillisecondTimeoutDelay(now, mNextMessageUptime);
        if (messageTimeoutMillis >= 0
                && (timeoutMillis < 0 || messageTimeoutMillis < timeoutMillis)) {
            timeoutMillis = messageTimeoutMillis;
        }
#if DEBUG_POLL_AND_WAKE
        ALOGD("%p ~ pollOnce - next message in %" PRId64 "ns, adjusted timeout: timeoutMillis=%d",
                this, mNextMessageUptime - now, timeoutMillis);
#endif
    }

    // Poll.
    int result = POLL_WAKE;
    mResponses.clear();
    mResponseIndex = 0;

    // We are about to idle.
    mPolling = true;

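    // Start waiting for events. timeoutMillis is the maximum wait: if no event arrives within
    // that time, epoll_wait returns on its own; -1 waits until an event arrives; 0 returns
    // immediately whether or not an event is pending. MessageQueue first polls with a timeout
    // of 0, then, if the queue has no message, calls pollOnce again and waits here until data
    // is written to mWakeEventFd.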
    struct epoll_event eventItems[EPOLL_MAX_EVENTS];
    int eventCount = epoll_wait(mEpollFd, eventItems, EPOLL_MAX_EVENTS, timeoutMillis);

    // No longer idling.
    mPolling = false;

    // Acquire lock.
    mLock.lock();

    // Rebuild epoll set if needed.
    if (mEpollRebuildRequired) {
        mEpollRebuildRequired = false;
        rebuildEpollLocked();
        goto Done;
    }

    // The poll failed.
    // Check for poll error.
    if (eventCount < 0) {
        if (errno == EINTR) {
            goto Done;
        }
        ALOGW("Poll failed with an unexpected error: %s", strerror(errno));
        result = POLL_ERROR;
        goto Done;
    }

    // The poll timed out.
    // Check for poll timeout.
    if (eventCount == 0) {
#if DEBUG_POLL_AND_WAKE
        ALOGD("%p ~ pollOnce - timeout", this);
#endif
        result = POLL_TIMEOUT;
        goto Done;
    }

    // Handle all events.
#if DEBUG_POLL_AND_WAKE
    ALOGD("%p ~ pollOnce - handling events from %d fds", this, eventCount);
#endif
......
......

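When the queue is empty, the thread blocks in epoll_wait inside pollInner. When a new message is added to the queue, the upper layer calls nativeWake, which ends up in Looper's wake function and makes epoll_wait return.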


void Looper::wake() {
#if DEBUG_POLL_AND_WAKE
    ALOGD("%p ~ wake", this);
#endif

    uint64_t inc = 1;
    // Write to mWakeEventFd; epoll sees the fd become readable and epoll_wait returns.
    ssize_t nWrite = TEMP_FAILURE_RETRY(write(mWakeEventFd, &inc, sizeof(uint64_t)));
    if (nWrite != sizeof(uint64_t)) {
        if (errno != EAGAIN) {
            ALOGW("Could not write wake signal: %s", strerror(errno));
        }
    }
}

That completes the blocking and waking functionality in Looper.cpp. To summarize:

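The native Looper object is initialized when the nativeInit function in android_os_MessageQueue is called.
The Looper.cpp constructor uses eventfd to create the mWakeEventFd file descriptor for event notification and epoll_create to create an epoll file descriptor. It then calls epoll_ctl to add mWakeEventFd to the set of monitored events.
The queue starts out empty. The first request for data goes to pollOnce with a timeoutMillis of 0, so epoll_wait returns immediately. MessageQueue then calls pollOnce a second time with timeoutMillis = -1, and this time it blocks in epoll_wait.
When new data is added to the queue, wake writes to mWakeEventFd, which ends the epoll_wait block. pollOnce returns, next() in the upper-layer MessageQueue completes, and the Message is handed to the Handler.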
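
The same register, wait, and wake pattern can be tried from plain Java with java.nio. The sketch below is an analogy under assumptions, not a port of Looper.cpp: WakePipeSketch is a hypothetical class, a Selector plays the role of the epoll instance, and a Pipe plays the role of the eventfd. The mapping of timeout values (negative waits indefinitely, 0 returns at once, positive waits up to that long) follows the semantics described for epoll_wait above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

// Hypothetical Java analogue: Selector ~ epoll instance, Pipe ~ wake eventfd.
final class WakePipeSketch implements AutoCloseable {
    private Selector selector;
    private Pipe pipe;

    WakePipeSketch() {
        try {
            selector = Selector.open();
            pipe = Pipe.open();
            // Register the read end for readability, like EPOLL_CTL_ADD with EPOLLIN.
            pipe.source().configureBlocking(false);
            pipe.source().register(selector, SelectionKey.OP_READ);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes one byte so that a blocked pollOnce() returns. */
    void wake() {
        try {
            pipe.sink().write(ByteBuffer.wrap(new byte[] {1}));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Waits according to timeoutMillis and reports whether a wake-up arrived. */
    boolean pollOnce(long timeoutMillis) {
        try {
            int ready;
            if (timeoutMillis < 0) {
                ready = selector.select();              // wait until woken
            } else if (timeoutMillis == 0) {
                ready = selector.selectNow();           // return immediately
            } else {
                ready = selector.select(timeoutMillis); // wait up to the timeout
            }
            selector.selectedKeys().clear();
            if (ready == 0) {
                return false;
            }
            // Drain the pipe so the next poll blocks again.
            ByteBuffer buf = ByteBuffer.allocate(16);
            while (pipe.source().read(buf) > 0) {
                buf.clear();
            }
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public void close() {
        try {
            selector.close();
            pipe.sink().close();
            pipe.source().close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A short poll on a fresh instance times out, a poll after wake() returns true, and a following non-blocking poll returns false again because the wake-up byte has been drained.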
