Linux Kernel Notes: watchdog

watchdog

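In short, a watchdog is a mechanism that keeps the system running normally, or gets it out of abnormal states such as infinite loops and deadlocks.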

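Watchdogs come in hardware and software forms. A hardware watchdog uses a timer circuit whose timeout output is wired to the circuit's reset line. The program clears the timer within a fixed interval (commonly called "feeding the dog"), so while the program runs normally the timer never overflows and no reset signal is generated. If the program fails and does not clear the watchdog within the timer period, the watchdog timer overflows, raises the reset signal, and reboots the system. A software watchdog works on the same principle, except that the timer in the hardware circuit is replaced by one of the processor's internal timers. This simplifies the hardware design but is less reliable than a hardware timer: for example, a failure of the internal timer itself goes undetected.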
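The countdown-and-feed behaviour described above can be sketched as a small user-space model. This is only an illustration under stated assumptions: `struct wdt_model` and the `wdt_*` functions are hypothetical names invented here and do not correspond to any real driver or kernel API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a watchdog countdown timer; not a real driver API. */
struct wdt_model {
    unsigned int timeout_ticks; /* reload value */
    unsigned int remaining;     /* ticks left before the reset line fires */
    bool reset_fired;           /* latched once the timer overflows */
};

void wdt_init(struct wdt_model *w, unsigned int timeout_ticks)
{
    assert(timeout_ticks > 0);
    w->timeout_ticks = timeout_ticks;
    w->remaining = timeout_ticks;
    w->reset_fired = false;
}

/* "Feeding the dog": reload the counter so that it cannot overflow. */
void wdt_feed(struct wdt_model *w)
{
    w->remaining = w->timeout_ticks;
}

/* One timer tick; returns true once the reset signal has fired. */
bool wdt_tick(struct wdt_model *w)
{
    if (w->reset_fired)
        return true;
    if (w->remaining > 0)
        w->remaining--;
    if (w->remaining == 0)
        w->reset_fired = true;
    return w->reset_fired;
}

/* Run `total` ticks, feeding every `feed_every` ticks (0 means never feed).
 * Returns true if the watchdog reset the system during the run. */
bool wdt_run(unsigned int timeout_ticks, unsigned int total,
             unsigned int feed_every)
{
    struct wdt_model w;

    wdt_init(&w, timeout_ticks);
    for (unsigned int t = 1; t <= total; t++) {
        if (wdt_tick(&w))
            return true;
        if (feed_every && t % feed_every == 0)
            wdt_feed(&w);
    }
    return w.reset_fired;
}
```

In this model a program that feeds more often than the timeout never sees a reset, while one that stops feeding, or feeds exactly at the timeout, is reset.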

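Software watchdogs are of two kinds: the ordinary soft watchdog, which detects soft lockups (based on the timer interrupt), and the NMI watchdog, which detects hard lockups (based on the NMI).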

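Note 1: the timer interrupt has lower priority than the NMI.
Note 2: a lockup means that some piece of kernel code holds the CPU and does not release it. In severe cases a lockup makes the whole system unresponsive.
The only difference between a soft lockup and a hard lockup is that a hard lockup occurs while the CPU has interrupts masked.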

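Soft watchdog

Checks, per CPU, whether threads are being scheduled normally.

[Figure: normal flow of the soft watchdog, assuming a 20 s trigger time]

Possible causes of a soft watchdog trigger:
1. Hard interrupts are handled so frequently that there is no time for normal scheduling
2. Softirqs are processed for a long time
3. On a non-preemptible kernel, a thread runs for a long time without triggering a reschedule
4. Any combination of the above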

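NMI watchdog

Checks, per CPU, whether interrupts can still be delivered normally.
A CPU that stays with interrupts disabled for longer than a set time is judged to be in a hard lockup.

[Figure: NMI detection flow]

Possible causes of an NMI watchdog trigger:
1. A single hard interrupt is handled for a long time
2. Code runs for a long time with local interrupts disabled

The NMI watchdog also relies on a per-CPU hrtimer to feed it. To detect a hard lockup promptly, the check runs in NMI context, which has higher priority than ordinary interrupts.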

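Hardware watchdog

Checks whether all CPUs are running normally.
Any CPU can feed the hardware watchdog. If no core feeds it within a set time, the hardware watchdog resets the system.

[Figure: hardware watchdog detection flow]

Possible causes of a hardware watchdog trigger:
1. All CPUs hang in a way that triggers neither the soft watchdog nor the NMI watchdog
2. The CPUs depend on one another in hardware, or share resources at the software level, so that one hung CPU stalls the others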

The following analyzes how soft lockup and hard lockup detection are implemented, based on the kernel source kernel/watchdog.c.

soft lockup

1. Every CPU has a watchdog thread (named watchdog/0, watchdog/1, ...)

static struct smp_hotplug_thread watchdog_threads = {
    .store          = &softlockup_watchdog,
    .thread_should_run  = watchdog_should_run,
    .thread_fn      = watchdog,
    .thread_comm        = "watchdog/%u",
    .setup          = watchdog_enable,
    .park           = watchdog_disable,
    .unpark         = watchdog_enable,
};

2. This thread periodically calls the watchdog function

static void __touch_watchdog(void)
{
    /* update the timestamp of the watchdog's last run */
    __this_cpu_write(watchdog_touch_ts, get_timestamp());
}

static void watchdog(unsigned int cpu)
{
    /* sync the soft lockup counter: soft_lockup_hrtimer_cnt = hrtimer_interrupts */
    __this_cpu_write(soft_lockup_hrtimer_cnt,
             __this_cpu_read(hrtimer_interrupts));
    __touch_watchdog();
}
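The interplay between the two counters can be modelled in user space as below. It is a sketch only: the `model_*` names and `struct cpu_wd_state` are invented for illustration, and the comparison in `model_should_run` reflects my understanding of what the `.thread_should_run` callback (`watchdog_should_run`) does in kernels of this era, which the listings here do not show.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* User-space stand-ins for the per-CPU variables in kernel/watchdog.c. */
struct cpu_wd_state {
    unsigned long hrtimer_interrupts;      /* bumped by the hrtimer handler */
    unsigned long soft_lockup_hrtimer_cnt; /* last count seen by the thread */
    unsigned long watchdog_touch_ts;       /* last time the thread ran */
};

/* Timer side: count one more hrtimer interrupt. */
void model_timer_interrupt(struct cpu_wd_state *s)
{
    assert(s != NULL);
    s->hrtimer_interrupts++;
}

/* ASSUMPTION: the thread has work only if a timer interrupt arrived
 * since its last run, i.e. the two counters differ. */
bool model_should_run(const struct cpu_wd_state *s)
{
    assert(s != NULL);
    return s->hrtimer_interrupts != s->soft_lockup_hrtimer_cnt;
}

/* Thread side, mirroring watchdog(): sync the counter, refresh the timestamp. */
void model_watchdog(struct cpu_wd_state *s, unsigned long now)
{
    assert(s != NULL);
    s->soft_lockup_hrtimer_cnt = s->hrtimer_interrupts;
    s->watchdog_touch_ts = now;
}
```

If the thread is never scheduled after a timer interrupt, `watchdog_touch_ts` stays stale, and that staleness is what the soft lockup check measures.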

3. Timer interrupt

static void watchdog_enable(unsigned int cpu)
{
    struct hrtimer *hrtimer = &__raw_get_cpu_var(watchdog_hrtimer);

    /* kick off the timer for the hardlockup detector */
    hrtimer_init(hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
    hrtimer->function = watchdog_timer_fn;

    /* done here because hrtimer_start can only pin to smp_processor_id() */
    hrtimer_start(hrtimer, ns_to_ktime(sample_period),
              HRTIMER_MODE_REL_PINNED);
}

This function mainly initializes a high-resolution timer (hrtimer), which in turn wakes the watchdog feeding thread.
The hrtimer's handler function is:

static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
{
    //timestamp of the watchdog thread's last run
    unsigned long touch_ts = __this_cpu_read(watchdog_touch_ts);
    struct pt_regs *regs = get_irq_regs();
    int duration;
    //increment hrtimer_interrupts before waking the watchdog kthread, so that the kthread updates its timestamp
    watchdog_interrupt_count();
    //wake the watchdog kthread so that it runs at the same frequency as the timer
    wake_up_process(__this_cpu_read(softlockup_watchdog));
    //re-arm the hrtimer for the next period
    hrtimer_forward_now(hrtimer, ns_to_ktime(sample_period));

    ...

    //check whether a soft lockup has occurred
    duration = is_softlockup(touch_ts);
    if (unlikely(duration)) {
        printk(KERN_EMERG "BUG: soft lockup - CPU#%d stuck for %us! [%s:%d]\n",
            smp_processor_id(), duration,
            current->comm, task_pid_nr(current));
        print_modules();
        print_irqtrace_events(current);
        //dump registers and stack
        if (regs)
            show_regs(regs);
        else
            dump_stack();

        if (softlockup_panic)
            panic("softlockup: hung tasks");
    }
    return HRTIMER_RESTART;
}
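//Check how long preemption has been disabled.
//The watchdog kthread is woken from the interrupt context of the
//watchdog timer; when the interrupt exits, the kthread preempts the
//current process on the CPU. If preemption is disabled, no preemption
//takes place and the watchdog cannot update its timestamp. Once
//preemption has been disabled for longer than the threshold, the
//kernel considers a soft lockup to have occurred.
//Note: the soft lockup threshold is watchdog_thresh * 2 (20s)
static int is_softlockup(unsigned long touch_ts)
{
    //current timestamp
    unsigned long now = get_timestamp();
    //the watchdog has not been scheduled within watchdog_thresh * 2
    if (time_after(now, touch_ts + get_softlockup_thresh()))
        return now - touch_ts;

    return 0;
}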
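To make the threshold arithmetic concrete, here is a small user-space model of the check. It is a sketch under assumptions: all `model_*` names are invented here, timestamps are taken to be in seconds, and the one-fifth relationship between the hrtimer period and the threshold is my recollection of how kernels of this era set `sample_period`, which the listings above do not show.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NSEC_PER_SEC 1000000000ULL

/* Wrap-safe "a is later than b", the same idea as the kernel's time_after(). */
int model_time_after(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}

/* Soft lockup threshold in seconds: twice watchdog_thresh. */
unsigned long model_softlockup_thresh(unsigned long watchdog_thresh)
{
    return watchdog_thresh * 2;
}

/* ASSUMPTION: the hrtimer period is one fifth of the threshold (in ns),
 * so the thread gets several chances to run before a lockup is declared. */
uint64_t model_sample_period_ns(unsigned long watchdog_thresh)
{
    assert(watchdog_thresh > 0);
    return model_softlockup_thresh(watchdog_thresh) * (MODEL_NSEC_PER_SEC / 5);
}

/* Returns the stall duration in seconds if it exceeds the threshold, else 0. */
unsigned long model_is_softlockup(unsigned long now, unsigned long touch_ts,
                                  unsigned long watchdog_thresh)
{
    if (model_time_after(now, touch_ts + model_softlockup_thresh(watchdog_thresh)))
        return now - touch_ts;
    return 0;
}
```

With `watchdog_thresh = 10`, the model gives a 20 s threshold and, under the stated assumption, a 4 s timer period. The subtraction-based comparison keeps working when the timestamp counter wraps around.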

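Main tasks of watchdog_timer_fn:
(1) Read the timestamp of the watchdog's last run
(2) Increment the watchdog timer's run count
(3) Check the watchdog timestamp to see whether a soft lockup has occurred (if so, dump the stack and print diagnostics)
(4) Re-arm the timer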

hard lockup

Hard lockup detection is performed mainly in the NMI handler.
1. Initialize and enable hard lockup detection

static int watchdog_nmi_enable(unsigned int cpu)
{
    //hard lockup event
    struct perf_event_attr *wd_attr;
    struct perf_event *event = per_cpu(watchdog_ev, cpu);
    ....
    wd_attr = &wd_hw_attr;
    //hard lockup detection period, 10s
    wd_attr->sample_period = hw_nmi_get_sample_period(watchdog_thresh);
    //register the hard lockup detection event with performance monitoring
    event = perf_event_create_kernel_counter(wd_attr, cpu, NULL, watchdog_overflow_callback, NULL);
    ....
    //enable hard lockup detection
    per_cpu(watchdog_ev, cpu) = event;
    perf_event_enable(per_cpu(watchdog_ev, cpu));
    return 0;
}

perf_event_create_kernel_counter mainly registers a hardware event.
On x86 this hardware is called performance monitoring; one of its capabilities is to raise an NMI after a given number of CPU clock cycles has elapsed.
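Because the period is expressed in CPU cycles rather than in seconds, it helps to see the conversion. The sketch below assumes the x86 formula is the CPU frequency in kHz times 1000 times `watchdog_thresh`, which is my understanding of `hw_nmi_get_sample_period` on that architecture; the `model_*` function names are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: cycles the counter must accumulate before it overflows
 * and raises the NMI, computed as cpu_khz * 1000 * watchdog_thresh. */
uint64_t model_nmi_sample_period(unsigned int cpu_khz, int watchdog_thresh)
{
    assert(watchdog_thresh > 0);
    return (uint64_t)cpu_khz * 1000 * (uint64_t)watchdog_thresh;
}

/* Inverse view: seconds of full-speed execution needed to reach `cycles`. */
double model_cycles_to_seconds(uint64_t cycles, unsigned int cpu_khz)
{
    assert(cpu_khz > 0);
    return (double)cycles / ((double)cpu_khz * 1000.0);
}
```

For a 2.4 GHz CPU (`cpu_khz = 2400000`) and `watchdog_thresh = 10`, the model yields 24,000,000,000 cycles, that is, 10 seconds of execution at full speed.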

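2. Once the CPU has run at full load for one sample period (10 seconds with the watchdog_thresh setting above), an NMI is raised and handled by watchdog_overflow_callback.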

static void watchdog_overflow_callback(struct perf_event *event,
         struct perf_sample_data *data,
         struct pt_regs *regs)
{
    //check whether a hard lockup has occurred
    if (is_hardlockup()) {
        int this_cpu = smp_processor_id();

        //report the hard lockup
        if (hardlockup_panic)
            panic("Watchdog detected hard LOCKUP on cpu %d", this_cpu);
        else
            WARN(1, "Watchdog detected hard LOCKUP on cpu %d", this_cpu);

        return;
    }
    return;
}

Checking for a hard lockup

static int is_hardlockup(void)
{
    //number of times the watchdog timer has run
    unsigned long hrint = __this_cpu_read(hrtimer_interrupts);
    //if the watchdog timer has not run within one hard lockup detection period, the CPU has had interrupts masked for longer than the threshold
    if (__this_cpu_read(hrtimer_interrupts_saved) == hrint)
        return 1;
    //record the watchdog timer's run count
    __this_cpu_write(hrtimer_interrupts_saved, hrint);
    return 0;
}
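The snapshot comparison can be exercised in user space with the model below. It is only a sketch: `struct hardlockup_state` and the `model_*` functions are invented names, and the choice of two timer ticks per NMI period is an arbitrary assumption made for the simulation.

```c
#include <assert.h>
#include <stdbool.h>

struct hardlockup_state {
    unsigned long hrtimer_interrupts;       /* incremented by the hrtimer handler */
    unsigned long hrtimer_interrupts_saved; /* snapshot taken by the last NMI */
};

/* The hrtimer handler ran, which is only possible with interrupts enabled. */
void model_hrtimer_tick(struct hardlockup_state *s)
{
    s->hrtimer_interrupts++;
}

/* NMI-side check, mirroring is_hardlockup(). */
bool model_is_hardlockup(struct hardlockup_state *s)
{
    unsigned long hrint = s->hrtimer_interrupts;

    if (s->hrtimer_interrupts_saved == hrint)
        return true; /* no timer interrupt since the previous NMI */

    s->hrtimer_interrupts_saved = hrint;
    return false;
}

/* Simulate `total_periods` NMI periods. Timer ticks arrive only during the
 * first `healthy_periods`; after that, interrupts are treated as masked.
 * Returns the 1-based index of the first NMI that reports a lockup, or 0. */
unsigned int model_first_lockup(unsigned int healthy_periods,
                                unsigned int total_periods)
{
    struct hardlockup_state s = { 0, 0 };

    assert(healthy_periods <= total_periods);
    for (unsigned int p = 1; p <= total_periods; p++) {
        if (p <= healthy_periods) {
            /* ASSUMPTION: two hrtimer ticks per NMI period */
            model_hrtimer_tick(&s);
            model_hrtimer_tick(&s);
        }
        if (model_is_hardlockup(&s))
            return p;
    }
    return 0;
}
```

In the model, the first NMI that arrives after interrupts stop being serviced is the one that reports the lockup, because the counter has not moved since the previous snapshot.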

Disabling hard lockup detection

static void watchdog_nmi_disable(unsigned int cpu)
{
    struct perf_event *event = per_cpu(watchdog_ev, cpu);
    if (event) {
        //disable the hard lockup detection event in the performance monitoring subsystem
        perf_event_disable(event);
        //clear the per-CPU hard lockup detection event pointer
        per_cpu(watchdog_ev, cpu) = NULL;
        //release the hard lockup detection event
        perf_event_release_kernel(event);
    }
    return;
}
