OS timer hang in RT-Thread 3.1.3

1. The OS timer

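       The OS timer is a timing facility provided by the RT-Thread kernel; it supports hardware timers and software timers. In one product I used several timers to implement a data-communication indicator LED. The idea is to create one periodic timer and one one-shot timer. When data is transferred, the periodic timer is started to blink the LED, and the one-shot timer is started so that its timeout switches off the LED and stops the periodic timer. When more data arrives, the timers are started again, which produces the blinking effect.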
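The blink logic described above can be modelled on a host machine without any RTOS. The sketch below is only a tick-driven model, not RT-Thread code: `indicator_t`, `indicator_on_data`, `indicator_tick` and the demo function are hypothetical names, and the two countdown fields stand in for the periodic and one-shot timers. It also assumes the periodic timer is only started when it is not already running, so that a burst of data does not keep resetting the blink phase.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tick-driven model of the indicator; not the RT-Thread API. */
typedef struct {
    bool     led_on;        /* current LED state */
    bool     blinking;      /* "periodic timer" is running */
    unsigned blink_period;  /* ticks between LED toggles */
    unsigned idle_timeout;  /* ticks without data before switching off */
    unsigned blink_left;    /* ticks until the next toggle */
    unsigned idle_left;     /* ticks until the "one-shot timer" fires */
} indicator_t;

static void indicator_init(indicator_t *ind, unsigned blink_period,
                           unsigned idle_timeout)
{
    assert(blink_period > 0 && idle_timeout > 0);
    ind->led_on = false;
    ind->blinking = false;
    ind->blink_period = blink_period;
    ind->idle_timeout = idle_timeout;
    ind->blink_left = 0;
    ind->idle_left = 0;
}

/* Called on every data transfer: always restart the one-shot countdown,
 * and start the periodic countdown if it is not running yet. */
static void indicator_on_data(indicator_t *ind)
{
    if (!ind->blinking) {
        ind->blinking = true;
        ind->blink_left = ind->blink_period;
    }
    ind->idle_left = ind->idle_timeout;
}

/* Advance the model by one tick. */
static void indicator_tick(indicator_t *ind)
{
    if (!ind->blinking)
        return;
    if (--ind->idle_left == 0) {      /* one-shot timeout: stop everything */
        ind->blinking = false;
        ind->led_on = false;
        return;
    }
    if (--ind->blink_left == 0) {     /* periodic timeout: toggle, reload */
        ind->led_on = !ind->led_on;
        ind->blink_left = ind->blink_period;
    }
}

/* Example run: one data event followed by silence.
 * Returns how many times the LED state changed. */
static unsigned indicator_demo_changes(void)
{
    indicator_t ind;
    unsigned changes = 0;

    indicator_init(&ind, 2, 7);
    indicator_on_data(&ind);
    for (unsigned t = 0; t < 10; t++) {
        bool before = ind.led_on;
        indicator_tick(&ind);
        if (ind.led_on != before)
            changes++;
    }
    assert(!ind.blinking && !ind.led_on);  /* LED ends up off */
    return changes;
}
```

In the real product the two countdowns are kernel timers, and `indicator_on_data` corresponds to the place where both threads end up calling rt_timer_start.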

2. The timer hang

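     The timer is started from two threads of different priorities. After the program has been running for a long time, the timer hangs: execution stays forever inside this loop in rt_timer_start:

        for (; row_head[row_lvl] != timer_list[row_lvl].prev;
             row_head[row_lvl]  = row_head[row_lvl]->next)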

rt_err_t rt_timer_start(rt_timer_t timer)
{
    unsigned int row_lvl;
    rt_list_t *timer_list;
    register rt_base_t level;
    rt_list_t *row_head[RT_TIMER_SKIP_LIST_LEVEL];
    unsigned int tst_nr;
    static unsigned int random_nr;

    /* timer check */
    RT_ASSERT(timer != RT_NULL);
    RT_ASSERT(rt_object_get_type(&timer->parent) == RT_Object_Class_Timer);

    /* stop timer firstly */
    level = rt_hw_interrupt_disable();
    /* remove timer from list */
    _rt_timer_remove(timer);
    /* change status of timer */
    timer->parent.flag &= ~RT_TIMER_FLAG_ACTIVATED;

    RT_OBJECT_HOOK_CALL(rt_object_take_hook, (&(timer->parent)));

    /*
     * get timeout tick,
     * the max timeout tick shall not great than RT_TICK_MAX/2
     */
    RT_ASSERT(timer->init_tick < RT_TICK_MAX / 2);
    timer->timeout_tick = rt_tick_get() + timer->init_tick;

#ifdef RT_USING_TIMER_SOFT
    if (timer->parent.flag & RT_TIMER_FLAG_SOFT_TIMER)
    {
        /* insert timer to soft timer list */
        timer_list = rt_soft_timer_list;
    }
    else
#endif
    {
        /* insert timer to system timer list */
        timer_list = rt_timer_list;
    }

    row_head[0]  = &timer_list[0];
    for (row_lvl = 0; row_lvl < RT_TIMER_SKIP_LIST_LEVEL; row_lvl++)
    {
        for (; row_head[row_lvl] != timer_list[row_lvl].prev;
             row_head[row_lvl]  = row_head[row_lvl]->next)
        {
            struct rt_timer *t;
            rt_list_t *p = row_head[row_lvl]->next;

            /* fix up the entry pointer */
            t = rt_list_entry(p, struct rt_timer, row[row_lvl]);

            /* If we have two timers that timeout at the same time, it's
             * preferred that the timer inserted early get called early.
             * So insert the new timer to the end the the some-timeout timer
             * list.
             */
            if ((t->timeout_tick - timer->timeout_tick) == 0)
            {
                continue;
            }
            else if ((t->timeout_tick - timer->timeout_tick) < RT_TICK_MAX / 2)
            {
                break;
            }
        }
        if (row_lvl != RT_TIMER_SKIP_LIST_LEVEL - 1)
            row_head[row_lvl + 1] = row_head[row_lvl] + 1;
    }

    /* Interestingly, this super simple timer insert counter works very very
     * well on distributing the list height uniformly. By means of "very very
     * well", I mean it beats the randomness of timer->timeout_tick very easily
     * (actually, the timeout_tick is not random and easy to be attacked). */
    random_nr++;
    tst_nr = random_nr;

    rt_list_insert_after(row_head[RT_TIMER_SKIP_LIST_LEVEL - 1],
                         &(timer->row[RT_TIMER_SKIP_LIST_LEVEL - 1]));
    for (row_lvl = 2; row_lvl <= RT_TIMER_SKIP_LIST_LEVEL; row_lvl++)
    {
        if (!(tst_nr & RT_TIMER_SKIP_LIST_MASK))
            rt_list_insert_after(row_head[RT_TIMER_SKIP_LIST_LEVEL - row_lvl],
                                 &(timer->row[RT_TIMER_SKIP_LIST_LEVEL - row_lvl]));
        else
            break;
        /* Shift over the bits we have tested. Works well with 1 bit and 2
         * bits. */
        tst_nr >>= (RT_TIMER_SKIP_LIST_MASK + 1) >> 1;
    }

    timer->parent.flag |= RT_TIMER_FLAG_ACTIVATED;

    /* enable interrupt */
    rt_hw_interrupt_enable(level);

#ifdef RT_USING_TIMER_SOFT
    if (timer->parent.flag & RT_TIMER_FLAG_SOFT_TIMER)
    {
        /* check whether timer thread is ready */
        if ((soft_timer_status == RT_SOFT_TIMER_IDLE) &&
           ((timer_thread.stat & RT_THREAD_STAT_MASK) == RT_THREAD_SUSPEND))
        {
            /* resume timer thread to check soft timer */
            rt_thread_resume(&timer_thread);
            rt_schedule();
        }
    }
#endif

    return RT_EOK;
}

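     Inspecting the timer list at that point showed that the last node of rt_timer_list pointed to itself. The list had become a dead list, so the for loop above turned into an infinite loop that could never exit. The program appeared to be frozen and no task responded.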

3. The fix

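       I replaced the timer code with rt_timer.c from version 4.0.3. After reading the code and the change history in the RT-Thread GitHub repository, my analysis was that rt_timer_start can be re-entered in rare cases: before one interrupt handler or thread has returned from rt_timer_start, another thread calls rt_timer_start, and the result is the dead list.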

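        Going through the change history of rt_timer.c in detail, I found https://github.com/RT-Thread/rt-thread/issues/3800, the issue about the thread safety of hardware timers (#3800). The discussion there confirmed the cause: two threads in my program start the same timer, and one line in rt_timer_start ran with interrupts enabled, so the call could be preempted part-way through. That was the real root cause.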

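        To summarise, the conditions for this bug are very strict. Both of the following must hold at the same time:

  • Thread A's rt_timer_start is preempted by thread B, and thread B then calls rt_timer_start.
  • Thread A's rt_timer_start and thread B's rt_timer_start operate on the same timer.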
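The two conditions above can be replayed on a host with a toy list. This is a minimal sketch, not the kernel code: the list helpers are only modelled on the rt_list style, all names are hypothetical, and it assumes that a start consists of a remove step and an insert step with a preemption window between them. It shows one possible interleaving in which the timer node ends up pointing to itself, next to a serialized run that leaves the list intact.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy circular doubly linked list, modelled on the rt_list style. */
typedef struct node {
    struct node *next;
    struct node *prev;
} node_t;

static void list_init(node_t *n)
{
    n->next = n->prev = n;
}

/* Insert n directly after l. */
static void list_insert_after(node_t *l, node_t *n)
{
    l->next->prev = n;
    n->next = l->next;
    l->next = n;
    n->prev = l;
}

/* Unlink n and leave it pointing at itself. */
static void list_remove(node_t *n)
{
    n->next->prev = n->prev;
    n->prev->next = n->next;
    n->next = n->prev = n;
}

/* Bounded forward walk: true if following next leads back to head. */
static bool list_is_closed(const node_t *head, int max_steps)
{
    const node_t *p = head->next;
    for (int i = 0; i < max_steps; i++, p = p->next)
        if (p == head)
            return true;
    return false;
}

/* Assumed insert position: the current tail, which is where a timer with
 * an equal timeout would be placed. */
static node_t *find_tail(node_t *head)
{
    return head->prev;
}

/* Threads A and B start the same timer t; A is preempted between its
 * remove step and its insert step. */
static bool interleaved_start_keeps_list_closed(void)
{
    node_t head, t;
    list_init(&head);
    list_init(&t);

    list_remove(&t);                          /* A: remove (t not linked) */
    /* A is preempted here; B runs a complete start. */
    list_remove(&t);                          /* B: remove */
    list_insert_after(find_tail(&head), &t);  /* B: insert, head <-> t */
    /* A resumes and inserts a node that is already in the list. */
    list_insert_after(find_tail(&head), &t);  /* A: inserts t after t */

    assert(t.next == &t);                     /* the node points to itself */
    return list_is_closed(&head, 16);
}

/* Same two starts, but each one runs to completion before the next. */
static bool serialized_start_keeps_list_closed(void)
{
    node_t head, t;
    list_init(&head);
    list_init(&t);

    list_remove(&t);                          /* A: complete start */
    list_insert_after(find_tail(&head), &t);
    list_remove(&t);                          /* B: complete start */
    list_insert_after(find_tail(&head), &t);

    return list_is_closed(&head, 16);
}
```

In the interleaved run, a forward walk from the list head reaches the timer node and then stays on it, which matches the dead list I saw in the debugger. Keeping the remove and insert steps inside one critical section corresponds to the serialized run.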

4. Discussion screenshots

         The images in the GitHub discussion sometimes fail to load, so I saved screenshots here for future reference.

[Screenshots of the discussion in issue #3800; images not available]

5. Summary

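      Thread safety in an operating system really matters. When a function is not thread-safe, short test runs may never reveal the problem. Fixing this bug gave me a deeper understanding of critical-section protection: preemption between threads also requires it.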

 
