Timer hang in RT-Thread 3.1.3

1. Operating system timers

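An operating system timer is a timing facility provided by the RT-Thread kernel; both hardware and software timers are supported. In one product I used several timers to drive a data-communication indicator LED. The idea is to create one periodic timer and one one-shot timer. When data is transferred, the periodic timer is started to blink the LED, and the one-shot timer is started so that, on timeout, it switches off the LED and stops the periodic timer. When more data arrives, the timers are started again, which produces the blinking effect.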
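The pattern described above can be sketched as a small host-side simulation. This is not the product code and it does not use the RT-Thread API: `sim_timer`, `led_on_data`, `led_tick` and the other names are hypothetical stand-ins (on the target, the start and stop helpers would correspond to `rt_timer_start` and `rt_timer_stop`). The sketch also assumes that the periodic timer is only started when it is idle, so that frequent data does not keep resetting the blink phase.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an RTOS timer; periods must be > 0. */
struct sim_timer {
    unsigned period;            /* ticks between expiries */
    unsigned remaining;         /* ticks until the next expiry */
    bool periodic;              /* reload after expiry? */
    bool active;
    void (*on_timeout)(void);
};

static bool led_on;
static struct sim_timer blink_timer;   /* periodic: toggles the LED */
static struct sim_timer off_timer;     /* one-shot: ends the blinking */

static void sim_timer_start(struct sim_timer *t)
{
    t->remaining = t->period;
    t->active = true;
}

static void sim_timer_stop(struct sim_timer *t)
{
    t->active = false;
}

/* Advance one timer by one tick and fire its callback on expiry. */
static void sim_timer_tick(struct sim_timer *t)
{
    if (!t->active || --t->remaining != 0)
        return;
    if (t->periodic)
        t->remaining = t->period;
    else
        t->active = false;
    t->on_timeout();
}

static void blink_timeout(void) { led_on = !led_on; }

static void off_timeout(void)
{
    sim_timer_stop(&blink_timer);
    led_on = false;
}

void led_init(unsigned blink_ticks, unsigned idle_ticks)
{
    blink_timer = (struct sim_timer){ blink_ticks, 0, true, false, blink_timeout };
    off_timer   = (struct sim_timer){ idle_ticks, 0, false, false, off_timeout };
    led_on = false;
}

/* Call on every data event. */
void led_on_data(void)
{
    if (!blink_timer.active)
        sim_timer_start(&blink_timer);
    sim_timer_start(&off_timer);    /* restarting pushes the switch-off back */
}

void led_tick(void)
{
    sim_timer_tick(&blink_timer);
    sim_timer_tick(&off_timer);
}

bool led_is_on(void)       { return led_on; }
bool led_is_blinking(void) { return blink_timer.active; }
```

With `led_init(2, 5)` and a single `led_on_data()`, the LED toggles every two ticks and is forced off once five ticks pass without new data.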

2. The timer hang

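The timer is started from two threads with different priorities. After the program has run for a long time the timer hangs: execution never leaves the following loop inside rt_timer_start:

for (; row_head[row_lvl] != timer_list[row_lvl].prev;
     row_head[row_lvl]  = row_head[row_lvl]->next)

The full function is listed below.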

rt_err_t rt_timer_start(rt_timer_t timer)
{
    unsigned int row_lvl;
    rt_list_t *timer_list;
    register rt_base_t level;
    rt_list_t *row_head[RT_TIMER_SKIP_LIST_LEVEL];
    unsigned int tst_nr;
    static unsigned int random_nr;

    /* timer check */
    RT_ASSERT(timer != RT_NULL);
    RT_ASSERT(rt_object_get_type(&timer->parent) == RT_Object_Class_Timer);

    /* stop timer firstly */
    level = rt_hw_interrupt_disable();
    /* remove timer from list */
    _rt_timer_remove(timer);
    /* change status of timer */
    timer->parent.flag &= ~RT_TIMER_FLAG_ACTIVATED;

    RT_OBJECT_HOOK_CALL(rt_object_take_hook, (&(timer->parent)));

    /*
     * get timeout tick,
     * the max timeout tick shall not great than RT_TICK_MAX/2
     */
    RT_ASSERT(timer->init_tick < RT_TICK_MAX / 2);
    timer->timeout_tick = rt_tick_get() + timer->init_tick;

#ifdef RT_USING_TIMER_SOFT
    if (timer->parent.flag & RT_TIMER_FLAG_SOFT_TIMER)
    {
        /* insert timer to soft timer list */
        timer_list = rt_soft_timer_list;
    }
    else
#endif
    {
        /* insert timer to system timer list */
        timer_list = rt_timer_list;
    }

    row_head[0]  = &timer_list[0];
    for (row_lvl = 0; row_lvl < RT_TIMER_SKIP_LIST_LEVEL; row_lvl++)
    {
        for (; row_head[row_lvl] != timer_list[row_lvl].prev;
             row_head[row_lvl]  = row_head[row_lvl]->next)
        {
            struct rt_timer *t;
            rt_list_t *p = row_head[row_lvl]->next;

            /* fix up the entry pointer */
            t = rt_list_entry(p, struct rt_timer, row[row_lvl]);

            /* If we have two timers that timeout at the same time, it's
             * preferred that the timer inserted early get called early.
             * So insert the new timer to the end the the some-timeout timer
             * list.
             */
            if ((t->timeout_tick - timer->timeout_tick) == 0)
            {
                continue;
            }
            else if ((t->timeout_tick - timer->timeout_tick) < RT_TICK_MAX / 2)
            {
                break;
            }
        }
        if (row_lvl != RT_TIMER_SKIP_LIST_LEVEL - 1)
            row_head[row_lvl + 1] = row_head[row_lvl] + 1;
    }

    /* Interestingly, this super simple timer insert counter works very very
     * well on distributing the list height uniformly. By means of "very very
     * well", I mean it beats the randomness of timer->timeout_tick very easily
     * (actually, the timeout_tick is not random and easy to be attacked). */
    random_nr++;
    tst_nr = random_nr;

    rt_list_insert_after(row_head[RT_TIMER_SKIP_LIST_LEVEL - 1],
                         &(timer->row[RT_TIMER_SKIP_LIST_LEVEL - 1]));
    for (row_lvl = 2; row_lvl <= RT_TIMER_SKIP_LIST_LEVEL; row_lvl++)
    {
        if (!(tst_nr & RT_TIMER_SKIP_LIST_MASK))
            rt_list_insert_after(row_head[RT_TIMER_SKIP_LIST_LEVEL - row_lvl],
                                 &(timer->row[RT_TIMER_SKIP_LIST_LEVEL - row_lvl]));
        else
            break;
        /* Shift over the bits we have tested. Works well with 1 bit and 2
         * bits. */
        tst_nr >>= (RT_TIMER_SKIP_LIST_MASK + 1) >> 1;
    }

    timer->parent.flag |= RT_TIMER_FLAG_ACTIVATED;

    /* enable interrupt */
    rt_hw_interrupt_enable(level);

#ifdef RT_USING_TIMER_SOFT
    if (timer->parent.flag & RT_TIMER_FLAG_SOFT_TIMER)
    {
        /* check whether timer thread is ready */
        if ((soft_timer_status == RT_SOFT_TIMER_IDLE) &&
           ((timer_thread.stat & RT_THREAD_STAT_MASK) == RT_THREAD_SUSPEND))
        {
            /* resume timer thread to check soft timer */
            rt_thread_resume(&timer_thread);
            rt_schedule();
        }
    }
#endif

    return RT_EOK;
}

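Inspecting the timer list at that point showed that the last node of rt_timer_list pointed to itself. The list was corrupted, so the for loop above became an infinite loop with no way out. From the outside the program looked dead: no task responded.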
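A bounded walk is one way to detect this kind of corruption while debugging, instead of spinning forever. The sketch below is illustrative only: `struct list_node` is a simplified stand-in with the same two-pointer layout as rt_list_t, and `list_is_sane` and `demo_detects_self_loop` are hypothetical helpers, not RT-Thread functions.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for rt_list_t (assumption: next/prev pointers only). */
struct list_node { struct list_node *next, *prev; };

/* Follow at most max_nodes + 1 links starting at head.
 * Returns 1 if the walk comes back to head and every link is consistent
 * (n->next->prev == n); returns 0 on a broken link or when the bound is hit. */
int list_is_sane(const struct list_node *head, size_t max_nodes)
{
    const struct list_node *n = head;
    for (size_t i = 0; i <= max_nodes; i++) {
        if (n->next == NULL || n->next->prev != n)
            return 0;
        n = n->next;
        if (n == head)
            return 1;
    }
    return 0;
}

/* Build head -> a -> b -> head, then make the last node point to itself. */
int demo_detects_self_loop(void)
{
    struct list_node head, a, b;
    head.next = &a;  a.prev = &head;
    a.next = &b;     b.prev = &a;
    b.next = &head;  head.prev = &b;

    int ok_before = list_is_sane(&head, 8);

    b.next = &b;     /* same symptom as observed: last node points to itself */
    int ok_after = list_is_sane(&head, 8);

    return ok_before == 1 && ok_after == 0;
}
```

On a target, a check like this would have to run with interrupts disabled so the list cannot change during the walk; that part is left out here.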

3. Solution

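I replaced the timer code with rt_timer.c from version 4.0.3. After reading the code and the change history in the RT-Thread repository on GitHub, my analysis was that rt_timer_start can, in rare circumstances, be interrupted partway through: before one interrupt handler or thread has returned from rt_timer_start, another thread calls rt_timer_start, and the list ends up corrupted.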

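Going through the change history of rt_timer.c in detail, I found issue #3800, "关于硬件定时器的线程安全问题" (on the thread safety of hardware timers): https://github.com/RT-Thread/rt-thread/issues/3800. The discussion there confirmed the cause: two threads in my program start the same timer, and one line in rt_timer_start runs with interrupts enabled, so the function can be preempted at that point. That was the real root cause.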

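To summarize, the bug only occurs under quite strict conditions. Both of the following must hold:

  • Thread A is preempted by thread B while inside rt_timer_start, and thread B then calls rt_timer_start itself.
  • Both calls to rt_timer_start operate on the same timer.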
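The two conditions above can be replayed on a host, single-threaded, to see why they produce a self-pointing node. This is a sketch under assumptions, not the kernel code: the list helpers are simplified stand-ins for the rt_list_t routines, the insertion point (after node `a`) is assumed rather than computed by the skip-list search, and the interleaving shown is one plausible ordering rather than a trace from the real system.

```c
#include <assert.h>

/* Simplified stand-in for rt_list_t; helper names are illustrative. */
struct list_node { struct list_node *next, *prev; };

static void list_init(struct list_node *n) { n->next = n->prev = n; }

/* The usual four-pointer insert for a circular doubly linked list. */
static void list_insert_after(struct list_node *l, struct list_node *n)
{
    l->next->prev = n;
    n->next = l->next;
    l->next = n;
    n->prev = l;
}

/* Unlink n and leave it pointing at itself, so a second remove is harmless. */
static void list_remove(struct list_node *n)
{
    n->next->prev = n->prev;
    n->prev->next = n->next;
    n->next = n->prev = n;
}

/* Forward walk with a step limit; returns 1 if the tail (head->prev) is reached. */
static int walk_reaches_tail(const struct list_node *head, int max_steps)
{
    const struct list_node *p = head;
    for (int i = 0; i < max_steps; i++) {
        if (p == head->prev)
            return 1;
        p = p->next;
    }
    return 0;
}

/* Build head -> a -> timer -> b -> head. */
static void build(struct list_node *head, struct list_node *a,
                  struct list_node *timer, struct list_node *b)
{
    list_init(head); list_init(a); list_init(timer); list_init(b);
    list_insert_after(head, a);
    list_insert_after(a, timer);
    list_insert_after(timer, b);
}

/* Thread A is preempted between its remove and its insert. */
int replay_interleaved(void)
{
    struct list_node head, a, timer, b;
    build(&head, &a, &timer, &b);

    list_remove(&timer);              /* thread A: remove, then preempted */

    list_remove(&timer);              /* thread B: complete start ...     */
    list_insert_after(&a, &timer);    /* ... timer is linked again        */

    list_insert_after(&a, &timer);    /* thread A resumes: second insert  */

    /* The timer now points to itself and the tail is unreachable. */
    return timer.next == &timer && !walk_reaches_tail(&head, 100);
}

/* Each start runs remove + insert without being interrupted. */
int replay_atomic(void)
{
    struct list_node head, a, timer, b;
    build(&head, &a, &timer, &b);

    list_remove(&timer); list_insert_after(&a, &timer);   /* thread A */
    list_remove(&timer); list_insert_after(&a, &timer);   /* thread B */

    return timer.next == &b && walk_reaches_tail(&head, 100);
}
```

In `replay_interleaved` the second insert finds the timer already linked after `a`, so `n->next = l->next` makes the node its own successor. In `replay_atomic` the remove and insert of each start stay together, which is what a critical section covering the whole operation guarantees, and the list stays intact.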

4. Discussion screenshots

The images in the GitHub discussion sometimes fail to load, so I saved screenshots here for future reference.

[Screenshots of the discussion in issue #3800]

5. Summary

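Thread safety in an operating system really matters. A function that is not thread-safe can pass short test runs without showing any problem. Fixing this bug gave me a deeper understanding of critical-section protection: preemption between threads needs it as well.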

 
