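1. Operating system timers
The operating system timer is a timing facility provided by the RT-Thread kernel; it supports both hardware timers and software timers. In one product I used several of these timers to drive a data-communication indicator LED. The idea is to create one periodic timer and one one-shot timer: when data is transferred, the periodic timer is started to make the LED blink, and the one-shot timer is started so that, when it times out, the LED is switched off and the periodic timer is stopped. When more data arrives, the timers are started again, which produces the blinking effect.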
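The two-timer scheme can be tried out on a desktop machine with a small tick-driven model. The sketch below is not RT-Thread code: `struct led_indicator`, `led_on_data`, `led_tick`, `led_state_after` and both tick constants are hypothetical names and values chosen for illustration. It also assumes that new data only re-arms the one-shot timeout and leaves an already running blink timer alone; the post does not say which of the two timers is restarted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative tick counts, not taken from the product. */
#define BLINK_PERIOD_TICKS  5u   /* periodic timer: toggle the LED */
#define IDLE_TIMEOUT_TICKS 20u   /* one-shot timer: stop blinking  */

/* Hypothetical model of the indicator; not an RT-Thread type. */
struct led_indicator {
    bool blinking;        /* periodic timer is running            */
    bool led_on;          /* current LED output                   */
    unsigned blink_left;  /* ticks until the periodic timer fires */
    unsigned idle_left;   /* ticks until the one-shot timer fires */
};

void led_init(struct led_indicator *s)
{
    assert(s != NULL);
    s->blinking = false;
    s->led_on = false;
    s->blink_left = 0;
    s->idle_left = 0;
}

/* Data was sent or received: start blinking and re-arm the timeout. */
void led_on_data(struct led_indicator *s)
{
    assert(s != NULL);
    if (!s->blinking) {
        s->blinking = true;
        s->blink_left = BLINK_PERIOD_TICKS;
    }
    s->idle_left = IDLE_TIMEOUT_TICKS;
}

/* Advance the model by one system tick. */
void led_tick(struct led_indicator *s)
{
    assert(s != NULL);
    if (!s->blinking)
        return;
    if (--s->idle_left == 0) {      /* one-shot timer expired */
        s->blinking = false;
        s->led_on = false;
        return;
    }
    if (--s->blink_left == 0) {     /* periodic timer expired */
        s->led_on = !s->led_on;
        s->blink_left = BLINK_PERIOD_TICKS;
    }
}

/* Convenience helper: LED state `ticks` ticks after a single data event. */
bool led_state_after(unsigned ticks)
{
    struct led_indicator s;
    unsigned i;

    led_init(&s);
    led_on_data(&s);
    for (i = 0; i < ticks; i++)
        led_tick(&s);
    return s.led_on;
}
```

With these values the LED toggles every 5 ticks and is forced off 20 ticks after the last data event. On the target, `led_on_data` would correspond to the place where the two kernel timers are started, which is exactly the call that later turned out to need protection.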
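2. The timer hang
This timer is started from two threads of different priority. After the program had been running for a long time, the timer would hang: execution stayed forever inside the loop for (; row_head[row_lvl] != timer_list[row_lvl].prev; row_head[row_lvl] = row_head[row_lvl]->next) in rt_timer_start.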
rt_err_t rt_timer_start(rt_timer_t timer)
{
    unsigned int row_lvl;
    rt_list_t *timer_list;
    register rt_base_t level;
    rt_list_t *row_head[RT_TIMER_SKIP_LIST_LEVEL];
    unsigned int tst_nr;
    static unsigned int random_nr;
    /* timer check */
    RT_ASSERT(timer != RT_NULL);
    RT_ASSERT(rt_object_get_type(&timer->parent) == RT_Object_Class_Timer);
    /* stop timer firstly */
    level = rt_hw_interrupt_disable();
    /* remove timer from list */
    _rt_timer_remove(timer);
    /* change status of timer */
    timer->parent.flag &= ~RT_TIMER_FLAG_ACTIVATED;
    RT_OBJECT_HOOK_CALL(rt_object_take_hook, (&(timer->parent)));
    /*
     * get timeout tick,
     * the max timeout tick shall not great than RT_TICK_MAX/2
     */
    RT_ASSERT(timer->init_tick < RT_TICK_MAX / 2);
    timer->timeout_tick = rt_tick_get() + timer->init_tick;
#ifdef RT_USING_TIMER_SOFT
    if (timer->parent.flag & RT_TIMER_FLAG_SOFT_TIMER)
    {
        /* insert timer to soft timer list */
        timer_list = rt_soft_timer_list;
    }
    else
#endif
    {
        /* insert timer to system timer list */
        timer_list = rt_timer_list;
    }
    row_head[0] = &timer_list[0];
    for (row_lvl = 0; row_lvl < RT_TIMER_SKIP_LIST_LEVEL; row_lvl++)
    {
        for (; row_head[row_lvl] != timer_list[row_lvl].prev;
             row_head[row_lvl] = row_head[row_lvl]->next)
        {
            struct rt_timer *t;
            rt_list_t *p = row_head[row_lvl]->next;
            /* fix up the entry pointer */
            t = rt_list_entry(p, struct rt_timer, row[row_lvl]);
            /* If we have two timers that timeout at the same time, it's
             * preferred that the timer inserted early get called early.
             * So insert the new timer to the end the the some-timeout timer
             * list.
             */
            if ((t->timeout_tick - timer->timeout_tick) == 0)
            {
                continue;
            }
            else if ((t->timeout_tick - timer->timeout_tick) < RT_TICK_MAX / 2)
            {
                break;
            }
        }
        if (row_lvl != RT_TIMER_SKIP_LIST_LEVEL - 1)
            row_head[row_lvl + 1] = row_head[row_lvl] + 1;
    }
    /* Interestingly, this super simple timer insert counter works very very
     * well on distributing the list height uniformly. By means of "very very
     * well", I mean it beats the randomness of timer->timeout_tick very easily
     * (actually, the timeout_tick is not random and easy to be attacked). */
    random_nr++;
    tst_nr = random_nr;
    rt_list_insert_after(row_head[RT_TIMER_SKIP_LIST_LEVEL - 1],
                         &(timer->row[RT_TIMER_SKIP_LIST_LEVEL - 1]));
    for (row_lvl = 2; row_lvl <= RT_TIMER_SKIP_LIST_LEVEL; row_lvl++)
    {
        if (!(tst_nr & RT_TIMER_SKIP_LIST_MASK))
            rt_list_insert_after(row_head[RT_TIMER_SKIP_LIST_LEVEL - row_lvl],
                                 &(timer->row[RT_TIMER_SKIP_LIST_LEVEL - row_lvl]));
        else
            break;
        /* Shift over the bits we have tested. Works well with 1 bit and 2
         * bits. */
        tst_nr >>= (RT_TIMER_SKIP_LIST_MASK + 1) >> 1;
    }
    timer->parent.flag |= RT_TIMER_FLAG_ACTIVATED;
    /* enable interrupt */
    rt_hw_interrupt_enable(level);
#ifdef RT_USING_TIMER_SOFT
    if (timer->parent.flag & RT_TIMER_FLAG_SOFT_TIMER)
    {
        /* check whether timer thread is ready */
        if ((soft_timer_status == RT_SOFT_TIMER_IDLE) &&
            ((timer_thread.stat & RT_THREAD_STAT_MASK) == RT_THREAD_SUSPEND))
        {
            /* resume timer thread to check soft timer */
            rt_thread_resume(&timer_thread);
            rt_schedule();
        }
    }
#endif
    return RT_EOK;
}
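Inspecting the timer list at that point showed that the last node of rt_timer_list pointed to itself, so the list had become a dead loop. That turns the for loop above into an infinite loop that never exits; the program appears frozen and no task responds.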
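A corrupted ring like this can also be detected in code instead of by hand in a debugger. The sketch below is a host-side illustration, not part of RT-Thread: `struct node` stands in for the kernel's list node (a `next` and a `prev` pointer), and `list_check`, `healthy_list_demo` and `self_loop_demo` are hypothetical helpers. It walks the ring and checks that every forward link is matched by the corresponding back link, with an upper bound so that the check itself cannot hang.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a circular doubly linked list node. */
struct node {
    struct node *next;
    struct node *prev;
};

/*
 * Walk the ring starting at head.
 * Returns the number of entries (head excluded) if every next/prev pair
 * is consistent, or -1 if a link is broken or the walk exceeds max_nodes.
 */
int list_check(const struct node *head, int max_nodes)
{
    const struct node *p = head;
    int count = 0;

    assert(head != NULL);
    do {
        if (p->next == NULL || p->next->prev != p)
            return -1;              /* back link does not match */
        p = p->next;
        if (p != head && ++count > max_nodes)
            return -1;              /* never returned to head */
    } while (p != head);
    return count;
}

/* A consistent ring: head <-> a <-> b <-> head. */
int healthy_list_demo(void)
{
    struct node head, a, b;

    head.next = &a;  a.prev = &head;
    a.next = &b;     b.prev = &a;
    b.next = &head;  head.prev = &b;
    return list_check(&head, 100);
}

/* The symptom from the post: the last node points to itself. */
int self_loop_demo(void)
{
    struct node head, a;

    head.next = &a;  a.prev = &head;
    a.next = &a;     /* self loop: the ring never gets back to head */
    head.prev = &a;
    return list_check(&head, 100);
}
```

Here `healthy_list_demo()` yields 2 and `self_loop_demo()` yields -1. A check of this kind could be called from a debug hook or an assertion during long-running tests, so that the corruption is reported when it happens rather than when the system freezes.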
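3. The fix
I updated to the timer code (rt_timer.c) from version 4.0.3. After reading the code and the update history in the RT-Thread repository on GitHub, my analysis was that rt_timer_start was being interrupted under some special circumstance: before one interrupt or thread had returned from rt_timer_start, another thread called rt_timer_start as well, and this produced the dead list.
Going through the update history of rt_timer.c in detail, I found https://github.com/RT-Thread/rt-thread/issues/3800, issue #3800 on the thread safety of hardware timers. The discussion there confirmed the cause: two threads in my program both start the same timer, and one line of code in rt_timer_start ran without interrupts disabled, so its execution could be interrupted. With that, the real problem was finally found.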
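To sum up, the conditions for this bug are very strict; both of the following must hold at the same time:
- Thread A's rt_timer_start is preempted by thread B, and thread B then also executes rt_timer_start.
- Thread A's rt_timer_start and thread B's rt_timer_start operate on the same timer.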
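The two conditions can be replayed deterministically on a host machine. The following is a simplified model, not the kernel code: it uses a single list level with no sorting, `struct node` stands in for the kernel's list node, and `timer_start_atomic`, `timer_start_preempted` and the two demo functions are hypothetical names. The list helpers follow the usual circular doubly linked list insert and remove. Exactly where the unprotected window sat in the affected kernel version is described in issue #3800 and is not reproduced here; the model only assumes that thread B's complete start runs between thread A's remove step and insert step.

```c
#include <assert.h>
#include <stddef.h>

struct node {
    struct node *next;
    struct node *prev;
};

static void list_init(struct node *n)
{
    n->next = n->prev = n;
}

static void list_insert_after(struct node *l, struct node *n)
{
    l->next->prev = n;
    n->next = l->next;
    l->next = n;
    n->prev = l;
}

static void list_remove(struct node *n)
{
    n->next->prev = n->prev;
    n->prev->next = n->next;
    n->next = n->prev = n;
}

/* One complete, uninterrupted start: unlink the timer, then link it again. */
static void timer_start_atomic(struct node *head, struct node *t)
{
    assert(head != NULL && t != NULL);
    list_remove(t);
    list_insert_after(head, t);
}

/* Thread A's start, with thread B's whole start running in the middle. */
static void timer_start_preempted(struct node *head, struct node *t)
{
    list_remove(t);                 /* A: unlink                          */
    timer_start_atomic(head, t);    /* B: preempts A, starts same timer   */
    list_insert_after(head, t);     /* A: resumes, links a linked node    */
}

/* Returns 1 if the interleaving leaves the timer node pointing to itself. */
int race_corrupts_list(void)
{
    struct node head, t;

    list_init(&head);
    list_init(&t);
    timer_start_preempted(&head, &t);
    return t.next == &t;
}

/* Returns 1 if two back-to-back starts leave a consistent two-node ring. */
int serialized_starts_keep_list_intact(void)
{
    struct node head, t;

    list_init(&head);
    list_init(&t);
    timer_start_atomic(&head, &t);  /* thread A */
    timer_start_atomic(&head, &t);  /* thread B */
    return head.next == &t && t.next == &head &&
           t.prev == &head && head.prev == &t;
}
```

In the preempted case the second insert links a node that is already in the list, which makes its `next` pointer refer to itself, the same symptom seen in rt_timer_list. When the remove and insert steps of each call run as one unit, as they do inside the single interrupt-disabled region of the rt_timer_start listing above, the ring stays consistent.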
4. Discussion screenshots
Images in the GitHub discussion sometimes fail to load, so I took screenshots to keep for later reference.
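5. Summary
Thread safety in an operating system really matters. When a function is not thread-safe, short test runs may never reveal the problem. Fixing this bug gave me a deeper understanding of critical-section protection: preemption between threads needs critical-section protection too.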