Mysql5.7 的錯誤日誌中最常見的note級別日誌解釋

在使用mysql5.7的時候,發現了不少在mysql5.6上不曾見過的日誌,級別爲note, 作者梳理了一下,最常見的note日誌以下三種,下面我們來逐個解釋。


第一種,Aborted connection . 如上圖,信息如下:

2016-03-17T14:44:24.102542Z 59 [Note] Aborted connection 59 to db: ‘unconnected’ user: ‘mha’ host: ‘197.xx.xx.xx’ (Got an error reading communication packets)

2016-03-17T14:44:31.273897Z 61 [Note] Aborted connection 61 to db: ‘unconnected’ user: ‘mha’ host: ‘197.xx.xx.xx(Got an error reading communication packets)

2016-03-17T14:44:31.273897Z 61 [Note] Aborted connection 61 to db: ‘unconnected’ user: ‘mha’ host: ‘197.xx.xx.xx(Got an error reading communication packets)

2016-03-17T14:44:31.273897Z 61 [Note] Aborted connection 61 to db: ‘unconnected’ user: ‘mha’ host: ‘197.xx.xx.xx(Got  timeout  reading communication packets)

上面兩個是連接中斷的錯誤信息,原因不一樣,Got an error reading communication packets  的原因是因爲網絡等原因導致。 Got  timeout  reading communication packets 這個錯誤的原因是會話的idle時間達到了數據庫指定的timeout時間。

第二種:SLAVE多線程同步的信息


如圖,信息如下:

2016-03-14T15:48:26.432150Z 73 [Note]Multi-threaded slave statistics for channel ”: seconds elapsed = 121; eventsassigned = 100374529; worker queues filled over overrun level = 0; waited due aWorker queue full = 0; waited due the total size = 0; waited at clock conflicts= 1451875661700 waited (count) when Workers occupied = 3211993 waited whenWorkers occupied = 445032386000

2016-03-14T15:50:28.398745Z 73 [Note]Multi-threaded slave statistics for channel ”: seconds elapsed = 122; eventsassigned = 100500481; worker queues filled over overrun level = 0; waited due aWorker queue full = 0; waited due the total size = 0; waited at clock conflicts= 1452001865500 waited (count) when Workers occupied = 3211993 waited whenWorkers occupied = 445032386000

我們通過源代碼,找到下面一段,該段實現了上述日誌的輸出。

  if ((my_now – rli->mts_last_online_stat)>=

           mts_online_stat_period)

        {

         sql_print_information(“Multi-threadedslave statistics%s: “

                                “seconds elapsed = %lu; “

                                “events assigned = %llu; “

                                “worker queues filled over overrun level = %lu;”

                                “waited due a Worker queue full = %lu; “

                                “waited due the total size = %lu; “

                                “waited at clock conflicts = %llu “

                               “waited(count) when Workers occupied = %lu “

                                “waited when Workers occupied = %llu”,

                                rli->get_for_channel_str(),

                                static_cast<unsignedlong>

                                (my_now – rli->mts_last_online_stat),

                                rli->mts_events_assigned,

                                rli->mts_wq_overrun_cnt,

                                rli->mts_wq_overfill_cnt,

                                rli->wq_size_waits_cnt,

                                rli->mts_total_wait_overlap,

                                rli->mts_wq_no_underrun_cnt,

                                rli->mts_total_wait_worker_avail);

          rli->mts_last_online_stat=my_now;   

 通過這一句(my_now – rli->mts_last_online_stat),  以及最後一句rli->mts_last_online_stat=my_now;   可以得知, seconds elapsed 就是上一次統計跟這一次統計的時間間隔。

mts_online_stat_period =120秒,硬代碼,因此就是幾乎每隔120秒,就有上述日誌的輸出。 通過進一步查看原代碼,初步瞭解上述日誌信息的含義,如下:

   

events assigned:總共有多少個event被分配執行,計的是總數。

worker queues filled over overrun level多線程同步中,worker 的私有隊列長度超長的次數計的是總數。

waited due a Worker queue full 因爲worker的隊列超長而產生等待的次數計的是總數。

waited due the total size 超過最大size的次數這個由參數slave_pending_jobs_size_max  指定。

waited at clock conflicts 因爲邏輯時間產生衝突的等待時間單位是納秒。

waited (count) when Workers occupied 因爲workder被佔用而出現等待的次數。(總計值)。

waited when Workersoccupied :因爲workder被佔用而出現等待的總時間,總計值,單位是納秒。


第三種:page_cleaner線程的輸出日誌



如圖,信息如下:


2016-03-24T17:45:28.005117Z 0 [Note] InnoDB:page_cleaner: 1000ms intended loop took 4750ms.The settings might not beoptimal. (flushed=1519 and evicted=0, during the time.)


查找源代碼,發現是上面的日誌由下面一段代碼輸出:

        if (ret_sleep == OS_SYNC_TIME_EXCEEDED) {

            ulint   curr_time = ut_time_ms();

 

            if (curr_time >next_loop_time + 3000) {

                if (warn_count == 0) {

                    ib::info() << “page_cleaner: 1000ms”

                        ” intended loop took “

                        <<1000 + curr_time

                           – next_loop_time

                        <<“ms. The settings might not”

                        ” be optimal. (flushed=”

                        <<n_flushed_last

                        <<” and evicted=”

                        <<n_evicted

                        <<“, during the time.)”;

                    if (warn_interval >300) {

                        warn_interval= 600;

                    }else {

                        warn_interval*= 2;

                    }

 

                    warn_count= warn_interval;

                } else {

                    –warn_count;

                }

            } else {

                /* reset counter */

                warn_interval= 1;

                warn_count= 0;

            }

 

            next_loop_time= curr_time + 1000;

            n_flushed_last= n_evicted = 0;

        }



通過分析源代碼, 輸出上述日誌的條件是curr_time> next_loop_time + 3000 ,即比原定的循環時間next_loop_time3000毫秒,而next_loop_time的標準時間是1000毫秒,即1秒鐘做一次刷新頁的操作。

loop took 4750ms ,即是這次刷新循環的實際經歷時間。

 

後面還有一個(flushed=1519 and evicted=0,during the time.)這樣的日誌,即對應n_flushed_lastn_evicted 變量,而這兩個變量又由n_flushed_list與n_flushed_lru賦值。

 

./storage/innobase/buf/buf0flu.cc:3322:                 n_flushed_last +=n_flushed_list;

./storage/innobase/buf/buf0flu.cc:3321:                 n_evicted += n_flushed_lru;

 

 

而n_flushed_list與n_flushed_lru賦值函數爲pc_wait_finished,如下,我們來看看該函數的註釋。

pc_wait_finished(&n_flushed_lru,&n_flushed_list);

 

 

/**

Wait until all flush requests are finished.

@param n_flushed_lru    numberof pages flushed from the end of the LRU list.

@param n_flushed_list   numberof pages flushed from the end of the

            flush_list.

@return         trueif all flush_list flushing batch were success. */

static

bool

pc_wait_finished(

    ulint*  n_flushed_lru,

    ulint*  n_flushed_list)

{

。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。

}

通過源代碼的註釋,我們獲知如下信息:

n_flushed_lru   number of pages flushed from the end of the LRU list.

n_flushed_lru  這個值表示從lru 列表尾部刷新的頁數,也就是日誌中如evicted=0 指標的所表示的值,如果該值不爲零,則存在innodb buffer不夠的嫌疑。

n_flushed_list  這個是從刷新列表中刷新的頁數,也就是髒頁數,也就是日誌中flushed=1519 的值。

發佈了170 篇原創文章 · 獲贊 75 · 訪問量 63萬+
發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章