MySQL Advanced Stepping Stone: How to Handle Threads That Stay in the Killed State for a Long Time

{"type":"doc","content":[{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"I. Background"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"When you terminate a connection in MySQL with the KILL command, SHOW PROCESSLIST shows the thread sitting in the killed state for a while instead of disappearing immediately. In some cases the killed state persists for a long time, and occasionally the connection only goes away after a second KILL is sent. This article walks through the code path MySQL follows when executing KILL (based on MySQL 5.7) to explain why this happens."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"II. Source code analysis"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1. Overview of the MySQL execution flow"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The server entry point is the main function in mysqld. Its main flow starts a thread that listens on the port and accepts TCP connections; for each one it creates a connection and binds it to a specific thread, which then handles the messages coming from that client."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Execution flow:"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/33\/335e3f401e31705e08dec0edc3a9849c.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In day-to-day operation, you typically open a new connection with the mysql command-line client, use SHOW PROCESSLIST to find the connection_id of the connection to terminate, and then send the KILL command."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Note: KILL followed by a connection id defaults to KILL CONNECTION, which drops the connection. KILL QUERY followed by a connection id only terminates the statement currently executing; the connection keeps listening for commands from the client. (For the difference in execution, see parts ① and ② of step 1) in the KILL workflow below.)"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2. KILL workflow"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Concepts:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"connection: a socket connection. The limit is set by max_connections, but the server actually accepts max_connections + 1 connections; the extra one is reserved for users with the SUPER privilege."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"pthread: a MySQL worker thread. Each connection is assigned a pthread when it is established; after the connection closes, the pthread can be reused by another connection."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"thd: the thread descriptor class. Each connection has one thd, which holds much of the connection state; its thd->killed field indicates whether the thread should be killed."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For the rest of the discussion, assume two connections, connection1 and connection2. In terms of the flow above, connection1 is inside do_command or waiting for a socket event when a KILL sent through connection2 interrupts its execution."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"After KILL CONNECTION, the pthread that served the connection may be reused by a new connection (analyzed later)."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"1) The thread executing the KILL command initiates the kill"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Following connection2 through the kill, the call stack below do_command looks like this:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"\n* frame #0: 0x00000001068a8853 mysqld`THD::awake(this=0x00007fbede88b400, state_to_set=KILL_CONNECTION) at sql_class.cc:2029:27\nframe #1: 0x000000010695961f mysqld`kill_one_thread(thd=0x00007fbed6bc9c00, id=2, only_kill_query=false) at sql_parse.cc:6479:14\nframe #2: 0x0000000106946529 mysqld`sql_kill(thd=0x00007fbed6bc9c00, id=2, only_kill_query=false) at sql_parse.cc:6507:16\nframe #3: 0x000000010694e0fa mysqld`mysql_execute_command(thd=0x00007fbed6bc9c00, first_level=true) at sql_parse.cc:4210:5\nframe #4: 0x0000000106945d62 mysqld`mysql_parse(thd=0x00007fbed6bc9c00, parser_state=0x000070000de2f340) at sql_parse.cc:5584:20\nframe #5: 0x0000000106942bf0 mysqld`dispatch_command(thd=0x00007fbed6bc9c00, com_data=0x000070000de2fe78, command=COM_QUERY) at sql_parse.cc:1491:5\nframe #6: 0x0000000106944e70 mysqld`do_command(thd=0x00007fbed6bc9c00) at sql_parse.cc:1032:17\nframe #7: 0x0000000106ad3976 mysqld`::handle_connection(arg=0x00007fbee220b8d0) at connection_handler_per_thread.cc:313:13\nframe #8: 0x000000010749e74c mysqld`::pfs_spawn_thread(arg=0x00007fbee15dcf90) at pfs.cc:2197:3\nframe #9: 0x00007fff734b6109 libsystem_pthread.dylib`_pthread_start + 148\nframe #10: 0x00007fff734b1b8b libsystem_pthread.dylib`thread_start + 
15"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The core logic is in the awake function (split into three parts here for convenience):"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":" ① Set the thread's killed flag"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"\nif (this->m_server_idle && state_to_set == KILL_QUERY)\n { \/* nothing *\/ } \n else\n {\n killed= state_to_set;\n }"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"If the target thread is idle (its command has already finished) and the kill level is only KILL QUERY rather than killing the whole connection, thd->killed is not set, so that the next normal request is not affected."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(The query that was to be killed is considered already finished, so nothing more needs to be done.)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":" ② Close the socket connection and interrupt storage engine waits"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"\nif (state_to_set != THD::KILL_QUERY && state_to_set != THD::KILL_TIMEOUT)\n {\n if (this != current_thd)\n {\n\n shutdown_active_vio();\n }\n\n \/* Send an event to the scheduler that a thread should be killed. 
*\/ \n if (!slave_thread)\n MySQL_CALLBACK(Connection_handler_manager::event_functions, post_kill_notification, (this)); \/\/post_kill\n }\n\nif (state_to_set != THD::NOT_KILLED) \n ha_kill_connection(this);"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Next, the socket connection is closed first (note that KILL QUERY does not close the connection), so no further commands are accepted. The client error shown below is reported after this step:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/15\/15bd60b3197b1c7d9976be8eede23d7a.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In addition, ha_kill_connection is called. What it really does is wake up a thread that is waiting inside the InnoDB layer: innobase_kill_connection in ha_innodb.cc calls lock_trx_handle_wait."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"trx: the InnoDB transaction descriptor associated with a MySQL thread."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":" ③ Signal the waiting thread through its condition variable"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"\nif (is_killable)\n {\n mysql_mutex_lock(&LOCK_current_cond); \n if (current_cond && current_mutex)\n {\n DBUG_EXECUTE_IF(\"before_dump_thread_acquires_current_mutex\",\n {\n const char 
act[]=\n \"now signal dump_thread_signal wait_for go_dump_thread\"; \n DBUG_ASSERT(!debug_sync_set_action(current_thd,\n STRING_WITH_LEN(act)));\n };);\n mysql_mutex_lock(current_mutex);\n mysql_cond_broadcast(current_cond);\n mysql_mutex_unlock(current_mutex);\n }\n mysql_mutex_unlock(&LOCK_current_cond);\n }"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Besides setting connection1's thd->killed, awake also takes current_mutex and wakes the thread waiting on the condition variable current_cond (connection1). Note that ② and ③ wake different waits:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"ha_kill_connection wakes the lock wait of the corresponding InnoDB transaction (trx->lock.wait_lock), typically a row lock the transaction is waiting for inside InnoDB. mysql_cond_broadcast(current_cond) wakes a wait registered in the thd, which is entered through THD::enter_cond() (for example waiting for a table lock when opening a table, or a sleep)."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"See the local debug reproduction section below for details."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Why is the socket connection closed before the signal is sent?"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"What goes wrong under concurrency if the socket is not closed? The source code mentions one case: suppose connection1 has already passed the point where it checks the flag, then connection2 calls awake, sets the flag and sends the wakeup signal, and only after that does connection1 enter the socket read. The signal is effectively lost and connection1 blocks in read indefinitely, so the socket has to be closed to interrupt the read (BUG#37780)."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In other words, interrupting the I\/O is how the lost signal is handled."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Consequently, if the signal is lost at some other stage (for example connection2 broadcasts first and connection1 starts waiting afterwards), a second KILL command is needed to wake connection1."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(See sql_class.cc, line 2090; note, however, that KILL_CONNECTION does not enter awake a second time.)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Note: this usually happens when connection2 has updated the killed state but, because of CPU cache coherence or similar visibility issues, connection1 does not yet see it, passes the active checkpoint, and then enters the wait."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"2) The killed thread responds to the KILL command"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The killed thread notices (responds to) the KILL in two main ways:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Active check: at certain points in the code connection1 checks the killed state itself;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Woken by a signal: while executing some commands (for example operations in the engine layer), connection1 waits on a condition and releases the associated lock; when connection2 executes KILL, it uses that lock and condition to wake connection1, which then carries out the termination;"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The killed states:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"\nenum killed_state\n {\n NOT_KILLED=0,\n KILL_BAD_DATA=1,\n KILL_CONNECTION=ER_SERVER_SHUTDOWN,\n KILL_QUERY=ER_QUERY_INTERRUPTED,\n KILL_TIMEOUT=ER_QUERY_TIMEOUT,\n KILLED_NO_VALUE \/* means neither of the states *\/\n 
};"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"3) The connection is actually killed"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"'Actually killed' means the moment the thread no longer appears in SHOW PROCESSLIST. After creating a connection, MySQL loops reading commands from it (do_command), and as described above the kill closes the connection's socket (shutdown_active_vio). The connection is therefore really terminated here: once thd_connection_alive reports that the thd is marked for killing, the loop exits, the connection is closed, and release_resources is called. Only then does the killed thread disappear from SHOW PROCESSLIST, and the pthread waits to be reused by another connection."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The relevant code in handle_connection:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"\nif (thd_prepare_connection(thd)) \n handler_manager->inc_aborted_connects();\n else\n {\n while (thd_connection_alive(thd))\n {\n if (do_command(thd))\n break;\n }\n end_connection(thd);\n }\n close_connection(thd, 0, false, false);\n\n thd->get_stmt_da()->reset_diagnostics_area(); \n thd->release_resources();\n\n \/\/ Clean up errors now, before possibly waiting for a new connection. 
\n#if OPENSSL_VERSION_NUMBER < 0x10100000L\n ERR_remove_thread_state(0);\n#endif \/* OPENSSL_VERSION_NUMBER < 0x10100000L *\/\n\n thd_manager->remove_thd(thd); \n Connection_handler_manager::dec_connection_count();\n\n........\nchannel_info= Per_thread_connection_handler::block_until_new_connection(); \n\/\/ wait for a new connection to reuse this thread"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"3. Active checkpoints and wakeup mechanisms of the killed thread"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"From the analysis above there are broadly two situations. In the first, connection1 is actively executing (on the CPU), so it will eventually reach a point where thd->killed is checked. In the second, connection1 is waiting, and another thread has to signal it to wake up so that the kill can interrupt it. These two classes break down into the four cases below:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"1) Class one: active checkpoints"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":" ① connection1 has already finished its command when connection2 sends KILL (active check)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"At this point connection1 is blocked in the socket read. Because connection2 calls shutdown_active_vio as described above, connection1 notices right away and proceeds with the follow-up work such as rollback."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"\nif (thd_prepare_connection(thd)) \n handler_manager->inc_aborted_connects();\n else\n {\n while (thd_connection_alive(thd))\n {\n if (do_command(thd)) \n break;\n }\n end_connection(thd);\n }\n close_connection(thd, 0, false, false);\n\n thd->get_stmt_da()->reset_diagnostics_area(); \n thd->release_resources(); 
\/\/clean_up"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":" ② While connection1 is executing, checkpoints (including InnoDB record fetches) test thd->killed and decide whether to abort and return"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"There are many checkpoints of this kind. Two examples:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"SELECT checks on every row read;"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"\nint rr_sequential(READ_RECORD *info)\n{\n int tmp;\n while ((tmp=info->table->file->ha_rnd_next(info->record)))\n {\n \/*\n ha_rnd_next can return RECORD_DELETED for MyISAM when one thread is reading and another deleting without locks.\n *\/\n if (info->thd->killed || (tmp != HA_ERR_RECORD_DELETED))\n {\n tmp= rr_handle_error(info, tmp);\n break;\n }\n }\n return 
tmp;\n}"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/d8\/d8e9d71bd662ea59d97106117f476149.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"When fetching records, the InnoDB engine also checks thd->killed to decide whether to abort the operation and return."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"\n* frame #0: 0x0000000106629a7e mysqld`trx_is_interrupted(trx=0x00007f8266dbd8a0) at ha_innodb.cc:3234:9\nframe #1: 0x000000010686d901 mysqld`row_search_mvcc(buf=\", mode=PAGE_CUR_G, prebuilt=0x00007f826b81faa0, match_mode=0, direction=0) at row0sel.cc:5245:6\nframe #2: 0x0000000106636cda mysqld`ha_innobase::index_read(this=0x00007f825c8c1430, buf=\", key_ptr=0x0000000000000000, key_len=0, find_flag=HA_READ_AFTER_KEY) at ha_innodb.cc:8768:10\nframe #3: 0x000000010663798c mysqld`ha_innobase::index_first(this=0x00007f825c8c1430, buf=\") at ha_innodb.cc:9186:14\nframe #4: 0x0000000106637c2a mysqld`ha_innobase::rnd_next(this=0x00007f825c8c1430, buf=\") at ha_innodb.cc:9284:11\nframe #5: 0x000000010563f7a5 mysqld`handler::ha_rnd_next(this=0x00007f825c8c1430, buf=\") at handler.cc:2963:3\nframe #6: 0x0000000105d954d4 mysqld`rr_sequential(info=0x00007f826780d208) at records.cc:517:34\nframe #7: 0x0000000105e85c78 mysqld`join_init_read_record(tab=0x00007f826780d1b8) at sql_executor.cc:2504:10\nframe #8: 0x0000000105e82a1c mysqld`sub_select(join=0x00007f825e37a4d0, qep_tab=0x00007f826780d1b8, end_of_records=false) at sql_executor.cc:1284:14\nframe #9: 0x0000000105e7f299 mysqld`do_select(join=0x00007f825e37a4d0) at sql_executor.cc:957:12\nframe #10: 0x0000000105e7ec26 mysqld`JOIN::exec(this=0x00007f825e37a4d0) at sql_executor.cc:206:10\nframe #11: 0x0000000105f5fe90 mysqld`handle_query(thd=0x00007f825e35ec00, lex=0x00007f825e361058, result=0x00007f825e379cf8, added_options=0, removed_options=0) at sql_select.cc:191:21\nframe #12: 0x0000000105f006f7 mysqld`execute_sqlcom_select(thd=0x00007f825e35ec00, all_tables=0x00007f825e3796b0) at sql_parse.cc:5155:12\nframe #13: 0x0000000105ef6527 mysqld`mysql_execute_command(thd=0x00007f825e35ec00, first_level=true) at sql_parse.cc:2826:12\nframe #14: 0x0000000105ef3d62 mysqld`mysql_parse(thd=0x00007f825e35ec00, parser_state=0x00007000097d5340) at sql_parse.cc:5584:20\nframe #15: 0x0000000105ef0bf0 mysqld`dispatch_command(thd=0x00007f825e35ec00, com_data=0x00007000097d5e78, command=COM_QUERY) at sql_parse.cc:1491:5\nframe #16: 0x0000000105ef2e70 mysqld`do_command(thd=0x00007f825e35ec00) at sql_parse.cc:1032:17\nframe #17: 0x0000000106081976 mysqld`::handle_connection(arg=0x00007f82681d84f0) at connection_handler_per_thread.cc:313:13\nframe #18: 0x0000000106a4c74c mysqld`::pfs_spawn_thread(arg=0x00007f8268012fc0) at pfs.cc:2197:3\nframe #19: 0x00007fff734b6109 libsystem_pthread.dylib`_pthread_start + 148\nframe #20: 0x00007fff734b1b8b 
libsystem_pthread.dylib`thread_start + 15"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/b3\/b33f183ef0cbfe3a49a10e5601c3c7c5.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"2) Class two: another thread must send a signal to wake it up"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":" ① connection1 is waiting for a row lock in the InnoDB layer when connection2 sends KILL"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The wakeup is triggered mainly by the following line in awake (it may also come from the background thread lock_wait_timeout_thread):"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"\n\/* Interrupt target waiting inside a storage engine. 
*\/ \n if (state_to_set != THD::NOT_KILLED)\n ha_kill_connection(this);"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"See case3 in the debug analysis below."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":" ② connection1 is waiting in the MySQL layer when connection2 sends KILL (woken by connection2)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The wakeup is done mainly by the following code:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"\n\/* Broadcast a condition to kick the target if it is waiting on it. 
*\/ \n if (is_killable)\n {\n mysql_mutex_lock(&LOCK_current_cond); \n if (current_cond && current_mutex)\n {\n DBUG_EXECUTE_IF(\"before_dump_thread_acquires_current_mutex\",\n {\n const char act[]=\n \"now signal dump_thread_signal wait_for go_dump_thread\"; \n DBUG_ASSERT(!debug_sync_set_action(current_thd,\n STRING_WITH_LEN(act)));\n };);\n mysql_mutex_lock(current_mutex);\n mysql_cond_broadcast(current_cond);\n mysql_mutex_unlock(current_mutex);\n }\n mysql_mutex_unlock(&LOCK_current_cond);\n }"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"See case4 in the debug analysis below."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"III. Summary of causes"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The code analysis shows that after a kill the thread still has to roll back (costly for a large transaction) or clean up temporary tables (for example after a slow DDL), and either can keep it in the killed state for a long time."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The rollback itself is performed either by clean_up, called from thd->release_resources() in handle_connection as shown above, or by trans_rollback_stmt in sql_parse."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/e1\/e10c83348f07ddfb5f0bece0769ea183.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Apart from rollback, a heavily loaded MySQL host can also delay the active checks of thd->killed or the scheduling of the woken thread."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"IV. Reproducing the cases"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1. Local debug reproduction"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"1) case1"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"connection1 has finished its command; connection2 kills connection1."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(Straightforward; omitted.)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"2) case2"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"connection1 is executing (for example in the parse phase, before reaching the InnoDB layer); connection2 kills connection1."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(Straightforward; omitted.)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"3) case3 (wakeup in the InnoDB layer)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Open a new session as connection0, start a transaction, and update a row, row1."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Open another session as connection1 and update the same row, row1."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"\n* thread #29, stop reason = breakpoint 127.1\n* frame #0: 0x00000001060f87cb mysqld`os_event::wait(this=0x00007fcce2b16598) at os0event.cc:180:4\nframe #1: 0x00000001060f8773 mysqld`os_event::wait_low(this=0x00007fcce2b16598, reset_sig_count=1) at os0event.cc:366:3\nframe #2: 0x00000001060f91ed mysqld`os_event_wait_low(event=0x00007fcce2b16598, reset_sig_count=0) at os0event.cc:611:9\nframe #3: 0x00000001060a9332 mysqld`lock_wait_suspend_thread(thr=0x00007fccd90936f0) at lock0wait.cc:315:2\nframe #4: 0x00000001061b4f77 mysqld`row_mysql_handle_errors(new_err=0x0000700009f3af6c, trx=0x00007fcce1cc0ce0, thr=0x00007fccd90936f0, savept=0x0000000000000000) at row0mysql.cc:783:3\nframe #5: 0x0000000106212fa6 mysqld`row_search_mvcc(buf=\", mode=PAGE_CUR_GE, prebuilt=0x00007fccd9092ea0, match_mode=1, direction=0) at row0sel.cc:6292:6\nframe #6: 0x0000000105fd8cda mysqld`ha_innobase::index_read(this=0x00007fccd9091430, buf=\", key_ptr=\"\\x1e\", key_len=514, find_flag=HA_READ_KEY_EXACT) at ha_innodb.cc:8768:10\nframe 
#7: 0x0000000104ff2c67 mysqld`handler::index_read_map(this=0x00007fccd9091430, buf=\", key=\"\\x1e\", keypart_map=1, find_flag=HA_READ_KEY_EXACT) at handler.h:2824:13\nframe #8: 0x0000000104fe1f14 mysqld`handler::ha_index_read_map(this=0x00007fccd9091430, buf=\", key=\"\\x1e\", keypart_map=1, find_flag=HA_READ_KEY_EXACT) at handler.cc:3047:3\nframe #9: 0x0000000104feeb62 mysqld`handler::read_range_first(this=0x00007fccd9091430, start_key=0x00007fccd9091518, end_key=0x00007fccd9091538, eq_range_arg=true, sorted=true) at handler.cc:7412:13\nframe #10: 0x0000000104fec01b mysqld`handler::multi_range_read_next(this=0x00007fccd9091430, range_info=0x0000700009f3bc10) at handler.cc:6477:15\nframe #11: 0x0000000104fed4ba mysqld`DsMrr_impl::dsmrr_next(this=0x00007fccd9091698, range_info=0x0000700009f3bc10) at handler.cc:6869:24\nframe #12: 0x0000000105fee7f6 mysqld`ha_innobase::multi_range_read_next(this=0x00007fccd9091430, range_info=0x0000700009f3bc10) at ha_innodb.cc:20585:18\nframe #13: 0x00000001056ec7c8 mysqld`QUICK_RANGE_SELECT::get_next(this=0x00007fcce2b15fa0) at opt_range.cc:11247:21\nframe #14: 0x00000001057371ad mysqld`rr_quick(info=0x0000700009f3c320) at records.cc:405:29\nframe #15: 0x0000000105984e35 mysqld`mysql_update(thd=0x00007fcce585f400, fields=0x00007fcce5863618, values=0x00007fcce58643d8, limit=18446744073709551615, handle_duplicates=DUP_ERROR, found_return=0x0000700009f3cb48, updated_return=0x0000700009f3cb40) at sql_update.cc:819:14\nframe #16: 0x000000010598cc87 mysqld`Sql_cmd_update::try_single_table_update(this=0x00007fcce58643c8, thd=0x00007fcce585f400, switch_to_multitable=0x0000700009f3cc07) at sql_update.cc:2927:21\nframe #17: 0x000000010598d457 mysqld`Sql_cmd_update::execute(this=0x00007fcce58643c8, thd=0x00007fcce585f400) at sql_update.cc:3058:7\nframe #18: 0x000000010589b475 mysqld`mysql_execute_command(thd=0x00007fcce585f400, first_level=true) at sql_parse.cc:3616:26\nframe #19: 0x0000000105895d62 
mysqld`mysql_parse(thd=0x00007fcce585f400, parser_state=0x0000700009f40340) at sql_parse.cc:5584:20\nframe #20: 0x0000000105892bf0 mysqld`dispatch_command(thd=0x00007fcce585f400, com_data=0x0000700009f40e78, command=COM_QUERY) at sql_parse.cc:1491:5\nframe #21: 0x0000000105894e70 mysqld`do_command(thd=0x00007fcce585f400) at sql_parse.cc:1032:17\nframe #22: 0x0000000105a23976 mysqld`::handle_connection(arg=0x00007fcce2555370) at connection_handler_per_thread.cc:313:13\nframe #23: 0x00000001063ee74c mysqld`::pfs_spawn_thread(arg=0x00007fcce2555f80) at pfs.cc:2197:3\nframe #24: 0x00007fff71032109 libsystem_pthread.dylib`_pthread_start + 148\nframe #25: 0x00007fff7102db8b libsystem_pthread.dylib`thread_start + 15 "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The stack shows connection1 waiting for the row lock on row1."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/7c\/7c78607cf906d2cfca7dfaa35736b0f1.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Open a new session as connection2 and kill connection1. When execution reaches ha_kill_connection, analyzed above, it interrupts the InnoDB row lock wait. The call stack is:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"\nthread #30\nframe #0: 0x00000001060f910e mysqld`os_event::set(this=0x00007fcce2b16598) at os0event.cc:93:2\nframe #1: 0x00000001060f90b5 mysqld`os_event_set(event=0x00007fcce2b16598) at os0event.cc:560:9\nframe #2: 0x00000001060aa06e 
mysqld`lock_wait_release_thread_if_suspended(thr=0x00007fccd90936f0) at lock0wait.cc:411:3\nframe #3: 0x0000000106089a99 mysqld`lock_cancel_waiting_and_release(lock=0x00007fccd8866c18) at lock0lock. cc:6789:3\nframe #4: 0x0000000106096679 mysqld`lock_trx_handle_wait(trx=0x00007fcce1cc0ce0) at lock0lock.cc:6972:3 \nframe #5: 0x0000000105ff93a6 mysqld`innobase_kill_connection(hton=0x00007fccd7e094d0,\nthd=0x00007fcce585f400) at ha_innodb.cc:4868:3\nframe #6: 0x0000000104fdcb96 mysqld`kill_handlerton(thd=0x00007fcce585f400, plugin=0x0000700009f809d8, (null)=0x0000000000000000) at handler.cc:1052:7\nframe #7: 0x00000001058d659c mysqld`plugin_foreach_with_mask(thd=0x00007fcce585f400, funcs=0x0000700009f80a60, type=1, state_mask=4294967287, arg=0x0000000000000000)(THD*, st_plugin_int**, void*), int, unsigned int, void*) at sql_plugin.cc:2524:21\nframe #8: 0x00000001058d66a2 mysqld`plugin_foreach_with_mask(thd=0x00007fcce585f400, func= (mysqld`kill_handlerton(THD*, st_plugin_int**, void*) at handler.cc:1046), type=1, state_mask=8, arg=0x0000000000000000)(THD*, st_plugin_int**, void*), int, unsigned int, void*) at sql_plugin.cc:2539:10\nframe #9: 0x0000000104fdcb1b mysqld`ha_kill_connection(thd=0x00007fcce585f400) at handler.cc:1060:3\nframe #10: 0x00000001057f8923 mysqld`THD::awake(this=0x00007fcce585f400, state_to_set=KILL_CONNECTION) at sql_class.cc:2077:5\nframe #11: 0x00000001058a961f mysqld`kill_one_thread(thd=0x00007fccd8b9ea00, id=3, only_kill_query=false) at sql_parse.cc:6479:14\nframe #12: 0x0000000105896529 mysqld`sql_kill(thd=0x00007fccd8b9ea00, id=3, only_kill_query=false) at sql_parse.cc:6507:16\nframe #13: 0x000000010589e0fa mysqld`mysql_execute_command(thd=0x00007fccd8b9ea00, first_level=true) at sql_parse.cc:4210:5\nframe #14: 0x0000000105895d62 mysqld`mysql_parse(thd=0x00007fccd8b9ea00, parser_state=0x0000700009f84340) at sql_parse.cc:5584:20\nframe #15: 0x0000000105892bf0 mysqld`dispatch_command(thd=0x00007fccd8b9ea00, com_data=0x0000700009f84e78, 
command=COM_QUERY) at sql_parse.cc:1491:5\nframe #16: 0x0000000105894e70 mysqld`do_command(thd=0x00007fccd8b9ea00) at sql_parse.cc:1032:17 \nframe #17: 0x0000000105a23976 mysqld`::handle_connection(arg=0x00007fcce5307d10) at connection_handler_per_thread.cc:313:13\nframe #18: 0x00000001063ee74c mysqld`::pfs_spawn_thread(arg=0x00007fcce2405030) at pfs.cc:2197:3 \nframe #19: 0x00007fff71032109 libsystem_pthread.dylib`_pthread_start + 148\nframe #20: 0x00007fff7102db8b libsystem_pthread.dylib`thread_start + 15 "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/8a\/8a44bed7903587eef27dbbaecad9c22f.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Note: at startup MySQL also launches a thread that checks for lock-wait timeouts (at 1s intervals); it too calls lock_cancel_waiting_and_release to interrupt threads waiting on row locks."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This timeout mechanism also guards against a thread never being woken because a wake-up signal was lost."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"4) case4 (wake-up at the MySQL layer)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Open connection0 and lock a table."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 
"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"然後開啓connection1, 去update這張表,可以看到線程會阻塞在wait表鎖狀態。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"\nthread #29\nframe #0: 0x00007fff733f5882 libsystem_kernel.dylib` psynch_cvwait + 10\nframe #1: 0x00007fff734b6425 libsystem_pthread.dylib`_pthread_cond_wait + 698 \nframe #2: 0x000000010ef6d495 mysqld`native_cond_timedwait(cond=0x00007f96cb1a6538,\nmutex=0x00007f96cb1a64d8, abstime=0x000070000bc67e70) at thr_cond.h:136:10\nframe #3: 0x000000010ef6d37a mysqld`safe_cond_timedwait(cond=0x00007f96cb1a6538, mp=0x00007f96cb1a6498, abstime=0x000070000bc67e70, file=\"\/Users\/zhaoxiaojie\/CLionProjects\/mysql-5.7.33\/sql\/mdl.cc\", line=1868) at thr_cond.c:100:10\nframe #4: 0x000000010ea69955 mysqld`my_cond_timedwait(cond=0x00007f96cb1a6538, mp=0x00007f96cb1a6498, abstime=0x000070000bc67e70, file=\"\/Users\/zhaoxiaojie\/CLionProjects\/mysql-5.7.33\/sql\/mdl.cc\", line=1868) at thr_cond.h:187:10\nframe #5: 0x000000010ea63341 mysqld`inline_mysql_cond_timedwait(that=0x00007f96cb1a6538, mutex=0x00007f96cb1a6498, abstime=0x000070000bc67e70, src_file=\"\/Users\/zhaoxiaojie\/CLionProjects\/mysql-5.7.33\n\/sql\/mdl.cc\", src_line=1868) at mysql_thread.h:1232:13\nframe #6: 0x000000010ea631f8 mysqld`MDL_wait::timed_wait(this=0x00007f96cb1a6498, owner=0x00007f96cb1a6400, abs_timeout=0x000070000bc67e70, set_status_on_timeout=true, wait_state_name=0x00000001100b4b58) at mdl.cc:1867: 18\nframe #7: 0x000000010ea660e5 mysqld`MDL_context::acquire_lock(this=0x00007f96cb1a6498, mdl_request=0x00007f96cb0e71a0, lock_wait_timeout=31536000) at mdl.cc:3699:25\nframe #8: 0x000000010eb8a090 mysqld`open_table_get_mdl_lock(thd=0x00007f96cb1a6400, ot_ctx=0x000070000bc689a0, table_list=0x00007f96cb0e6e00, flags=0, mdl_ticket=0x000070000bc68318) at sql_base. 
cc:2914:35\nframe #9: 0x000000010eb885ec mysqld`open_table(thd=0x00007f96cb1a6400, table_list=0x00007f96cb0e6e00, ot_ctx=0x000070000bc689a0) at sql_base.cc:3296:9\nframe #10: 0x000000010eb8e77b mysqld`open_and_process_table(thd=0x00007f96cb1a6400, lex=0x00007f96cb1a8858, tables=0x00007f96cb0e6e00, counter=0x00007f96cb1a8918, flags=0, prelocking_strategy=0x000070000bc68ab8, has_prelocking_list=false, ot_ctx=0x000070000bc689a0) at sql_base.cc:5260:14\nframe #11: 0x000000010eb8d8dc mysqld`open_tables(thd=0x00007f96cb1a6400, start=0x000070000bc68ac8, counter=0x00007f96cb1a8918, flags=0, prelocking_strategy=0x000070000bc68ab8) at sql_base.cc:5883:14\nframe #12: 0x000000010eb90c6d mysqld`open_tables_for_query(thd=0x00007f96cb1a6400, tables=0x00007f96cb0e6e00, flags=0) at sql_base.cc:6660:7\nframe #13: 0x000000010ed48b0c mysqld`Sql_cmd_update::try_single_table_update(this=0x00007f96cb0e6dc8, thd=0x00007f96cb1a6400, switch_to_multitable=0x000070000bc68c07) at sql_update.cc:2911:7\nframe #14: 0x000000010ed49457 mysqld`Sql_cmd_update::execute(this=0x00007f96cb0e6dc8, thd=0x00007f96cb1a6400) at sql_update.cc:3058:7\nframe #15: 0x000000010ec57475 mysqld`mysql_execute_command(thd=0x00007f96cb1a6400, first_level=true) at sql_parse.cc:3616:26\nframe #16: 0x000000010ec51d62 mysqld`mysql_parse(thd=0x00007f96cb1a6400, parser_state=0x000070000bc6c340) at sql_parse.cc:5584:20\nframe #17: 0x000000010ec4ebf0 mysqld`dispatch_command(thd=0x00007f96cb1a6400, com_data=0x000070000bc6ce78, command=COM_QUERY) at sql_parse.cc:1491:5\nframe #18: 0x000000010ec50e70 mysqld`do_command(thd=0x00007f96cb1a6400) at sql_parse.cc:1032:17 \nframe #19: 0x000000010eddf976 mysqld`::handle_connection(arg=0x00007f96cfe2e1b0) at\nconnection_handler_per_thread.cc:313:13\nframe #20: 0x000000010f7aa74c mysqld`::pfs_spawn_thread(arg=0x00007f96d5c1b300) at pfs.cc:2197:3 \nframe #21: 0x00007fff734b6109 libsystem_pthread.dylib`_pthread_start + 148\nframe #22: 0x00007fff734b1b8b libsystem_pthread.dylib`thread_start + 
15"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Open connection2 and kill connection1. When execution reaches awake, we can see that the condition variable about to be signalled is the same object that connection1 is waiting on (0x00007f96cb1a6538)."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/2a\/2ae4d51127a5c45a58601251be99afbe.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/35\/35fe4fd3d996178c3689a55ab2b9abc0.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Once connection2 has executed the broadcast, the connection1 thread is woken up."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/f5\/f52a10017890163734046b23ad573243.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The error is then returned up through each layer; sql_parse calls trans_rollback_stmt to roll back the statement, and finally handle_connection detects that the connection needs to be closed."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2. Real-world case 1"}]},{"type":"paragraph","attrs":{"indent":0,"n
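umber":0,"align":null,"origin":null},"content":[{"type":"text","text":"At work we once hit a case where the connection count kept rising after kill commands were issued. In short, the symptoms were:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"After abnormal traffic pushed the number of connections on the database instance too high, the DBA killed connections. The connection count dropped briefly and then climbed again, and further kills did not stop the rise. Meanwhile a large number of threads stayed in the killed state, so it looked as if the connections could not be killed and a restart was the only fix."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"1) Problem"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Why did the connection count go up instead of down?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As noted in part 1.(2) of the KILL workflow above, the killing thread closes the connection before it wakes the thread being killed, and that is the moment the client reports the lost connection error. Large numbers of clients therefore began reconnecting (SDK connection pools) before their old threads had finished being killed, which made the problem worse."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In addition, innodb_thread_concurrency in MySQL limits how many threads may run inside InnoDB concurrently. Once the number of threads inside InnoDB reaches that threshold, the many newly re-established connections sit in a for loop at the MySQL layer, checking whether they may enter InnoDB. That loop does not check the killed state, so these threads cannot be killed (even though show processlist shows Killed) until some thread leaves InnoDB and lets one of them enter, at which point it either reaches an active killed check or is woken up to run the kill logic."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"srv0conc.cc 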
srv_conc_enter_innodb_with_atomics"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/73\/73faeed59ed5c8053657b666995b9200.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Note: MySQL does have parameters that control the maximum wait when entering InnoDB; to keep the description simple they are not covered here."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"2) Reproduction"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Set innodb_thread_concurrency to 1 (the default, 0, means unlimited). connection1 runs a select, and its thread is paused after it has entered InnoDB."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"connection2 also runs a select and gets stuck checking whether it may enter InnoDB. Its stack is:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"\n* frame #0: 0x000000010c9772bd mysqld`srv_conc_enter_innodb(prebuilt=0x00007f7fb1065ca0) at srv0conc.cc:265:37 \nframe #1: 0x000000010c6d9498 mysqld`innobase_srv_conc_enter_innodb(prebuilt=0x00007f7fb1065ca0) at ha_innodb.cc:1534:4\nframe #2: 0x000000010c6dcc19 mysqld`ha_innobase::index_read(this=0x00007f7fb10a3630, buf=\", key_ptr=0x0000000000000000, key_len=0, find_flag=HA_READ_AFTER_KEY) at ha_innodb.cc:8753:3\nframe #3: 0x000000010c6dd98c mysqld`ha_innobase::index_first(this=0x00007f7fb10a3630, buf=\") at 
ha_innodb.cc: 9186:14\nframe #4: 0x000000010c6ddc2a mysqld`ha_innobase::rnd_next(this=0x00007f7fb10a3630, buf=\") at ha_innodb.cc:9284: 11\nframe #5: 0x000000010b6e57a5 mysqld`handler::ha_rnd_next(this=0x00007f7fb10a3630, buf=\") at handler.cc:2963:3 \nframe #6: 0x000000010be3b4d4 mysqld`rr_sequential(info=0x00007f7fb0c1dd90) at records.cc:517:34\nframe #7: 0x000000010bf2bc78 mysqld`join_init_read_record(tab=0x00007f7fb0c1dd40) at sql_executor.cc:2504:10 \nframe #8: 0x000000010bf28a1c mysqld`sub_select(join=0x00007f7fb206fc48, qep_tab=0x00007f7fb0c1dd40, end_of_records=false) at sql_executor.cc:1284:14\nframe #9: 0x000000010bf25299 mysqld`do_select(join=0x00007f7fb206fc48) at sql_executor.cc:957:12 \nframe #10: 0x000000010bf24c26 mysqld`JOIN::exec(this=0x00007f7fb206fc48) at sql_executor.cc:206:10 frame #11: 0x000000010c005e90 mysqld`handle_query(thd=0x00007f7fb2065a00, lex=0x00007f7fb2067e58, result=0x00007f7fb206f2f8, added_options=0, removed_options=0) at sql_select.cc:191:21\nframe #12: 0x000000010bfa66f7 mysqld`execute_sqlcom_select(thd=0x00007f7fb2065a00, all_tables=0x00007f7fb206ecb0) at sql_parse.cc:5155:12\nframe #13: 0x000000010bf9c527 mysqld`mysql_execute_command(thd=0x00007f7fb2065a00, first_level=true) at sql_parse.cc:2826:12\nframe #14: 0x000000010bf99d62 mysqld`mysql_parse(thd=0x00007f7fb2065a00, parser_state=0x0000700007205340) at sql_parse.cc:5584:20\nframe #15: 0x000000010bf96bf0 mysqld`dispatch_command(thd=0x00007f7fb2065a00, com_data=0x0000700007205e78, command=COM_QUERY) at sql_parse.cc:1491:5\nframe #16: 0x000000010bf98e70 mysqld`do_command(thd=0x00007f7fb2065a00) at sql_parse.cc:1032:17 \nframe #17: 0x000000010c127976 mysqld`::handle_connection(arg=0x00007f7fbcb5e650) at connection_handler_per_thread.cc:313:13\nframe #18: 0x000000010caf274c mysqld`::pfs_spawn_thread(arg=0x00007f7fbcb6a5e0) at pfs.cc:2197:3 \nframe #19: 0x00007fff71032109 libsystem_pthread.dylib`_pthread_start + 148\nframe #20: 0x00007fff7102db8b 
libsystem_pthread.dylib`thread_start + 15"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/22\/220b3e347a6a481696c13759c2012867.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Now open connection3 and kill connection2. connection2 stays in the killed state (its client is disconnected)."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Even after connection3 has completed the ha_close_connection step and the condition-variable broadcast mentioned earlier, connection2's stack remains in the for loop shown above."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/02\/024f70fc8f4f09956922b3fabe33e328.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This lasts until connection1 leaves InnoDB."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/4f\/4fd72ae8afcd1c8eadb109aacc925cdc.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/19\/192201fbebded7fc5394ae1bb3f93fae.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Once connection2 enters InnoDB, it runs the kill logic through an active check of the killed state."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/0f\/0f315d48cd96019356e2ff47c144f98a.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/2c\/9f\/2c4890e73d43b61fb6ba263e71807c9f.jpg","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"At this point show processlist shows:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/a4\/a8\/a4e16c21628a8db750f5bbffeb88fca8.jpg","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/07\/fd\/079bc19cee036112ba01878322be56fd.jpg","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"About the authors"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Yuanfudao Database Platform Team: "},{"type":"text","text":"the team serves the database product development, operations, and service needs of the whole Yuanfudao online-education company. It is committed to exploring and applying new technology, continually refining highly available, scalable, and reliable infrastructure against real business scenarios, and, as a builder of core infrastructure, supporting rapid business growth. (Yuanfudao tech WeChat official account ID: gh_cb5c83bb3ee0)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Zhao Xiaojie: "},{"type":"text","text":"a member of the Yuanfudao Database Platform Team, working mainly on database storage and middleware development."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This article is republished from the dbaplus community (ID: dbaplus)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Original link: "},{"type":"link","attrs":{"href":"https:\/\/mp.weixin.qq.com\/s\/owXeNZBYl2ee82W6RZQpzQ","title":"xxx","ty
pe":null},"content":[{"type":"text","text":"MySQL進階墊腳石:線程長時間處於killed狀態怎麼破?"}]}]}]}
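To make the real-world case above concrete, here is a minimal C++ sketch of a thread waiting for a concurrency slot. It is not MySQL source code: `Session`, `ConcurrencyGate`, and `enter_gate` are hypothetical names invented for illustration, standing in for the thd->killed flag and the innodb_thread_concurrency limit described in the article. With `check_killed` set to false, the wait loop behaves like the one described: a killed session keeps waiting until a slot frees up. With it set to true, the loop notices the flag and returns early.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical stand-in for the per-connection descriptor (thd);
// only the kill flag is modelled here.
struct Session {
    std::atomic<bool> killed{false};
};

// Hypothetical stand-in for the innodb_thread_concurrency limit:
// at most `limit` sessions may be inside the gate at once.
class ConcurrencyGate {
public:
    explicit ConcurrencyGate(int limit) : limit_(limit) {}

    // Takes a slot if one is free; never blocks.
    bool try_enter() {
        int cur = active_.load();
        while (cur < limit_) {
            // On failure compare_exchange_weak reloads `cur`, so the
            // loop condition is re-evaluated against the fresh value.
            if (active_.compare_exchange_weak(cur, cur + 1)) return true;
        }
        return false;
    }

    void leave() { active_.fetch_sub(1); }

private:
    const int limit_;
    std::atomic<int> active_{0};
};

enum class EnterResult { Entered, Killed, GaveUp };

// Waits for a slot, retrying up to max_spins times.
// check_killed == false models the behaviour described in the article:
// the kill flag is ignored while waiting for a slot.
EnterResult enter_gate(ConcurrencyGate& gate, const Session& session,
                       bool check_killed, int max_spins) {
    for (int i = 0; i < max_spins; ++i) {
        if (gate.try_enter()) return EnterResult::Entered;
        if (check_killed && session.killed.load()) return EnterResult::Killed;
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    return EnterResult::GaveUp;
}
```

In this sketch a killed session that calls `enter_gate` with `check_killed` false keeps waiting until another session calls `leave()`, which mirrors connection2 staying in the killed state until connection1 leaves InnoDB. The `max_spins` bound exists only so that the example terminates.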