Troubleshooting a MySQL Deadlock: A Walkthrough
2019.10.05 15:20:16
隔壁同事大佬
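This was the first time I ran into a deadlock in a real production environment. Going from complete confusion to finding and fixing the problem took countless Baidu searches...
- The project's MySQL storage engine is InnoDB, which takes its row locks on index records. So the first step is to look at the InnoDB status: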
show engine innodb status;
Only the relevant part of the output is shown:
------------------------
LATEST DETECTED DEADLOCK
------------------------
2019-08-21 12:08:13 7fb595c29700
*** (1) TRANSACTION:
TRANSACTION 63912726, ACTIVE 0 sec updating or deleting
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1184, 2 row lock(s), undo log entries 1
MySQL thread id 1587325, OS thread handle 0x7fb55ee77700, query id 87233990 210.22.120.186 root updating
UPDATE t_app_info SET ssn='520224727',
create_time='2011-08-20 17:54:56.0',
update_time='2011-08-21 12:08:14.468',
user_id=8,
job_status='DONE',
rerun=0,
merchant_id='asd',
loan_id=502,
`name`='autotest_0815100043938',
phone_number='1234567890',
`source`=0 WHERE id=1163751259396098
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 1673 page no 63 n bits 552 index `t_app_info_job_status_index` of table `test`.`t_app_info` trx id 63912726 lock_mode X locks gap before rec insert intention waiting
Record lock, heap no 167 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 3; hex 50414e; asc PAN;;
1: len 8; hex 901e52e11885c002; asc R ;;
*** (2) TRANSACTION:
TRANSACTION 63912725, ACTIVE 0 sec starting index read
mysql tables in use 1, locked 1
3 lock struct(s), heap size 1184, 2 row lock(s)
MySQL thread id 1587327, OS thread handle 0x7fb595c29700, query id 87233988 210.22.120.186 root Creating sort index
select * from t_app_info where job_status = 'PAN' and source in (0,1) order by create_time asc limit 1 for update
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 1673 page no 63 n bits 552 index `t_app_info_job_status_index` of table `test`.`t_app_info` trx id 63912725 lock_mode X
Record lock, heap no 167 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 3; hex 50414e; asc PAN;;
1: len 8; hex 901e52e11885c002; asc R ;;
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 1673 page no 70 n bits 176 index `PRIMARY` of table `test`.`t_app_info` trx id 63912725 lock_mode X locks rec but not gap waiting
Record lock, heap no 28 PHYSICAL RECORD: n_fields 17; compact format; info bits 0
0: len 8; hex 901e52e11885c002; asc R ;;
1: len 6; hex 000003ceb3a3; asc ;;
2: len 7; hex 330000801d021e; asc 3 ;;
3: len 18; hex 363533313237313938363037333034383637; asc 653127198607304867;;
4: len 5; hex 99a3dc9e76; asc v;;
5: len 8; hex 80000000000006f2; asc ;;
6: len 5; hex 99a3dc9e76; asc v;;
7: len 3; hex 50414e; asc PAN;;
8: len 1; hex 80; asc ;;
9: SQL NULL;
10: len 20; hex 416d62696c447569742d47504d65726368616e74; asc AmbilDuit-GPMerchant;;
11: len 8; hex 8000000000001426; asc &;;
12: len 13; hex 4275646920536574696177616e; asc Budi Setiawan;;
13: len 13; hex 30383133323035313037323233; asc 0813205107223;;
14: SQL NULL;
15: SQL NULL;
16: len 4; hex 80000002; asc ;;
*** WE ROLL BACK TRANSACTION (2)
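`show engine innodb status` only keeps the most recent deadlock, so an earlier one is overwritten as soon as another occurs. If deadlocks keep recurring, a setting along these lines makes InnoDB write every deadlock to the MySQL error log. This is a minimal sketch, assuming MySQL 5.6 or later and an account that is allowed to change global variables:

```sql
-- Check whether all deadlocks are already being written to the error log.
SHOW VARIABLES LIKE 'innodb_print_all_deadlocks';

-- Log every deadlock, not just the latest one, to the MySQL error log.
-- The change applies immediately but is lost on restart unless it is
-- also added to the server configuration file.
SET GLOBAL innodb_print_all_deadlocks = ON;
```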
- Analyzing the deadlock log:
The most recent deadlock is reported in the section headed:
------------------------
LATEST DETECTED DEADLOCK
------------------------
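Inside that section there are two headings, and the information under them shows why the deadlock happened:
HOLDS THE LOCK(S): the locks the transaction currently holds
WAITING FOR THIS LOCK TO BE GRANTED: the lock the transaction is trying to acquire
The table has two indexes: the primary key index on id, and t_app_info_job_status_index, an ordinary (non-unique) index on the job_status column.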
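The record dumps in the log can be read field by field. In the `t_app_info_job_status_index` record, field 0 is the job_status value and field 1 (hex 901e52e11885c002) is the same value as field 0 of the `PRIMARY` record, i.e. the secondary index entry carries the primary key. The text fields can be decoded with `UNHEX`, as in this small sketch (the column aliases are mine, chosen only for illustration):

```sql
-- Field 0 of the t_app_info_job_status_index record: the job_status value.
-- The log's own "asc" column shows this as PAN.
SELECT CAST(UNHEX('50414e') AS CHAR) AS job_status;

-- Field 12 of the PRIMARY record; the log's "asc" column shows Budi Setiawan.
SELECT CAST(UNHEX('4275646920536574696177616e') AS CHAR) AS field_12;
```

Integer fields such as the 8-byte id are stored in InnoDB's internal binary format, so they cannot be decoded this way.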
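- According to the log, the first SQL statement updates a row by primary key id and also has to change the job_status column. It has already acquired the lock on the primary key index and is trying to acquire a lock on the job_status index (the log shows it waiting for an insert intention lock on the gap before the PAN record of t_app_info_job_status_index).
- The second SQL statement selects the first row that matches on job_status and uses for update, so its row locks are taken through the job_status index. It already holds the lock on the t_app_info_job_status_index record and is trying to acquire the lock on the primary key index.
From this the cause of the deadlock is clear: each transaction holds the lock the other one is waiting for.
I then read a number of articles found through Baidu. After InnoDB acquires a lock on a non-primary-key index record, it goes on to acquire the lock on the corresponding primary key index record, which explains why the second statement behaves this way.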
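The lock ordering described above can be sketched as a manual repro in two sessions. Everything here is an assumption for illustration: `t_app_info_demo` and `idx_job_status` are hypothetical names, the table is a trimmed-down stand-in for the real schema, and the production deadlock came from two single statements racing rather than from these explicit steps. I have not run this against the original environment.

```sql
-- Hypothetical, trimmed-down table for the repro (not the real schema).
CREATE TABLE t_app_info_demo (
  id          BIGINT      NOT NULL PRIMARY KEY,
  job_status  VARCHAR(16) NOT NULL,
  `source`    INT         NOT NULL,
  create_time DATETIME    NOT NULL,
  KEY idx_job_status (job_status)
) ENGINE = InnoDB;

INSERT INTO t_app_info_demo VALUES
  (1, 'PAN', 0, '2019-08-21 12:00:00'),
  (2, 'PAN', 0, '2019-08-21 12:01:00');

-- Step 1, session A: lock row 1 through the primary key only.
START TRANSACTION;
SELECT * FROM t_app_info_demo WHERE id = 1 FOR UPDATE;

-- Step 2, session B: lock through the secondary index. FORCE INDEX is only
-- there so the tiny demo table is not read with a full scan. This statement
-- should block on the primary key record of id = 1 while already holding
-- the lock on the index entry ('PAN', 1).
START TRANSACTION;
SELECT * FROM t_app_info_demo FORCE INDEX (idx_job_status)
WHERE job_status = 'PAN' AND `source` IN (0, 1)
ORDER BY create_time ASC LIMIT 1 FOR UPDATE;

-- Step 3, session A: changing job_status has to modify the secondary index
-- entry that session B has locked, which closes the cycle.
UPDATE t_app_info_demo SET job_status = 'DONE' WHERE id = 1;
```

If the locks are taken in the order assumed here, InnoDB should detect the cycle at step 3 and roll one of the two transactions back with error 1213 (deadlock found when trying to get lock).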
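- Fixing the problem:
Given the cause, the simplest approach when taking row locks with for update is to go through the primary key index directly. But the feature needs the first record whose job_status is PAN, so the SQL was changed as follows so that the row lock is taken only through the primary key index: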
select * from t_app_info where id = (select id from t_app_info where job_status = 'PAN' and source in (0,1) order by create_time asc limit 1) and job_status = 'PAN' for update;
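One way to check that the rewritten statement really locks only through the primary key is to run it inside a transaction and list the locks it holds. The sketch below assumes MySQL 8.0, where `performance_schema.data_locks` is available, and a quiet test instance where no other session is touching the table. The log above looks like it came from an older version; there the TRANSACTIONS section of `show engine innodb status` would have to be read instead.

```sql
START TRANSACTION;

-- The rewritten query from above: the inner SELECT has no locking clause,
-- so only the outer lookup by id is a locking read.
SELECT * FROM t_app_info
WHERE id = (SELECT id FROM t_app_info
            WHERE job_status = 'PAN' AND source IN (0, 1)
            ORDER BY create_time ASC LIMIT 1)
  AND job_status = 'PAN'
FOR UPDATE;

-- MySQL 8.0 only (assumption): list the locks currently held on the table.
SELECT INDEX_NAME, LOCK_TYPE, LOCK_MODE, LOCK_STATUS, LOCK_DATA
FROM performance_schema.data_locks
WHERE OBJECT_NAME = 't_app_info';

ROLLBACK;
```

If the rewrite works as intended, the record locks listed should be on `PRIMARY` and none on `t_app_info_job_status_index`. Because the inner subquery reads without locking, the outer `job_status = 'PAN'` condition acts as a re-check once the row is actually locked.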
That is how I tracked down and reasoned through this deadlock. If anything is missing or wrong, corrections are welcome. Thanks!