記一次 RR 與 RC 死鎖問題排查

記錄一下開發過程中遇到的死鎖問題和解決方案

首先是隔離級別,mysql有四個隔離級別,從寬鬆到嚴格依次是

  • 讀未提交       Read Uncommitted
  • 讀提交           Read Committed   (線上環境使用的隔離級別)
  • 可重複讀       Repeatable Read  (Mysql 默認的隔離級別)
  • 串行化           Serializable

 

第一次死鎖

背景: 在db中無記錄時,併發插入會產生死鎖

首先是第一次遇到的死鎖問題,  通過命令  

show engine innodb status

查看死鎖日誌如下:

 

=====================================
2021-11-15 14:45:15 0x70000908f000 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 53 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 436 srv_active, 0 srv_shutdown, 136216 srv_idle
srv_master_thread log flush and writes: 0
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 14531
OS WAIT ARRAY INFO: signal count 12688
RW-shared spins 0, rounds 0, OS waits 0
RW-excl spins 0, rounds 0, OS waits 0
RW-sx spins 0, rounds 0, OS waits 0
Spin rounds per wait: 0.00 RW-shared, 0.00 RW-excl, 0.00 RW-sx
------------------------
LATEST DETECTED DEADLOCK
------------------------
2021-11-15 14:37:11 0x700004455000
*** (1) TRANSACTION:
TRANSACTION 47513, ACTIVE 0 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1128, 1 row lock(s), undo log entries 1
MySQL thread id 26344, OS thread handle 123145497554944, query id 441640 localhost 127.0.0.1 root update
INSERT IGNORE INTO `wms_product` (`app_id`,`wms_warehouse_id`,`sku_id`,`amount`,`period_amount`,`trans_amount`,`version`,`update_time`,`create_time`) VALUES (2,1,'C-調撥',1,0,0,1,'2021-11-15 14:37:11.698119','2021-11-15 14:37:11.698119')

*** (1) HOLDS THE LOCK(S):
RECORD LOCKS space id 179 page no 5 n bits 72 index uk_skuid_wmsid_appid of table `wms`.`wms_product` trx id 47513 lock mode S waiting
Record lock, heap no 2 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
 0: len 8; hex 432de8b083e68ba8; asc C-      ;;
 1: len 8; hex 8000000000000001; asc         ;;
 2: len 4; hex 80000002; asc     ;;
 3: len 8; hex 8000000000000056; asc        V;;


*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 179 page no 5 n bits 72 index uk_skuid_wmsid_appid of table `wms`.`wms_product` trx id 47513 lock mode S waiting
Record lock, heap no 2 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
 0: len 8; hex 432de8b083e68ba8; asc C-      ;;
 1: len 8; hex 8000000000000001; asc         ;;
 2: len 4; hex 80000002; asc     ;;
 3: len 8; hex 8000000000000056; asc        V;;


*** (2) TRANSACTION:
TRANSACTION 47503, ACTIVE 0 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 24 lock struct(s), heap size 3488, 21 row lock(s), undo log entries 4
MySQL thread id 26392, OS thread handle 123145430462464, query id 441871 localhost 127.0.0.1 root update
INSERT IGNORE INTO `wms_product` (`app_id`,`wms_warehouse_id`,`sku_id`,`amount`,`period_amount`,`trans_amount`,`version`,`update_time`,`create_time`) VALUES (2,1,'B併發入庫',1,0,0,1,'2021-11-15 14:37:11.682939','2021-11-15 14:37:11.682939')

*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 179 page no 5 n bits 72 index uk_skuid_wmsid_appid of table `wms`.`wms_product` trx id 47503 lock_mode X locks rec but not gap
Record lock, heap no 2 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
 0: len 8; hex 432de8b083e68ba8; asc C-      ;;
 1: len 8; hex 8000000000000001; asc         ;;
 2: len 4; hex 80000002; asc     ;;
 3: len 8; hex 8000000000000056; asc        V;;


*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 179 page no 5 n bits 72 index uk_skuid_wmsid_appid of table `wms`.`wms_product` trx id 47503 lock_mode X locks gap before rec insert intention waiting
Record lock, heap no 2 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
 0: len 8; hex 432de8b083e68ba8; asc C-      ;;
 1: len 8; hex 8000000000000001; asc         ;;
 2: len 4; hex 80000002; asc     ;;
 3: len 8; hex 8000000000000056; asc        V;;

*** WE ROLL BACK TRANSACTION (1)
------------
TRANSACTIONS
------------
Trx id counter 47828
Purge done for trx's n:o < 47828 undo n:o < 0 state: running but idle
History list length 0
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 281479673057184, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 281479673056392, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 281479673055600, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 281479673054808, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
--------
FILE I/O
--------
I/O thread 0 state: waiting for i/o request (insert buffer thread)
I/O thread 1 state: waiting for i/o request (log thread)
I/O thread 2 state: waiting for i/o request (read thread)
I/O thread 3 state: waiting for i/o request (read thread)
I/O thread 4 state: waiting for i/o request (read thread)
I/O thread 5 state: waiting for i/o request (read thread)
I/O thread 6 state: waiting for i/o request (write thread)
I/O thread 7 state: waiting for i/o request (write thread)
I/O thread 8 state: waiting for i/o request (write thread)
I/O thread 9 state: waiting for i/o request (write thread)
Pending normal aio reads: [0, 0, 0, 0] , aio writes: [0, 0, 0, 0] ,
 ibuf aio reads:, log i/o's:, sync i/o's:
Pending flushes (fsync) log: 0; buffer pool: 2
1644 OS file reads, 161796 OS file writes, 113308 OS fsyncs
0.00 reads/s, 0 avg bytes/read, 5.41 writes/s, 2.28 fsyncs/s
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 1, free list len 0, seg size 2, 0 merges
merged operations:
 insert 0, delete mark 0, delete 0
discarded operations:
 insert 0, delete mark 0, delete 0
Hash table size 34679, node heap has 1 buffer(s)
Hash table size 34679, node heap has 1 buffer(s)
Hash table size 34679, node heap has 1 buffer(s)
Hash table size 34679, node heap has 2 buffer(s)
Hash table size 34679, node heap has 2 buffer(s)
Hash table size 34679, node heap has 2 buffer(s)
Hash table size 34679, node heap has 2 buffer(s)
Hash table size 34679, node heap has 5 buffer(s)
0.00 hash searches/s, 0.00 non-hash searches/s
---
LOG
---
Log sequence number          56679186
Log buffer assigned up to    56679186
Log buffer completed up to   56679186
Log written up to            56679186
Log flushed up to            56679186
Added dirty pages up to      56679186
Pages flushed up to          56679186
Last checkpoint at           56679186
86480 log i/o's done, 0.00 log i/o's/second
----------------------
BUFFER POOL AND MEMORY
----------------------
Total large memory allocated 0
Dictionary memory allocated 504140
Buffer pool size   8191
Free buffers       5147
Database pages     3028
Old database pages 1097
Modified db pages  0
Pending reads      0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 668, not young 149
0.00 youngs/s, 0.00 non-youngs/s
Pages read 1086, created 1943, written 58798
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
No buffer pool page gets since the last printout
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 3028, unzip_LRU len: 0
I/O sum[229]:cur[0], unzip sum[0]:cur[0]
--------------
ROW OPERATIONS
--------------
0 queries inside InnoDB, 0 queries in queue
0 read views open inside InnoDB
Process ID=555, Main thread ID=0x7000048f0000 , state=sleeping
Number of rows inserted 21319, updated 39868, deleted 3791, read 180746
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s
Number of system rows inserted 837, updated 3436, deleted 524, read 39358
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s
----------------------------
END OF INNODB MONITOR OUTPUT
============================

 

 

可以看到是針對僅一張表`wms`.`wms_product`進行插入操作時產生的死鎖。

 

代碼中關於 表`wms`.`wms_product`的操作示意是

  1. Select * from wms_product for update    –- 嘗試加鎖
  2. Insert xxx  (在未查詢到的情況下插入一條記錄)

 

原因:  此時是在本地環境,隔離級別是默認的RR  (特別提醒,直接在mysql通過命令修改隔離級別不會生效/僅對當前會話生效, 需要修改配置文件後重啓)

在RR級別下 mysql爲解決 幻讀 的問題,引入了間隙鎖 gap lock.  間隙鎖鎖住的不是一行記錄,而是行與行之間的間隙 (所謂幻讀就是指在一次事務中, 前後兩次查詢同一個範圍的時候,後一次查詢看到了前一次查詢沒有看到的記錄.  成因是S或X鎖只能鎖住已存在的行,不能阻止別人插入)

間隙鎖與間隙鎖之間不互斥,間隙鎖僅與插入操作互斥

理解了上面的間隙鎖就很容易理解死鎖產生的原因了,簡單的畫了下時序圖:

時刻
Session1
Session2
備註
T1 獲取Gap lock 獲取Gap lock 同時獲取間隙鎖,不互斥
T2 Insert xxx (等待Session2的 間隙鎖) Insert xxx (等待Session1的 間隙鎖) 互相等待
T3     死鎖

 

備註:  這裏經過和雷神大佬的討論,單純針對這種情況而不考慮業務下第一次的for update應該可以不加,因爲後面的 Insert 或 Update 都會有X鎖

 

 

第二次死鎖

背景: 在db中無記錄時,併發插入會產生死鎖

通過修改配置文件並重啓成功更新了本地的隔離級別爲RC, 不再有間隙鎖的問題。 但業務方面也有一些變化,引入了 product_batchno 批次表,操作時需要對兩張表同時進行更新.

死鎖日誌如下:

 

=====================================
2021-11-30 10:24:59 0x70000e150000 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 42 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 45 srv_active, 0 srv_shutdown, 129307 srv_idle
srv_master_thread log flush and writes: 0
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 592
OS WAIT ARRAY INFO: signal count 478
RW-shared spins 0, rounds 0, OS waits 0
RW-excl spins 0, rounds 0, OS waits 0
RW-sx spins 0, rounds 0, OS waits 0
Spin rounds per wait: 0.00 RW-shared, 0.00 RW-excl, 0.00 RW-sx
------------------------
LATEST DETECTED DEADLOCK
------------------------
2021-11-30 10:24:32 0x70000cdf6000
*** (1) TRANSACTION:
TRANSACTION 59401, ACTIVE 0 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 5 lock struct(s), heap size 1128, 3 row lock(s), undo log entries 2
MySQL thread id 328, OS thread handle 123145563062272, query id 8008 localhost 127.0.0.1 root update
INSERT INTO `wms_product_batchno` (`app_id`,`wms_warehouse_id`,`sku_id`,`batch_no`,`amount`,`period_amount`,`expire_time`,`update_time`,`create_time`) VALUES (2,1,'C-調撥','C11234124124129999999',1,0,'2021-12-10 10:24:32','2021-11-30 10:24:32.471672','2021-11-30 10:24:32.471672')

*** (1) HOLDS THE LOCK(S):
RECORD LOCKS space id 307 page no 5 n bits 72 index uk_skuid_wmsid_appid of table `wms`.`wms_product` trx id 59401 lock_mode X locks rec but not gap
Record lock, heap no 2 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
 0: len 8; hex 432de8b083e68ba8; asc C-      ;;
 1: len 8; hex 8000000000000001; asc         ;;
 2: len 4; hex 80000002; asc     ;;
 3: len 8; hex 8000000000000001; asc         ;;


*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 305 page no 5 n bits 72 index uk_skuid_wmsid_app_id_batchno of table `wms`.`wms_product_batchno` trx id 59401 lock mode S waiting
Record lock, heap no 2 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
 0: len 8; hex 432de8b083e68ba8; asc C-      ;;
 1: len 8; hex 8000000000000001; asc         ;;
 2: len 4; hex 80000002; asc     ;;
 3: len 21; hex 433131323334313234313234313239393939393939; asc C11234124124129999999;;
 4: len 8; hex 8000000000000001; asc         ;;


*** (2) TRANSACTION:
TRANSACTION 59467, ACTIVE 0 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 5 lock struct(s), heap size 1128, 3 row lock(s)
MySQL thread id 390, OS thread handle 123145633349632, query id 8233 localhost 127.0.0.1 root statistics
SELECT * FROM `wms_product` WHERE app_id = 2 and sku_id = 'C-調撥' and wms_warehouse_id = 1 FOR UPDATE

*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 305 page no 5 n bits 72 index uk_skuid_wmsid_app_id_batchno of table `wms`.`wms_product_batchno` trx id 59467 lock_mode X locks rec but not gap
Record lock, heap no 2 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
 0: len 8; hex 432de8b083e68ba8; asc C-      ;;
 1: len 8; hex 8000000000000001; asc         ;;
 2: len 4; hex 80000002; asc     ;;
 3: len 21; hex 433131323334313234313234313239393939393939; asc C11234124124129999999;;
 4: len 8; hex 8000000000000001; asc         ;;


*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 307 page no 5 n bits 72 index uk_skuid_wmsid_appid of table `wms`.`wms_product` trx id 59467 lock_mode X locks rec but not gap waiting
Record lock, heap no 2 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
 0: len 8; hex 432de8b083e68ba8; asc C-      ;;
 1: len 8; hex 8000000000000001; asc         ;;
 2: len 4; hex 80000002; asc     ;;
 3: len 8; hex 8000000000000001; asc         ;;

*** WE ROLL BACK TRANSACTION (2)
------------
TRANSACTIONS
------------
Trx id counter 59502
Purge done for trx's n:o < 59502 undo n:o < 0 state: running but idle
History list length 15
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 281479685464760, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 281479685463968, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 281479685463176, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 281479685462384, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 281479685461592, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
--------
FILE I/O
--------
I/O thread 0 state: waiting for i/o request (insert buffer thread)
I/O thread 1 state: waiting for i/o request (log thread)
I/O thread 2 state: waiting for i/o request (read thread)
I/O thread 3 state: waiting for i/o request (read thread)
I/O thread 4 state: waiting for i/o request (read thread)
I/O thread 5 state: waiting for i/o request (read thread)
I/O thread 6 state: waiting for i/o request (write thread)
I/O thread 7 state: waiting for i/o request (write thread)
I/O thread 8 state: waiting for i/o request (write thread)
I/O thread 9 state: waiting for i/o request (write thread)
Pending normal aio reads: [0, 0, 0, 0] , aio writes: [0, 0, 0, 0] ,
 ibuf aio reads:, log i/o's:, sync i/o's:
Pending flushes (fsync) log: 0; buffer pool: 0
2294 OS file reads, 6630 OS file writes, 4541 OS fsyncs
0.02 reads/s, 16384 avg bytes/read, 12.31 writes/s, 6.83 fsyncs/s
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 1, free list len 0, seg size 2, 0 merges
merged operations:
 insert 0, delete mark 0, delete 0
discarded operations:
 insert 0, delete mark 0, delete 0
Hash table size 34679, node heap has 1 buffer(s)
Hash table size 34679, node heap has 1 buffer(s)
Hash table size 34679, node heap has 1 buffer(s)
Hash table size 34679, node heap has 1 buffer(s)
Hash table size 34679, node heap has 1 buffer(s)
Hash table size 34679, node heap has 1 buffer(s)
Hash table size 34679, node heap has 2 buffer(s)
Hash table size 34679, node heap has 4 buffer(s)
4.60 hash searches/s, 31.50 non-hash searches/s
---
LOG
---
Log sequence number          66935527
Log buffer assigned up to    66935527
Log buffer completed up to   66935527
Log written up to            66935527
Log flushed up to            66935527
Added dirty pages up to      66935527
Pages flushed up to          66935527
Last checkpoint at           66935527
2437 log i/o's done, 3.10 log i/o's/second
----------------------
BUFFER POOL AND MEMORY
----------------------
Total large memory allocated 0
Dictionary memory allocated 451775
Buffer pool size   8192
Free buffers       6001
Database pages     2179
Old database pages 785
Modified db pages  0
Pending reads      0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 0, not young 0
0.00 youngs/s, 0.00 non-youngs/s
Pages read 1728, created 451, written 3091
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
Buffer pool hit rate 1000 / 1000, young-making rate 0 / 1000 not 0 / 1000
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 2179, unzip_LRU len: 0
I/O sum[0]:cur[0], unzip sum[0]:cur[0]
--------------
ROW OPERATIONS
--------------
0 queries inside InnoDB, 0 queries in queue
0 read views open inside InnoDB
Process ID=581, Main thread ID=0x70000d291000 , state=sleeping
Number of rows inserted 267, updated 445, deleted 0, read 1668
0.17 inserts/s, 0.69 updates/s, 0.00 deletes/s, 4.90 reads/s
Number of system rows inserted 152, updated 1836, deleted 76, read 14284
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 7.69 reads/s
----------------------------
END OF INNODB MONITOR OUTPUT
============================

 

 

 

代碼中針對兩張表的操作順序如下

  1. Select * from productBatchNo  for update            // 同樣嘗試加鎖
  2. Select * from product for update                           //同樣嘗試加鎖
  3. 更新 productBatchNo 和 product
  4. Commit

 

具體情況如下:

注意下表中的操作都發生在t1時刻

操作
Session1
Session2
Session3
備註
查詢 ProdcutBatchNo 並加鎖  已完成, 但由於db中無記錄,沒有加上鎖 已完成, 但由於db中無記錄 沒有加上鎖 未開始 針對Session2, 由於Session1沒有提交,Session2在RC級別下看不到數據,因爲沒有加鎖
查詢 Product 並加鎖  已完成, 但由於db中無記錄,沒有加上鎖 準備執行    
更新邏輯(BatchNo 和 Product) 已完成      
Commit 事務提交 未提交      

 

當進入t2時刻後:

操作
Session1
Session2
Session3
備註
操作
Session1
Session2
Session3
備註
查詢 ProdcutBatchNo 並加鎖  已完成, 但由於db中無記錄,沒有加上鎖 已完成, 但由於db中無記錄 沒有加上鎖 db中有記錄,針對Product 加上 X鎖  
查詢 Product 並加鎖  已完成, 但由於db中無記錄,沒有加上鎖 db 中有記錄,針對product 加上X 鎖 準備執行  
更新邏輯(BatchNo 和 Product) 已完成 準備執行    
Commit 事務提交 已完成 成功提交      

 

當進入t3時刻後

操作
Session1
Session2
Session3
備註
查詢 ProdcutBatchNo 並加鎖  已完成, 但由於db中無記錄,沒有加上鎖 已完成, 但由於db中無記錄 沒有加上鎖 db中有記錄,針對Product 加上 X鎖  
查詢 Product 並加鎖  已完成, 但由於db中無記錄,沒有加上鎖 db 中有記錄,針對product 加上X 鎖 由於Prodct 被Session2 加了X鎖 等待釋放  
更新邏輯(BatchNo 和 Product) 已完成 準備執行, 由於BatchNo上被Session3 加了X鎖,等待釋放    
Commit 事務提交 已完成 成功提交     Session2 和 Session3 互相等待,死鎖

根源還是在不存在的記錄無法鎖上.

解決方案:

在查詢 ProductBatchNo 或 Product 之後判斷DB中是否有記錄,如果DB中沒有記錄就直接結束這次的事務。並向DB中插入對應的空記錄。(關鍵信息如Sku_ID, BatchNo等有值,商品數量爲0)

通過這種方式消除了不存在的行的影響,這樣後續更新操作中,一旦某個Session拿到了鎖,就會直接阻塞別的 Session.

 

參考資料:

https://www.cnblogs.com/paul8339/p/6877729.html

發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章