Speeding Up a MySQL Bulk UPDATE with Parallel Python Processes

Suppose we have a table t:

mysql> desc t;
+----------------+--------------+------+-----+---------+----------------+
| Field          | Type         | Null | Key | Default | Extra          |
+----------------+--------------+------+-----+---------+----------------+
| owner          | varchar(30)  | YES  |     | NULL    |                |
| object_name    | varchar(128) | YES  |     | NULL    |                |
| subobject_name | varchar(30)  | YES  |     | NULL    |                |
| object_id      | int          | NO   | PRI | NULL    | auto_increment |
| data_object_id | int          | YES  |     | NULL    |                |
| object_type    | varchar(19)  | YES  |     | NULL    |                |
| created        | datetime     | YES  |     | NULL    |                |
| last_ddl_time  | datetime     | YES  |     | NULL    |                |
| timestamp      | varchar(19)  | YES  |     | NULL    |                |
| status         | varchar(7)   | YES  |     | NULL    |                |
| temporary      | varchar(1)   | YES  |     | NULL    |                |
| generate       | varchar(1)   | YES  |     | NULL    |                |
| secondary      | varchar(1)   | YES  |     | NULL    |                |
| namespace      | int          | YES  |     | NULL    |                |
| edition_name   | varchar(30)  | YES  |     | NULL    |                |
+----------------+--------------+------+-----+---------+----------------+
15 rows in set (0.00 sec)

It holds about 1.17 million rows:

mysql> select count(*) from t;
+----------+
| count(*) |
+----------+
|  1167136 |
+----------+
1 row in set (0.17 sec)

We want to run update t set owner='SB';

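Running that statement directly is a bad idea. First, it is one huge transaction, which causes replication lag between the primary and its replicas. Second, the statement has no WHERE clause, so it locks every row and effectively locks the whole table until it finishes.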

In MySQL, a statement like this can be split into slices by primary key range: https://blog.csdn.net/robinson1988/article/details/106007292

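Once the table is sliced by primary key, the data can be divided into several parts and run in several sessions in parallel, which speeds up the UPDATE.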
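As a rough sketch of what primary key slicing produces, the snippet below builds range statements of the same shape in plain Python. The function name make_slice_statements and the sample max_id of 1000 are made up for illustration. Each range is (step * (n - 1), step * n], the same pattern as the SQL used in this post's script.

```python
def make_slice_statements(max_id, n_slices, owner="SB"):
    """Build one UPDATE per primary-key range (lo, hi] covering 1..max_id."""
    # Ceiling division, so the last range always reaches max_id.
    step = (max_id + n_slices - 1) // n_slices
    statements = []
    for n in range(1, n_slices + 1):
        lo, hi = step * (n - 1), step * n
        statements.append(
            f"update t set owner='{owner}' where object_id>{lo} and object_id<={hi};"
        )
    return statements


# Hypothetical example: a table whose largest object_id is 1000, cut into 4 ranges.
for stmt in make_slice_statements(max_id=1000, n_slices=4):
    print(stmt)
```

The ranges do not overlap, so each row is updated by exactly one statement, and each statement is a small transaction of its own.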

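Below is a Python script that automates both the primary key slicing and the parallel execution. It splits the key range into 100 UPDATE statements, distributes them across 4 files, and runs the 4 files in 4 parallel processes.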

from multiprocessing import Pool
import os
import time

import pymysql

SLICE_FILES = ['slice0.txt', 'slice1.txt', 'slice2.txt', 'slice3.txt']

def processData(txt):
    # Feed one file of UPDATE statements to the mysql command-line client.
    print('Started:', time.strftime("%Y-%m-%d %H:%M:%S", time.localtime()))
    command = 'mysql -uroot -poracle -Dtest <' + txt
    os.system(command)
    print('Finished:', time.strftime("%Y-%m-%d %H:%M:%S", time.localtime()))

if __name__ == '__main__':
    # PyMySQL 1.0 and later accept keyword arguments only.
    conn = pymysql.connect(host="192.168.56.10", user="scott", password="tiger", database="test")
    cur = conn.cursor()
    # Generate 100 UPDATE statements, one per primary key range.
    # WITH RECURSIVE requires MySQL 8.0 or later.
    sql = "SELECT concat('update t set owner= ''SB'' where object_id>',avg_row * (n - 1),' and object_id<=', avg_row * n, ';') split_sql " \
          "FROM (SELECT n, min_id, max_id, ceil(max_id / 100) avg_row FROM (WITH RECURSIVE x(n) AS (SELECT 1 UNION ALL SELECT n + 1  FROM x  WHERE n < 100" \
          ") SELECT *  FROM x) a, (SELECT min(object_id) min_id FROM t) b, (SELECT max(object_id) max_id  FROM t) c ) a"
    cur.execute(sql)
    result = cur.fetchall()
    cur.close()
    conn.close()

    # Deal the statements round-robin into the 4 slice files:
    # statement i (counting from 1) goes to file i % 4.
    files = [open(name, 'w', newline='', encoding='utf8') for name in SLICE_FILES]
    for i, row in enumerate(result, start=1):
        files[i % 4].write(row[0] + '\n')
    for f in files:
        f.close()

    # Run the 4 files in parallel, one mysql client per process.
    # The pool must be created inside the __main__ guard.
    with Pool(4) as pool:
        pool.map(processData, SLICE_FILES)
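The way the script spreads statements over the slice files can be seen in isolation with a small sketch. The helper name deal_round_robin and the placeholder statements are illustrative only. The rule is the one the script uses: statement i, counting from 1, goes to bucket i % 4.

```python
def deal_round_robin(statements, n_workers):
    """Assign statement i (counting from 1) to bucket i % n_workers."""
    buckets = [[] for _ in range(n_workers)]
    for i, stmt in enumerate(statements, start=1):
        buckets[i % n_workers].append(stmt)
    return buckets


# Placeholder statements stand in for the generated UPDATEs.
buckets = deal_round_robin([f"stmt{i}" for i in range(1, 11)], 4)
for k, bucket in enumerate(buckets):
    print(f"slice{k}.txt:", bucket)
```

With 100 statements and 4 buckets, each file ends up with 25 statements, so the workers get the same number of key ranges.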

Running in a single session:

mysql> update t set owner='NC';
Query OK, 1167136 rows affected (40.34 sec)
Rows matched: 1167136  Changed: 1167136  Warnings: 0

Running in parallel with primary key slicing:

[root@server ~]# python3 update.py
Started: 2020-05-29 15:53:38
Started: 2020-05-29 15:53:38
Started: 2020-05-29 15:53:38
Started: 2020-05-29 15:53:38
mysql: [Warning] Using a password on the command line interface can be insecure.
mysql: [Warning] Using a password on the command line interface can be insecure.
mysql: [Warning] Using a password on the command line interface can be insecure.
mysql: [Warning] Using a password on the command line interface can be insecure.
Finished: 2020-05-29 15:53:49
Finished: 2020-05-29 15:53:49
Finished: 2020-05-29 15:53:49
Finished: 2020-05-29 15:53:50

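With 4 parallel processes the whole update finished in about 12 seconds (15:53:38 to 15:53:50), against about 40 seconds in a single session, a speedup of roughly 3.4 times.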
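The elapsed time and speedup can be worked out from the log itself. The sketch below parses the first start timestamp and the last finish timestamp from the run above and compares the result with the 40.34 seconds reported for the single-session UPDATE. The helper name elapsed_seconds is illustrative.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"


def elapsed_seconds(start, end):
    """Seconds between two timestamps in the format the script prints."""
    return (datetime.strptime(end, FMT) - datetime.strptime(start, FMT)).total_seconds()


# First start and last finish from the parallel run's log.
parallel = elapsed_seconds("2020-05-29 15:53:38", "2020-05-29 15:53:50")
single = 40.34  # seconds reported by the single-session UPDATE
print(f"parallel: {parallel:.0f} s, speedup: {single / parallel:.2f}x")
```

The log timestamps have one-second resolution, so the result is only an approximation.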

Finally, how many parallel processes to use depends on your machine's CPU. Running more processes than there are CPU cores is not recommended.
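One way to apply that rule is to cap the process count at the core count before creating the pool. The sketch below is a minimal example; the helper name choose_workers is made up, and os.cpu_count() reports logical CPUs on the machine running the script.

```python
import os


def choose_workers(requested, cpu_count=None):
    """Cap the requested process count at the number of CPU cores."""
    if cpu_count is None:
        # os.cpu_count() may return None when the count cannot be determined.
        cpu_count = os.cpu_count() or 1
    return max(1, min(requested, cpu_count))


print("workers:", choose_workers(4))
```

The returned value could then be passed to Pool() in place of the hard-coded 4, with the number of slice files adjusted to match.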
