Exploring MySQL Database Connections

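This investigation started when I noticed that many projects at my company never close their MySQL connections. The usual reason is that a project shares a single global db connection, and since that connection is used for the program's whole lifetime, there is never a point at which to close it. But is closing connections really unnecessary? Is there anything wrong with sharing one db connection instance? And what is the actual best practice? These questions are what set off this exploration.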

Why close the connection?

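I read a lot of material but found nothing that explained this clearly. Most sources just say that closing connections is a good habit or a best practice, without explaining why. So what actually goes wrong if you never close a connection?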

Not closing a connection leads to one of two situations:

  1. The program's process exits
  2. The program's process keeps running

The process exits

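Let's start with the first case: once the program has ended, how does MySQL handle the connection? This case splits further into two:

  • The program exits normally
  • The program exits abnormally

We can verify both with the following code.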
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Case 1: the program exits normally

"""
mysqlconn.py
~~~~~~~~~~~~
test mysql connection related functions.

:copyright: (c) 2019 by Geekpy.

"""
import MySQLdb
import time
import os
import sys

db_config = {
    'host': 'localhost',
    'port': 3306,
    'db': 'testorder',
    'user': 'root',
    'password': 'LocalTest'
}


def test_not_close_conn_normal_exit():
    print('begin connecting to mysql')
    conn = MySQLdb.Connection(autocommit=True, **db_config)

    # sleep 30s to check the connection
    print('mysql connected. sleeping ...')
    time.sleep(30)

    # exit program without close connection to see
    # if mysql can close the connection on server side
    print('after sleep, now exit program.')
    sys.exit()

if __name__ == '__main__':
    test_not_close_conn_normal_exit()

Before running the program, let's check the current connections from the MySQL client:

mysql> show processlist;
+-----+------+-----------------+-----------+---------+------+----------+------------------+
| Id  | User | Host            | db        | Command | Time | State    | Info             |
+-----+------+-----------------+-----------+---------+------+----------+------------------+
| 298 | root | localhost:59682 | testorder | Query   | 0    | starting | show processlist |
+-----+------+-----------------+-----------+---------+------+----------+------------------+

At this point there is only one connection: the MySQL client's own.
Now run the program and query again:

mysql> show processlist;
+-----+------+-----------------+-----------+---------+------+----------+------------------+
| Id  | User | Host            | db        | Command | Time | State    | Info             |
+-----+------+-----------------+-----------+---------+------+----------+------------------+
| 298 | root | localhost:59682 | testorder | Query   | 0    | starting | show processlist |
| 379 | root | localhost       | testorder | Sleep   | 4    |          | <null>           |
+-----+------+-----------------+-----------+---------+------+----------+------------------+

There is now one more connection, in the Sleep state, because we have not done anything on it since connecting.

After the sleep, the program exits without closing the connection. Querying again gives:

mysql> show processlist;
+-----+------+-----------------+-----------+---------+------+----------+------------------+
| Id  | User | Host            | db        | Command | Time | State    | Info             |
+-----+------+-----------------+-----------+---------+------+----------+------------------+
| 298 | root | localhost:59682 | testorder | Query   | 0    | starting | show processlist |
+-----+------+-----------------+-----------+---------+------+----------+------------------+

So once the program exits and its socket is closed, MySQL automatically closes the connection and reclaims its resources.

Now, what happens if the program does not exit normally?
Let's check with the following code:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Case 2: the program exits abnormally

"""
mysqlconn.py
~~~~~~~~~~~~
test mysql connection related functions.

:copyright: (c) 2019 by Geekpy.

"""
import MySQLdb
import time
import os
import sys

db_config = {
    'host': 'localhost',
    'port': 3306,
    'db': 'testorder',
    'user': 'root',
    'password': 'LocalTest'
}


def test_not_close_conn_unnormal_exit():
    print('begin connecting to mysql')
    conn = MySQLdb.Connection(autocommit=True, **db_config)

    # sleep to check the connection
    print('mysql connected. sleeping ...')
    pid = os.getpid()
    print(f'current pid is {pid}, you can kill me now ...')
    time.sleep(2000)

if __name__ == '__main__':
    test_not_close_conn_unnormal_exit()

After starting the program, we check the connections first and can see that one new connection has been added.

mysql> show processlist;
+-----+------+-----------------+-----------+---------+------+----------+------------------+
| Id  | User | Host            | db        | Command | Time | State    | Info             |
+-----+------+-----------------+-----------+---------+------+----------+------------------+
| 298 | root | localhost:59682 | testorder | Query   | 0    | starting | show processlist |
| 380 | root | localhost       | testorder | Sleep   | 5    |          | <null>           |
+-----+------+-----------------+-----------+---------+------+----------+------------------+

Kill the test process directly with kill -9.
Then query again:

mysql> show processlist;
+-----+------+-----------------+-----------+---------+------+----------+------------------+
| Id  | User | Host            | db        | Command | Time | State    | Info             |
+-----+------+-----------------+-----------+---------+------+----------+------------------+
| 298 | root | localhost:59682 | testorder | Query   | 0    | starting | show processlist |
+-----+------+-----------------+-----------+---------+------+----------+------------------+

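In other words, MySQL still notices that the socket has been closed (when a process is killed, the operating system closes its sockets) and closes the connection automatically.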

Conclusion: however the client process ends, once its socket is closed, MySQL closes the connection automatically.

The process keeps running

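If the process is still running and the socket has not been closed, the connection is idle: this is the Sleep state we saw in the examples above. Can a connection stay in this state forever?

No. It stays open until an idle timeout is reached, and only then is the connection closed.

So how long is this timeout, and which settings control it?
In short, it is controlled by the interactive_timeout and wait_timeout system variables, both of which default to 28800 s, i.e. 8 hours. wait_timeout applies to non-interactive clients such as Python database drivers, while interactive_timeout applies to clients that connect in interactive mode, such as the mysql command-line client. For more on how the two differ, see the article "The difference between interactive_timeout and wait_timeout in MySQL".

Conclusion: while the process is still running and the socket is still open, an idle connection stays open for a long time (8 hours by default) before it is closed.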
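To spot connections that are drifting toward this timeout, you can look at the Command and Time columns of SHOW PROCESSLIST. The sketch below assumes the rows have already been fetched with a DB-API cursor (`cursor.execute('SHOW PROCESSLIST')` followed by `cursor.fetchall()`) as tuples in the column order of the tables in this article; the helper names are hypothetical, and the timeout calculation assumes the session uses the default wait_timeout of 28800 s.

```python
from typing import Iterable, List, Sequence

# Column positions in SHOW PROCESSLIST rows:
# Id, User, Host, db, Command, Time, State, Info
ID_COL, COMMAND_COL, TIME_COL = 0, 4, 5


def idle_connection_ids(rows: Iterable[Sequence], min_idle_seconds: int) -> List[int]:
    """Ids of connections in the Sleep state idle for at least min_idle_seconds."""
    return [
        row[ID_COL]
        for row in rows
        if row[COMMAND_COL] == 'Sleep' and row[TIME_COL] >= min_idle_seconds
    ]


def seconds_until_timeout(idle_seconds: int, wait_timeout: int = 28800) -> int:
    """Seconds left before the server drops an idle connection.

    Assumes the session uses wait_timeout (default 28800 s, i.e. 8 hours).
    """
    return max(wait_timeout - idle_seconds, 0)


if __name__ == '__main__':
    # Rows as they appear in the kill -9 experiment earlier in this article
    rows = [
        (298, 'root', 'localhost:59682', 'testorder', 'Query', 0, 'starting', 'show processlist'),
        (380, 'root', 'localhost', 'testorder', 'Sleep', 5, '', None),
    ]
    print(idle_connection_ids(rows, 3))   # [380]
    print(seconds_until_timeout(5))       # 28795
```

If your server's wait_timeout differs from the default, read it with `SHOW VARIABLES LIKE 'wait_timeout'` and pass it in explicitly.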

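Because idle connections live this long, problems can arise in a situation like this: several programs access the same database, each of them multi-process or multi-threaded and long-running, and each opens several connections that are never closed. In this extreme case the database accumulates a large number of connections. MySQL limits the number of connections (the limit can be queried as shown below), and once that limit is exceeded, new connection attempts are rejected with a "Too many connections" error, so we can no longer connect to the database.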

mysql> show variables like '%max_connections%';
+------------------------------+-------+
| Variable_name                | Value |
+------------------------------+-------+
| max_connections              | 1000  |
+------------------------------+-------+
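The arithmetic behind this risk is multiplicative: every thread of every process of every program holds its own connections. A back-of-the-envelope sketch, assuming each thread keeps its connections open for good; the deployment figures below are made up for illustration and the function names are hypothetical.

```python
from typing import Iterable, Tuple

# (number of processes, threads per process) for one program; illustrative only
Deployment = Tuple[int, int]


def total_connections(deployments: Iterable[Deployment], conns_per_thread: int = 1) -> int:
    """Connections held at the same time if no thread ever closes its connections."""
    return sum(procs * threads * conns_per_thread for procs, threads in deployments)


def exceeds_limit(deployments: Iterable[Deployment], max_connections: int) -> bool:
    """True when the total would go past MySQL's max_connections."""
    return total_connections(deployments) > max_connections


if __name__ == '__main__':
    # Three programs: 4 procs x 100 threads, 8 x 50, 2 x 200
    services = [(4, 100), (8, 50), (2, 200)]
    print(total_connections(services))     # 1200
    print(exceeds_limit(services, 1000))   # True
```

None of the three programs looks large on its own, yet together they already pass the max_connections value of 1000 shown above.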

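In some cases, for example a single-threaded program that always shares one connection instance and keeps that instance in use, not closing the connection does little harm, and it also saves the overhead of repeatedly opening and closing connections. The caveat is that if the shared connection stays idle for longer than wait_timeout, the server closes it and the next query fails with a "MySQL server has gone away" error. There is also a better option: the DBUtils library, introduced next.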
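If you do keep one shared connection in a single-threaded program, a minimal safeguard is to check it before each use and reconnect if the server has dropped it. The sketch below is a hypothetical holder class, not part of any library; it assumes the connection object has a ping() method that raises when the connection is dead (MySQLdb connections have one), and that `creator` is any zero-argument callable returning a new connection, e.g. `lambda: MySQLdb.connect(**db_config)`.

```python
from typing import Any, Callable, Optional, Tuple, Type


class GlobalConnection:
    """Hypothetical holder for one shared connection in a single-threaded program.

    Not thread-safe: threaded code should use PersistentDB or PooledDB instead.
    """

    def __init__(self, creator: Callable[[], Any],
                 errors: Tuple[Type[BaseException], ...] = (Exception,)):
        self._creator = creator      # returns a new DB-API connection
        self._errors = errors        # e.g. (MySQLdb.OperationalError,)
        self._conn: Optional[Any] = None

    def get(self) -> Any:
        """Return a live connection, reconnecting if the old one was dropped."""
        if self._conn is not None:
            try:
                self._conn.ping()    # raises if the server closed the connection
                return self._conn
            except self._errors:
                self.close()         # discard the dead connection
        self._conn = self._creator()
        return self._conn

    def close(self) -> None:
        if self._conn is not None:
            try:
                self._conn.close()
            except self._errors:
                pass                 # already unusable; nothing more to do
            self._conn = None
```

Reconnecting silently like this is only safe between transactions: uncommitted work on a dropped connection is lost, and the caller should find out about that rather than continue on a fresh connection.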

DBUtils

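For Python, DBUtils is a frequently recommended open-source library that helps create database connections, maintain their state, and even provides connection pooling. Let's take a look at it.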

Installation

pip install DBUtils

Usage

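DBUtils mainly provides two classes for us to use: PersistentDB and PooledDB.
PersistentDB keeps one persistent connection per thread, which suits single-threaded programs or programs with a fixed set of long-lived threads, while PooledDB provides a connection pool, typically for multi-threaded applications.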

PersistentDB

Whenever a thread opens a database connection for the first time, a new connection to the database will be opened that will be used from now on for this specific thread. When the thread closes the database connection, it will still be kept open so that the next time when a connection is requested by the same thread, this already opened connection can be used. The connection will be closed automatically when the thread dies.
In short: PersistentDB tries to recycle database connections to increase the overall database access performance of your threaded application, but it makes sure that connections are never shared between threads.

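In other words: when a thread opens a database connection, only that thread can use it. When the thread closes the connection, it is not really closed but kept by PersistentDB, and the next time the same thread (and only that thread) asks for a connection, PersistentDB hands back the same one. This avoids the cost of repeatedly opening and closing connections, and the connection is released automatically when the thread dies. Because of this, with DBUtils we can close a connection as soon as we no longer need it: even frequent "open" and "close" calls are cheap local operations, so their cost is negligible.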

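For single-threaded applications, PersistentDB is therefore a very good choice: connections are never shared between threads, and the open/close overhead is avoided, which improves performance.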
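The per-thread reuse described above can be modelled in a few lines with threading.local. This is a toy sketch for intuition, not DBUtils' actual implementation: the class names are hypothetical, close() on the returned handle is simply ignored (mirroring PersistentDB's default closeable=False), and, unlike PersistentDB, the toy does not close connections when their thread dies.

```python
import threading
from typing import Any, Callable


class _Handle:
    """Wrapper whose close() is a no-op, so the thread keeps its connection."""

    def __init__(self, conn: Any):
        self.raw = conn

    def close(self) -> None:
        pass                               # silently ignored, like closeable=False

    def __getattr__(self, name: str) -> Any:
        return getattr(self.raw, name)     # delegate cursor(), commit(), ...


class ToyPersistent:
    """Hypothetical per-thread connection cache illustrating PersistentDB's idea."""

    def __init__(self, creator: Callable[[], Any]):
        self._creator = creator
        self._local = threading.local()    # separate storage for each thread

    def connection(self) -> _Handle:
        conn = getattr(self._local, 'conn', None)
        if conn is None:                   # first request from this thread
            conn = self._creator()
            self._local.conn = conn
        return _Handle(conn)
```

Calling connection(), close(), connection() in one thread yields the same underlying connection, while a second thread gets its own: exactly the "never shared between threads" property quoted above.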

  • Code example:
#!/usr/bin/env python
# -*- coding: utf-8 -*-

"""
mysqlconn.py
~~~~~~~~~~~~
test mysql connection related functions.

:copyright: (c) 2019 by Geekpy.

"""
import MySQLdb
import time
import threading
import os
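# Note: the PersistentDB class lives inside its own module.
# These are the DBUtils >= 2.0 module paths; DBUtils 1.x used
# DBUtils.PersistentDB and DBUtils.PooledDB instead.
from dbutils.persistent_db import PersistentDB
from dbutils.pooled_db import PooledDB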
import sys

db_config = {
    'host': 'localhost',
    'port': 3306,
    'db': 'testorder',
    'user': 'root',
    'password': 'LocalTest'
}

db_persis = PersistentDB(
    # creator: the DB-API driver module you are using
    creator=MySQLdb,
    # where threading.local can be used, passing it here performs better than DBUtils' own implementation
    threadlocal=threading.local,
    **db_config
)


def test_with_dbutils_persistent_conn():
    print('begin connecting to mysql')
    conn = db_persis.connection()

    print('after get connection, sleep 100s')
    time.sleep(100)

    # close() does not really close the database connection here;
    # PersistentDB keeps it for reuse by this thread
    conn.close()
    print('close function already called, sleep 100s again')

    time.sleep(100)
    sys.exit()


if __name__ == '__main__':
    test_with_dbutils_persistent_conn()


  • Running it

Check the current connections before running:

mysql> show processlist;
+-----+------+-----------------+-----------+---------+------+----------+------------------+
| Id  | User | Host            | db        | Command | Time | State    | Info             |
+-----+------+-----------------+-----------+---------+------+----------+------------------+
| 298 | root | localhost:59682 | testorder | Query   | 0    | starting | show processlist |
+-----+------+-----------------+-----------+---------+------+----------+------------------+

Run the program; once the connection is established ("after get connection, sleep 100s"), check the connections again:

mysql> show processlist;
+-----+------+-----------------+-----------+---------+------+----------+------------------+
| Id  | User | Host            | db        | Command | Time | State    | Info             |
+-----+------+-----------------+-----------+---------+------+----------+------------------+
| 298 | root | localhost:59682 | testorder | Query   | 0    | starting | show processlist |
| 419 | root | localhost       | testorder | Sleep   | 3    |          | <null>           |
+-----+------+-----------------+-----------+---------+------+----------+------------------+

After close() has been called ("close function already called, sleep 100s again"), the connection still exists:

mysql> show processlist;
+-----+------+-----------------+-----------+---------+------+----------+------------------+
| Id  | User | Host            | db        | Command | Time | State    | Info             |
+-----+------+-----------------+-----------+---------+------+----------+------------------+
| 298 | root | localhost:59682 | testorder | Query   | 0    | starting | show processlist |
| 419 | root | localhost       | testorder | Sleep   | 107  |          | <null>           |
+-----+------+-----------------+-----------+---------+------+----------+------------------+

After the program exits, the connection is actually closed:

mysql> show processlist;
+-----+------+-----------------+-----------+---------+------+----------+------------------+
| Id  | User | Host            | db        | Command | Time | State    | Info             |
+-----+------+-----------------+-----------+---------+------+----------+------------------+
| 298 | root | localhost:59682 | testorder | Query   | 0    | starting | show processlist |
+-----+------+-----------------+-----------+---------+------+----------+------------------+

  • About PersistentDB

help(PersistentDB) shows its documentation:

class PersistentDB(builtins.object)
 |  PersistentDB(creator, maxusage=None, setsession=None, failures=None, ping=1, closeable=False, threadlocal=None, *args, **kwargs)
 |
 |  Generator for persistent DB-API 2 connections.
 |
 |  After you have created the connection pool, you can use
 |  connection() to get thread-affine, steady DB-API 2 connections.
 |
 |  Methods defined here:
 |
 |  __init__(self, creator, maxusage=None, setsession=None, failures=None, ping=1, closeable=False, threadlocal=None, *args, **kwargs)
 |      Set up the persistent DB-API 2 connection generator.
 |
 |      creator: either an arbitrary function returning new DB-API 2
 |          connection objects or a DB-API 2 compliant database module
 |      maxusage: maximum number of reuses of a single connection
 |          (number of database operations, 0 or None means unlimited)
 |          Whenever the limit is reached, the connection will be reset.
 |      setsession: optional list of SQL commands that may serve to prepare
 |          the session, e.g. ["set datestyle to ...", "set time zone ..."]
 |      failures: an optional exception class or a tuple of exception classes
 |          for which the connection failover mechanism shall be applied,
 |          if the default (OperationalError, InternalError) is not adequate
 |      ping: determines when the connection should be checked with ping()
 |          (0 = None = never, 1 = default = whenever it is requested,
 |          2 = when a cursor is created, 4 = when a query is executed,
 |          7 = always, and all other bit combinations of these values)
 |      closeable: if this is set to true, then closing connections will
 |          be allowed, but by default this will be silently ignored
 |      threadlocal: an optional class for representing thread-local data
 |          that will be used instead of our Python implementation
 |          (threading.local is faster, but cannot be used in all cases)
 |      args, kwargs: the parameters that shall be passed to the creator
 |          function or the connection constructor of the DB-API 2 module

PooledDB

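PooledDB can hand out either shared or dedicated connections, depending on the shareable argument passed when requesting a connection. It defaults to True, meaning the connection may be shared, in which case different threads can use the same connection. Sharing only takes effect, however, when maxshared is greater than 0 and the driver declares a threadsafety level greater than 1; MySQLdb declares threadsafety = 1, so with MySQLdb every connection is dedicated in practice.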

  • Code example
#!/usr/bin/env python
# -*- coding: utf-8 -*-

"""
mysqlconn.py
~~~~~~~~~~~~
test mysql connection related functions.

:copyright: (c) 2019 by Geekpy.

"""
import MySQLdb
import time
import threading
import os
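# DBUtils >= 2.0 module paths; DBUtils 1.x used
# DBUtils.PersistentDB and DBUtils.PooledDB instead
from dbutils.persistent_db import PersistentDB
from dbutils.pooled_db import PooledDB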
import sys

db_config = {
    'host': 'localhost',
    'port': 3306,
    'db': 'testorder',
    'user': 'root',
    'password': 'LocalTest'
}


db_pool = PooledDB(
    creator=MySQLdb,
    mincached=2,
    maxconnections=20,
    **db_config
)


def test_with_pooleddb_conn():
    print('begin connecting to mysql')
    conn = db_pool.connection()

    print('after get connection, sleep 100s')
    time.sleep(100)

    conn.close()
    print('close function already called, sleep 100s again')

    time.sleep(100)
    sys.exit()


if __name__ == '__main__':
    test_with_pooleddb_conn()

  • Running it

Check the current connections before running:
mysql> show processlist;
+-----+------+-----------------+-----------+---------+------+----------+------------------+
| Id  | User | Host            | db        | Command | Time | State    | Info             |
+-----+------+-----------------+-----------+---------+------+----------+------------------+
| 298 | root | localhost:59682 | testorder | Query   | 0    | starting | show processlist |
+-----+------+-----------------+-----------+---------+------+----------+------------------+

Run the program; once the connection is established ("after get connection, sleep 100s"), there are two new connections, because we set mincached=2:

mysql> show processlist;
+-----+------+-----------------+-----------+---------+------+----------+------------------+
| Id  | User | Host            | db        | Command | Time | State    | Info             |
+-----+------+-----------------+-----------+---------+------+----------+------------------+
| 298 | root | localhost:59682 | testorder | Query   | 0    | starting | show processlist |
| 420 | root | localhost       | testorder | Sleep   | 6    |          | <null>           |
| 421 | root | localhost       | testorder | Sleep   | 6    |          | <null>           |
+-----+------+-----------------+-----------+---------+------+----------+------------------+

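Likewise, calling close() does not really close the connection; PooledDB takes it back into the pool. In the output below, one connection (421) shows a much shorter idle time than the other, most likely because PooledDB issues a rollback when a connection is returned to the pool (reset=True by default). Subject to the maxshared setting and driver support, PooledDB can also let several threads share one connection. Because the pool keeps connections open, threads can take an idle connection instead of opening a new one, which cuts connection overhead in multi-threaded programs.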

mysql> show processlist;
+-----+------+-----------------+-----------+---------+------+----------+------------------+
| Id  | User | Host            | db        | Command | Time | State    | Info             |
+-----+------+-----------------+-----------+---------+------+----------+------------------+
| 298 | root | localhost:59682 | testorder | Query   | 0    | starting | show processlist |
| 420 | root | localhost       | testorder | Sleep   | 112  |          | <null>           |
| 421 | root | localhost       | testorder | Sleep   | 12   |          | <null>           |
+-----+------+-----------------+-----------+---------+------+----------+------------------+

After the program exits, the connections are actually closed:

mysql> show processlist;
+-----+------+-----------------+-----------+---------+------+----------+------------------+
| Id  | User | Host            | db        | Command | Time | State    | Info             |
+-----+------+-----------------+-----------+---------+------+----------+------------------+
| 298 | root | localhost:59682 | testorder | Query   | 0    | starting | show processlist |
+-----+------+-----------------+-----------+---------+------+----------+------------------+
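To make the pooling behaviour just observed concrete, here is a toy pool built on queue.Queue. It is a hypothetical sketch, not PooledDB's implementation: it opens mincached connections up front, reuses idle ones first, opens new ones up to maxconnections, and then raises (PooledDB with blocking=False reports an error in the same situation); close() on a handle returns the connection to the pool instead of closing it. Sharing, maxcached, ping and reset are deliberately left out.

```python
import queue
import threading
from typing import Any, Callable


class PoolExhausted(Exception):
    """Raised when maxconnections would be exceeded (hypothetical name)."""


class ToyPool:
    def __init__(self, creator: Callable[[], Any], mincached: int = 0,
                 maxconnections: int = 0):
        self._creator = creator
        self._max = maxconnections        # 0 means no limit, as in PooledDB
        self._idle = queue.Queue()        # idle connections waiting for reuse
        self._lock = threading.Lock()
        self._open = 0                    # connections opened so far
        for _ in range(mincached):        # open idle connections at startup
            self._idle.put(self._new())

    def _new(self) -> Any:
        self._open += 1
        return self._creator()

    def connection(self) -> '_Pooled':
        try:
            return _Pooled(self, self._idle.get_nowait())   # reuse an idle one
        except queue.Empty:
            pass
        with self._lock:
            if self._max and self._open >= self._max:
                raise PoolExhausted('too many connections')
            return _Pooled(self, self._new())

    def _give_back(self, conn: Any) -> None:
        self._idle.put(conn)


class _Pooled:
    """Handle whose close() returns the connection to the pool."""

    def __init__(self, pool: ToyPool, conn: Any):
        self._pool, self.raw = pool, conn

    def close(self) -> None:
        if self.raw is not None:
            self._pool._give_back(self.raw)
            self.raw = None               # a handle can only be returned once
```

With mincached=2, the first two requests are served from connections opened at startup, which matches the two Sleep connections in the processlist output above.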

  • About PooledDB

help(PooledDB) shows its documentation:

class PooledDB(builtins.object)
 |  PooledDB(creator, mincached=0, maxcached=0, maxshared=0, maxconnections=0, blocking=False, maxusage=None, setsession=None, reset=True, failures=None, ping=1, *args, **kwargs)
 |
 |  Pool for DB-API 2 connections.
 |
 |  After you have created the connection pool, you can use
 |  connection() to get pooled, steady DB-API 2 connections.
 |
 |  Methods defined here:
 |
 |  __del__(self)
 |      Delete the pool.
 |
 |  __init__(self, creator, mincached=0, maxcached=0, maxshared=0, maxconnections=0, blocking=False, maxusage=None, setsession=None, reset=True, failures=None, ping=1, *args, **kwargs)
 |      Set up the DB-API 2 connection pool.
 |
 |      creator: either an arbitrary function returning new DB-API 2
 |          connection objects or a DB-API 2 compliant database module
 |      mincached: initial number of idle connections in the pool
 |          (0 means no connections are made at startup)
 |      maxcached: maximum number of idle connections in the pool
 |          (0 or None means unlimited pool size)
 |      maxshared: maximum number of shared connections
 |          (0 or None means all connections are dedicated)
 |          When this maximum number is reached, connections are
 |          shared if they have been requested as shareable.
 |      maxconnections: maximum number of connections generally allowed
 |          (0 or None means an arbitrary number of connections)
 |      blocking: determines behavior when exceeding the maximum
 |          (if this is set to true, block and wait until the number of
 |          connections decreases, otherwise an error will be reported)
 |      maxusage: maximum number of reuses of a single connection
 |          (0 or None means unlimited reuse)
 |          When this maximum usage number of the connection is reached,
 |          the connection is automatically reset (closed and reopened).
 |      setsession: optional list of SQL commands that may serve to prepare
 |          the session, e.g. ["set datestyle to ...", "set time zone ..."]
 |      reset: how connections should be reset when returned to the pool
 |          (False or None to rollback transcations started with begin(),
 |          True to always issue a rollback for safety's sake)
 |      failures: an optional exception class or a tuple of exception classes
 |          for which the connection failover mechanism shall be applied,
 |          if the default (OperationalError, InternalError) is not adequate
 |      ping: determines when the connection should be checked with ping()
 |          (0 = None = never, 1 = default = whenever fetched from the pool,
 |          2 = when a cursor is created, 4 = when a query is executed,
 |          7 = always, and all other bit combinations of these values)
 |      args, kwargs: the parameters that shall be passed to the creator
 |          function or the connection constructor of the DB-API 2 module

Transaction

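DBUtils has some specific requirements for transactions, so they get a section of their own. A transaction must be started explicitly with begin(). This tells DBUtils that a transaction is in progress: the connection is not shared with other threads, transparent reconnection is suspended until the transaction ends (so a lost connection raises an error instead of silently discarding the work done so far), and the connection is rolled back before it goes back into the pool.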

# Reuses db_pool and the time/sys imports from the PooledDB example above.
def test_with_transaction():
    print('begin connecting to mysql')
    conn = db_pool.connection()

    # begin() must be called first to start a transaction
    conn.begin()
    with conn.cursor() as cursor:
        cursor.execute("UPDATE migration_info SET status='prepare' WHERE id=4")

    print('execute but not commit')
    time.sleep(100)

    # the change only reaches the database after commit()
    conn.commit()
    conn.close()
    print('close function already called, sleep 100s again')

    time.sleep(100)
    sys.exit()
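The begin/commit pattern above is easy to get wrong when an exception is raised between execute() and commit(). A minimal sketch of a helper context manager that wraps it: it commits on success and rolls back on any error. The name `transaction` is hypothetical; since plain DB-API connections have no begin() method, the helper only calls begin() when the connection provides one (as DBUtils connections do).

```python
from contextlib import contextmanager
from typing import Any, Iterator


@contextmanager
def transaction(conn: Any) -> Iterator[Any]:
    """Run a block inside a transaction: commit on success, roll back on error."""
    begin = getattr(conn, 'begin', None)
    if callable(begin):
        begin()                  # tell DBUtils a transaction is in progress
    cursor = conn.cursor()
    try:
        yield cursor
        conn.commit()            # only reached if the block raised nothing
    except BaseException:
        conn.rollback()          # undo partial work, then re-raise
        raise
    finally:
        cursor.close()
```

With the pool from the PooledDB example, you would take `conn = db_pool.connection()`, run the UPDATE inside `with transaction(conn) as cursor:`, and call `conn.close()` afterwards (ideally in a `finally`) to hand the connection back to the pool.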
