Fixing the SQLAlchemy error "Lost connection to MySQL server during query"


I recently ran into a SQLAlchemy "lost connection" error during development. This post records how I fixed it.

The error

The backend is written in Python with FastAPI + SQLAlchemy. One API request failed with the following error:

[2023-03-24 06:36:35 +0000] [217] [ERROR] Exception in ASGI application
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/uvicorn/protocols/http/h11_impl.py", line 407, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/usr/local/lib/python3.8/dist-packages/uvicorn/middleware/proxy_headers.py", line 78, in __call__
    return await self.app(scope, receive, send)
  File "/usr/local/lib/python3.8/dist-packages/fastapi/applications.py", line 199, in __call__
    await super().__call__(scope, receive, send)
  File "/usr/local/lib/python3.8/dist-packages/starlette/applications.py", line 112, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.8/dist-packages/starlette/middleware/errors.py", line 181, in __call__
    raise exc from None
  File "/usr/local/lib/python3.8/dist-packages/starlette/middleware/errors.py", line 159, in __call__
    await self.app(scope, receive, _send)
  File "/usr/local/lib/python3.8/dist-packages/starlette/middleware/base.py", line 26, in __call__
    await response(scope, receive, send)
  File "/usr/local/lib/python3.8/dist-packages/starlette/responses.py", line 224, in __call__
    await run_until_first_complete(
  File "/usr/local/lib/python3.8/dist-packages/starlette/concurrency.py", line 24, in run_until_first_complete
    [task.result() for task in done]
  File "/usr/local/lib/python3.8/dist-packages/starlette/concurrency.py", line 24, in <listcomp>
    [task.result() for task in done]
  File "/usr/local/lib/python3.8/dist-packages/starlette/responses.py", line 216, in stream_response
    async for chunk in self.body_iterator:
  File "/usr/local/lib/python3.8/dist-packages/starlette/middleware/base.py", line 56, in body_stream
    task.result()
  File "/usr/local/lib/python3.8/dist-packages/starlette/middleware/base.py", line 38, in coro
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.8/dist-packages/starlette_exporter/middleware.py", line 289, in __call__
    await self.app(scope, receive, wrapped_send)
  File "/usr/local/lib/python3.8/dist-packages/starlette/exceptions.py", line 82, in __call__
    raise exc from None
  File "/usr/local/lib/python3.8/dist-packages/starlette/exceptions.py", line 71, in __call__
    await self.app(scope, receive, sender)
  File "/usr/local/lib/python3.8/dist-packages/starlette/routing.py", line 580, in __call__
    await route.handle(scope, receive, send)
  File "/usr/local/lib/python3.8/dist-packages/starlette/routing.py", line 241, in handle
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.8/dist-packages/starlette/routing.py", line 55, in app
    await response(scope, receive, send)
  File "/usr/local/lib/python3.8/dist-packages/starlette/responses.py", line 146, in __call__
    await self.background()
  File "/usr/local/lib/python3.8/dist-packages/starlette/background.py", line 35, in __call__
    await task()
  File "/usr/local/lib/python3.8/dist-packages/starlette/background.py", line 20, in __call__
    await run_in_threadpool(self.func, *self.args, **self.kwargs)
  File "/usr/local/lib/python3.8/dist-packages/starlette/concurrency.py", line 40, in run_in_threadpool
    return await loop.run_in_executor(None, func, *args)
  File "/usr/lib/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/app/ymir_app/app/libs/datasets.py", line 330, in ats_import_dataset_in_backgroud
    task = crud.task.create_placeholder(
  File "/app/ymir_app/app/crud/crud_task.py", line 81, in create_placeholder
    db.commit()
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/orm/session.py", line 1428, in commit
    self._transaction.commit(_to_root=self.future)
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/orm/session.py", line 829, in commit
    self._prepare_impl()
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/orm/session.py", line 808, in _prepare_impl
    self.session.flush()
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/orm/session.py", line 3298, in flush
    self._flush(objects)
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/orm/session.py", line 3438, in _flush
    transaction.rollback(_capture_exception=True)
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/util/langhelpers.py", line 70, in __exit__
    compat.raise_(
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/util/compat.py", line 207, in raise_
    raise exception
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/orm/session.py", line 3398, in _flush
    flush_context.execute()
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/orm/unitofwork.py", line 456, in execute
    rec.execute(self)
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/orm/unitofwork.py", line 630, in execute
    util.preloaded.orm_persistence.save_obj(
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/orm/persistence.py", line 242, in save_obj
    _emit_insert_statements(
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/orm/persistence.py", line 1219, in _emit_insert_statements
    result = connection._execute_20(
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/base.py", line 1582, in _execute_20
    return meth(self, args_10style, kwargs_10style, execution_options)
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/sql/elements.py", line 323, in _execute_on_connection
    return connection._execute_clauseelement(
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/base.py", line 1451, in _execute_clauseelement
    ret = self._execute_context(
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/base.py", line 1813, in _execute_context
    self._handle_dbapi_exception(
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/base.py", line 1994, in _handle_dbapi_exception
    util.raise_(
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/util/compat.py", line 207, in raise_
    raise exception
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/base.py", line 1770, in _execute_context
    self.dialect.do_execute(
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/default.py", line 717, in do_execute
    cursor.execute(statement, parameters)
  File "/usr/local/lib/python3.8/dist-packages/pymysql/cursors.py", line 148, in execute
    result = self._query(query)
  File "/usr/local/lib/python3.8/dist-packages/pymysql/cursors.py", line 310, in _query
    conn.query(q)
  File "/usr/local/lib/python3.8/dist-packages/pymysql/connections.py", line 548, in query
    self._affected_rows = self._read_query_result(unbuffered=unbuffered)
  File "/usr/local/lib/python3.8/dist-packages/pymysql/connections.py", line 775, in _read_query_result
    result.read()
  File "/usr/local/lib/python3.8/dist-packages/pymysql/connections.py", line 1156, in read
    first_packet = self.connection._read_packet()
  File "/usr/local/lib/python3.8/dist-packages/pymysql/connections.py", line 692, in _read_packet
    packet_header = self._read_bytes(4)
  File "/usr/local/lib/python3.8/dist-packages/pymysql/connections.py", line 748, in _read_bytes
    raise err.OperationalError(
sqlalchemy.exc.OperationalError: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query')
[SQL: INSERT INTO task (name, hash, type, state, parameters, config, percent, duration, error_code, user_id, project_id, dataset_id, model_stage_id, is_terminated, is_deleted, last_message_datetime, create_datetime, update_datetime) VALUES (%(name)s, %(hash)s, %(type)s, %(state)s, %(parameters)s, %(config)s, %(percent)s, %(duration)s, %(error_code)s, %(user_id)s, %(project_id)s, %(dataset_id)s, %(model_stage_id)s, %(is_terminated)s, %(is_deleted)s, %(last_message_datetime)s, %(create_datetime)s, %(update_datetime)s)]
[parameters: {'name': 't0000001000012b2ae341679639795', 'hash': 't0000001000012b2ae341679639795', 'type': 5, 'state': 1, 'parameters': '{"group_name": "from_ats_6579a9116a", "description": null, "project_id": 12, "input_url": null, "input_dataset_id": null, "input_dataset_name": null, "input_path": "/data/ymir-workplace/ymir-sharing/3c87e23bb8904b638a9479d6e68aea23", "strategy": 4, "source": 5, "import_type": 5}', 'config': None, 'percent': 0, 'duration': None, 'error_code': None, 'user_id': 1, 'project_id': 12, 'dataset_id': None, 'model_stage_id': None, 'is_terminated': 0, 'is_deleted': 0, 'last_message_datetime': datetime.datetime(2023, 3, 24, 6, 36, 35, 351864), 'create_datetime': datetime.datetime(2023, 3, 24, 6, 36, 35, 351870), 'update_datetime': datetime.datetime(2023, 3, 24, 6, 36, 35, 351873)}]
(Background on this error at: http://sqlalche.me/e/14/e3q8)

The key line is:
sqlalchemy.exc.OperationalError: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query')

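I searched online and tried many of the suggested fixes, including:

  1. Set SQLAlchemy's connection recycle time to 10 minutes with pool_recycle
    engine = create_engine(url, pool_recycle=600)

  2. Test each connection with pool_pre_ping when it is checked out of the pool
    engine = create_engine("mysql+pymysql://user:pw@host/db", pool_pre_ping=True, pool_recycle=1800)

  3. Don't use a connection pool. Note that the commonly posted snippet below sets pool_recycle=-1, which only turns off recycling (the default); pooling itself is disabled with poolclass=NullPool.
    engine = create_engine("mysql+pymysql://user:pw@host/db", pool_pre_ping=True, pool_recycle=-1)

  4. Check the connection timeout configured on the database server (for MySQL, wait_timeout)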
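The create_engine options from the list above can be collected into a small sketch. The helper names make_pooled_engine, make_unpooled_engine and server_wait_timeout are hypothetical, and the SHOW VARIABLES query only works against MySQL/MariaDB. SQLAlchemy's MySQL documentation advises keeping pool_recycle below the server's wait_timeout, so the pool replaces a connection before the server drops it.

```python
from sqlalchemy import create_engine, text
from sqlalchemy.pool import NullPool


def make_pooled_engine(url: str, recycle_seconds: int = 1800):
    """Pooled engine that pings and recycles connections (hypothetical helper)."""
    return create_engine(
        url,
        pool_pre_ping=True,            # test the connection each time it is checked out
        pool_recycle=recycle_seconds,  # at checkout, replace connections older than this (seconds)
    )


def make_unpooled_engine(url: str):
    """No pooling: each checkout opens a new DBAPI connection, closed on release."""
    # pool_recycle=-1 would NOT do this; it only disables recycling.
    return create_engine(url, poolclass=NullPool)


def server_wait_timeout(engine) -> int:
    """Read MySQL's idle-connection timeout in seconds (MySQL/MariaDB only)."""
    with engine.connect() as conn:
        row = conn.execute(text("SHOW VARIABLES LIKE 'wait_timeout'")).first()
    return int(row[1])  # row is (Variable_name, Value)
```

For example, server_wait_timeout(make_pooled_engine("mysql+pymysql://user:pw@host/db")) returns the value that the configured pool_recycle should stay below.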

None of these solved the problem, so I analyzed more carefully why it was happening.

Analyzing the cause

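Taken literally, the message means the connection to the database was lost during a query; the connection here is the one held by the session. The endpoint in question starts a heavy job that downloads large datasets, usually more than 20 GB, so it is designed as an asynchronous endpoint: the request obtains a database session, handles the quick steps, immediately returns a success status, and leaves the time-consuming work to a background task. Background tasks are a built-in FastAPI feature meant for small jobs that run after the response is sent, such as sending email. The lost connection occurs inside this background task.
In outline, the flow is:

  1. The user calls the endpoint, which obtains a session.
  2. The asynchronous endpoint returns immediately.
  3. The background task downloads the dataset, which takes about 30 minutes.
  4. When the download finishes, the task updates its state in the database, and the error occurs.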

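Walking through this flow shows that the cause is holding the session too long. The session is obtained at the start of the request and passed into the background task, and it is not used again until about 30 minutes later, at which point the lost connection error occurs.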
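The effect of holding a session can be observed without MySQL: a Session checks a connection out of the pool on its first statement and keeps it until commit, rollback or close. The sketch below assumes SQLAlchemy 1.4 or later, uses an in-memory SQLite database as a stand-in for MySQL and a hypothetical Task model, and counts pool checkouts with the "checkout" pool event. The "30 minutes" are only represented by a comment.

```python
from sqlalchemy import Column, Integer, String, create_engine, event, select
from sqlalchemy.orm import Session, declarative_base
from sqlalchemy.pool import StaticPool

Base = declarative_base()


class Task(Base):  # hypothetical stand-in for the app's task table
    __tablename__ = "task"
    id = Column(Integer, primary_key=True)
    name = Column(String(64))


# In-memory SQLite stands in for MySQL; StaticPool keeps a single shared connection.
engine = create_engine(
    "sqlite://",
    poolclass=StaticPool,
    connect_args={"check_same_thread": False},
    pool_pre_ping=True,
)
Base.metadata.create_all(engine)

checkouts = []


@event.listens_for(engine, "checkout")
def count_checkout(dbapi_connection, connection_record, connection_proxy):
    # Fires whenever the pool hands out a connection; pool_pre_ping and
    # pool_recycle are also applied at checkout.
    checkouts.append(connection_record)


session = Session(engine)

# Request handler: the first statement checks a connection out of the pool.
session.execute(select(Task)).all()

# ...the session is handed to the background task; in the real app
# about 30 minutes would pass here...

# Background task: later statements reuse the connection already checked out,
# so no new checkout (and no pre-ping or recycle) happens.
session.add(Task(name="placeholder"))
session.flush()
checkouts_before_commit = len(checkouts)

# Commit releases the connection; the next statement checks one out again.
session.commit()
session.execute(select(Task)).all()
checkouts_after_commit = len(checkouts)
session.close()
```

Here checkouts_before_commit is 1 and checkouts_after_commit is 2: the flush ran on the connection obtained at the start of the transaction, however long ago that was.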

The fix

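Once the cause is clear, the remedy follows: after the 30-minute download, the background task obtains a new session for its database update instead of reusing the earlier one. That fixed the problem.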
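A minimal sketch of that change, assuming a sessionmaker factory named SessionLocal (the pattern used in FastAPI's SQL tutorial) and hypothetical names Task, download_dataset and import_dataset_in_background. In-memory SQLite stands in for MySQL so the sketch runs on its own. The request handler passes only plain values to the background function, which opens a short-lived session after the slow download finishes.

```python
import time

from sqlalchemy import Column, Integer, String, create_engine, func, select
from sqlalchemy.orm import declarative_base, sessionmaker
from sqlalchemy.pool import StaticPool

Base = declarative_base()


class Task(Base):  # hypothetical stand-in for the app's task table
    __tablename__ = "task"
    id = Column(Integer, primary_key=True)
    name = Column(String(64))


# In production this would be the MySQL URL, e.g. with pool_pre_ping=True.
engine = create_engine(
    "sqlite://", poolclass=StaticPool, connect_args={"check_same_thread": False}
)
Base.metadata.create_all(engine)
SessionLocal = sessionmaker(bind=engine)


def download_dataset(input_path: str) -> None:
    """Hypothetical placeholder for the ~30-minute download."""
    time.sleep(0.01)


def import_dataset_in_background(task_name: str, input_path: str) -> None:
    # Do the slow work first, without holding any database session.
    download_dataset(input_path)
    # Then open a fresh session: the pool checks out (and, if configured,
    # pre-pings) a connection now, and the session closes right after the write.
    with SessionLocal() as db:
        db.add(Task(name=task_name))
        db.commit()
```

In the FastAPI endpoint the function would be scheduled with background_tasks.add_task(import_dataset_in_background, task_name, input_path), so the request's own session never outlives the request.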

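I didn't find this sooner because, according to the official documentation, pool_recycle recycles pooled connections, and together with pool_pre_ping, which tests a connection before it is handed out, it should take care of dropped connections. Yet the configured pool_recycle did not seem to take effect. A likely explanation is that both options act only when a connection is checked out of the pool: a Session that has already executed a statement keeps its connection until commit, rollback or close, so a connection held across the 30-minute download is never recycled or pinged, and the server may have closed it in the meantime.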

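The problem may have been caused by my own misconfiguration, but this can still serve as one way to approach this class of problem: when troubleshooting a similar error, consider whether a session is being held for too long.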

Appendix: the guessing process (figure)

