Scrapy throws pymysql.err.InterfaceError: (0, '') after running for a while

When I got up this morning, I found two or three of my scrapy nodes throwing errors like mad — close to tens of thousands of pages' worth. The error was:

2019-07-12 21:48:44 [twisted] CRITICAL: Rollback failed
Traceback (most recent call last):
  File "/home/anaconda3/envs/python36/lib/python3.6/site-packages/twisted/python/threadpool.py", line 250, in inContext
    result = inContext.theWork()
  File "/home/anaconda3/envs/python36/lib/python3.6/site-packages/twisted/python/threadpool.py", line 266, in <lambda>
    inContext.theWork = lambda: context.call(ctx, func, *args, **kw)
  File "/home/anaconda3/envs/python36/lib/python3.6/site-packages/twisted/python/context.py", line 122, in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
  File "/home/anaconda3/envs/python36/lib/python3.6/site-packages/twisted/python/context.py", line 85, in callWithContext
    return func(*args,**kw)
--- <exception caught here> ---
  File "/home/anaconda3/envs/python36/lib/python3.6/site-packages/twisted/enterprise/adbapi.py", line 474, in _runInteraction
    conn.rollback()
  File "/home/anaconda3/envs/python36/lib/python3.6/site-packages/twisted/enterprise/adbapi.py", line 52, in rollback
    self._connection.rollback()
  File "/home/anaconda3/envs/python36/lib/python3.6/site-packages/pymysql/connections.py", line 431, in rollback
    self._execute_command(COMMAND.COM_QUERY, "ROLLBACK")
  File "/home/anaconda3/envs/python36/lib/python3.6/site-packages/pymysql/connections.py", line 745, in _execute_command
    raise err.InterfaceError("(0, '')")
pymysql.err.InterfaceError: (0, '')

Going through the logs, I noticed one pattern: the messages right before the error were almost all about crawling listing pages — no items were returned, and there were no log lines for items being inserted into the database. So my suspicion was: the connections in the adbapi pool had sat idle for so long that MySQL killed them on the server side; the insert then failed and the transaction tried to roll back, but since the pooled connection itself was broken, the rollback failed as well, producing the error above.
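The recovery logic I ended up with boils down to a small pattern: before each insert, ping the connection; if the ping raises, throw the pool away and build a fresh one. Here is a minimal, self-contained sketch of that pattern — `FakePool` and `FakeConnection` are stand-ins for the real adbapi pool and pymysql connection, purely for illustration:

```python
class StaleConnectionError(Exception):
    """Raised by ping() when the server has dropped the connection."""

class FakeConnection:
    """Stand-in for a pymysql connection."""
    def __init__(self, alive=True):
        self.alive = alive

    def ping(self):
        if not self.alive:
            raise StaleConnectionError("(0, '')")

class FakePool:
    """Stand-in for the adbapi connection pool."""
    def __init__(self, alive=True):
        self.connection = FakeConnection(alive)
        self.closed = False

    def close(self):
        self.closed = True

def ensure_live_pool(pool, make_pool):
    """Ping the pooled connection; rebuild the whole pool if the ping fails."""
    try:
        pool.connection.ping()
        return pool                 # connection is alive, reuse the pool
    except StaleConnectionError:
        pool.close()                # discard the dead pool
        return make_pool()          # reinitialize, as the pipeline below does

# a healthy pool is returned unchanged
healthy = FakePool(alive=True)
assert ensure_live_pool(healthy, FakePool) is healthy

# a dead pool is closed and replaced with a fresh one
dead = FakePool(alive=False)
fresh = ensure_live_pool(dead, FakePool)
assert dead.closed and fresh is not dead
```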

The database access in my item pipeline looked like this:

    # called by Scrapy for every item
    def process_item(self, item, spider):
        query = self.dbpool.runInteraction(self._conditional_insert, item)
        query.addErrback(self._handle_error, item, spider)
        return item

In the end I changed it to ping the connection before every insert; if reconnecting fails, the entire connection pool is reinitialized. The complete code:

class MyPipeline(object):

    def __init__(self, dbpool):
        self.dbpool = dbpool

    @classmethod
    def from_settings(cls, settings):
        dbpool = MysqlConnectionPool().dbpool()
        return cls(dbpool)

    # called by Scrapy for every item
    def process_item(self, item, spider):
        query = self.dbpool.runInteraction(self._conditional_insert, item)
        query.addErrback(self._handle_error, item, spider)
        return item

    def _handle_error(self, failure, item, spider):
        print(failure)

    def _conditional_insert(self, transaction, item):
        # reach through the adbapi wrapper to the underlying pymysql connection
        conn = transaction._connection._connection
        try:
            conn.ping()
        except Exception:
            # the connection is dead: discard the pool and build a new one
            self.dbpool.close()
            self.dbpool = MysqlConnectionPool().dbpool()
        sql = """INSERT INTO `DOC_BASEINFO` (doc_type, author_org)
        VALUES (%s, %s)"""
        params = (item['doc_type'], item['author_org'])
        transaction.execute(sql, params)
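Worth noting: twisted's adbapi has a built-in knob aimed at exactly this failure mode. Constructing the pool with `cp_reconnect=True` tells it to discard a connection that raises and retry the interaction on a fresh one, which may make the manual ping unnecessary. A sketch of the pool construction — the connection parameters are placeholders, and I haven't verified that this covers the rollback path above:

```python
from twisted.enterprise import adbapi

dbpool = adbapi.ConnectionPool(
    "pymysql",                # name of the DB-API module to use
    host="127.0.0.1",         # placeholder connection parameters
    user="user",
    password="password",
    db="mydb",
    charset="utf8mb4",
    cp_reconnect=True,        # drop dead connections and retry the interaction
)
```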