The strange sqlite3 "malformed" error (Part 1)


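Symptoms

  • A field device generates and inserts large volumes of data. After the device malfunctioned, its database was pulled off for inspection, and opening it reported a malformed error
  • sqlite3 version: 3.8.6
  • Database file size: 210 MB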

Locating the problem

  • Data size

    select count(*) from DataSheet;
    /* result */
    950131
    
  • Primary key

    primary key(ctype,id,DataTime)
    
  • Reproduction (SQLite Expert)

    select * from DataSheet where id='000537719140' order by DataTime;  /*malformed*/
    select * from DataSheet where ctype=5002 and id='000537719140' order by DataTime; /*OK*/
    

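    As shown above, the only difference between the two SQL statements is whether the where clause includes the ctype condition. Why would that matter? Running the following statements gives a first comparison:

    select * from DataSheet where id='000537719140';  /*OK*/
    select * from DataSheet order by DataTime;     /*malformed, full table scan*/
    
    • When the where clause constrains the leading primary-key columns (ctype, id), the query hits the primary-key index on (ctype, id, DataTime); order by DataTime then only has to order the matching index entries, so the problem does not reproduce
    • When the where clause does not match the leading primary-key columns, the primary-key index is not hit; order by DataTime sorts the matching results, which in practice means a full table scan, and the problem reproduces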
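  • Test code: locating the row

    /* Assumes db is an open sqlite3* handle; needs <stdio.h>, <string.h> and <sqlite3.h>. */
    sqlite3_stmt *stmt = NULL;
    int rc, ncols;
    const char *malformed = "select rowid, ctype, id, StartTime, DataTime from DataSheet;"; /*malformed*/
    const char *okayQuery = "select rowid, ctype, id, DataTime from DataSheet;"; /*OK*/
    const char *sql = okayQuery;
    sqlite3_prepare_v2(db, sql, (int)strlen(sql), &stmt, NULL);
    ncols = sqlite3_column_count(stmt);
    int rowno = 0;  /* 0-based index of the row being fetched */
    do {
        rc = sqlite3_step(stmt);
        switch (rc) {
        case SQLITE_ROW:   break;
        case SQLITE_DONE:  rowno = -100;  break;  /* all rows read: leave the loop */
        default:
            fprintf(stderr, "rowno = %d, sqlite3 error %d\n", rowno, rc);
            rowno = -100;  /* stop on error instead of stepping again */
            break;
        }
        if (rowno >= 0) { rowno++; }
    }
    while (rowno >= 0);
    sqlite3_finalize(stmt);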
    
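    • sql = okayQuery: 950131 records are fetched (last rowno: 950130), then the loop ends with SQLITE_DONE
    • sql = malformed: error 11 (SQLITE_CORRUPT) is reported when fetching the record at rowno 950300
      • When the query works, only 950131 records (rowno 0 to 950130) can be traversed
      • When the query fails, it somehow gets as far as the 950301st record (rowno: 950300) before erroring out!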
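  • Test code: locating the time

    /* Assumes db is an open sqlite3* handle; needs <stdio.h>, <string.h> and <sqlite3.h>. */
    sqlite3_stmt *stmt = NULL;
    int rc, ncols, rowid;
    const char *ctype, *id, *colltime, *snull = "(null)";
    const char *malformed = "select rowid, ctype, id, DataTime, StartTime from DataSheet;";
    const char *okayQuery = "select rowid, ctype, id, DataTime from DataSheet;";
    const char *sql = malformed;
    sqlite3_prepare_v2(db, sql, (int)strlen(sql), &stmt, NULL);
    ncols = sqlite3_column_count(stmt);
    int rowno = 0;
    do {
      rc = sqlite3_step(stmt);
      switch (rc) {
      case SQLITE_ROW:
        /* print only the rows around the failure point */
        if (rowno >= 950129 && rowno <= 950301/* 950300 */) {
          rowid  = sqlite3_column_int(stmt, 0);
          ctype  = (const char *)sqlite3_column_text(stmt, 1);
          id = (const char *)sqlite3_column_text(stmt, 2);
          colltime = (const char *)sqlite3_column_text(stmt, 3);
          fprintf(stderr, "rowno: %d, rowid: %i, ctype: %s, id:%s, time:%s\n", rowno, rowid, (ctype?ctype:snull),(id?id:snull), (colltime?colltime:snull));
        }
        break;
      case SQLITE_DONE:  rowno = -100;  break;
      /* On error the loop keeps going on purpose. The next sqlite3_step() resets the
         statement and starts again from the first row (unless SQLite was built with
         SQLITE_OMIT_AUTORESET), which is likely why the row after the error is an old one. */
      default:  fprintf(stderr, "rowno = %d, sqlite3 error %d\n", rowno, rc);  break;
      }
      if (rowno >= 0) { rowno++; }
    }
    while (rowno >= 0 && rowno <= 950301);  /* upper bound added so the loop cannot spin forever after an error */
    sqlite3_finalize(stmt);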
    

    Output:

    rowno: 950129, rowid: 4478623, ctype: 5002, id:001548532328, time:20191220105300
    rowno: 950130, rowid: 4478624, ctype: 5002, id:001548532328, time:20191220105400
    ...
    rowno: 950297, rowid: 4478791, ctype: 5002, id:001548532342, time:20191220105800
    rowno: 950298, rowid: 4478792, ctype: 5002, id:001548532342, time:20191220105900
    rowno: 950299, rowid: 4478793, ctype: 5002, id:001548532342, time:20191220110100
    rowno = 950300, sqlite3 error 11
    rowno: 950301, rowid: 3526517, ctype: 5002, id:005410401490, time:20191214202200
    

Log

  • 20191220.log
    #12-20 11:21:30.098: write[38] 20191220-111700  ...
    #12-20 11:21:30.433: write[39] 20191220-111800  ...
    #12-20 11:21:30.434: write[40] 20191220-111900  ...
    

Checking the next day's database

  • Database from 2019/12/21
  • Database file size: 210 MB
    • Same size as the previous day's database
  • Check menu
    • OK

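Analysis

  • The 2019/12/20 database has the malformed problem while the 2019/12/21 database does not. Possible explanations:
    • Possible cause 1: the database recovered automatically
      • The data retention period had not yet elapsed, so it was not restored automatically through cleanup
      • The device has no repair function
    • Possible cause 2: a problem while copying the database
      • Timestamp of the last log entry: 12-20 11:24:21.057# ...
      • If the database was pulled via ftp first and the shell log saved afterwards, the database may have been in the middle of a transaction while it was being pulled, which would cause this problem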
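
One way to test the copy hypothesis on future pulls is to look at the file change counter in the SQLite database header: a 4-byte big-endian integer at byte offset 24 that SQLite updates when a transaction modifies the file. The sketch below is a plain-C reader for that field. `read_change_counter` is a hypothetical helper name, and the approach assumes the database runs in rollback-journal mode; in WAL mode the counter is not necessarily incremented on every transaction.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads the file change counter (offset 24, 4 bytes,
 * big-endian) from an SQLite database file. Returns 0 on success, -1 if the
 * file cannot be read or does not start with the SQLite header string. */
int read_change_counter(const char *path, uint32_t *counter)
{
    unsigned char hdr[28];
    size_t got;
    FILE *fp = fopen(path, "rb");

    if (fp == NULL) return -1;
    got = fread(hdr, 1, sizeof hdr, fp);
    fclose(fp);
    if (got != sizeof hdr) return -1;

    /* A valid database file starts with the 16 bytes "SQLite format 3\0". */
    if (memcmp(hdr, "SQLite format 3", 16) != 0) return -1;

    *counter = ((uint32_t)hdr[24] << 24) | ((uint32_t)hdr[25] << 16) |
               ((uint32_t)hdr[26] << 8)  |  (uint32_t)hdr[27];
    return 0;
}
```

Reading the counter on the device immediately before and immediately after the transfer, and finding two different values, would indicate that at least one write was committed while the file was being copied. Equal values do not prove the copy is safe, since a transaction that is still in progress has not yet updated the counter.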

Conclusion

  • The database was executing a write transaction while it was being pulled via ftp, which led to the malformed problem