The Strange SQLite3 "malformed" Error (Part 1)
Symptoms
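- A field device generates and inserts data on a large scale. After the device malfunctioned, we pulled its database off for inspection, and the check reported a malformed error
- SQLite version: 3.8.6
- Database file size: 210 MB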
Locating the problem
- Row count
    select count(*) from DataSheet; /* result: 950131 */
- Primary key
    primary key(ctype,id,DataTime)
- Reproduction (SQLite Expert)
    select * from DataSheet where id='000537719140' order by DataTime; /*malformed*/
    select * from DataSheet where ctype=5002 and id='000537719140' order by DataTime; /*OK*/
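  The only difference between these two statements is whether the where clause includes the ctype condition. Why would that matter? To compare, run the following SQL: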
    select * from DataSheet where id='000537719140'; /*OK*/
    select * from DataSheet order by DataTime; /*malformed, full table scan*/
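  - When the where clause matches the primary key (ctype, id, DataTime) starting from its leftmost column, as ctype=5002 and id='...' do, the query uses the primary-key index. In that case, order by DataTime only orders the matching index entries, and the problem does not reproduce.
  - When the where clause does not match the primary key in that way, the index is not used. order by DataTime then has to sort the result rows, which in practice means a full table scan, and the problem reproduces.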
- Test code: locating the row
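    #include <stdio.h>
    #include <string.h>
    #include <sqlite3.h>

    int main(int argc, char **argv)
    {
        /* The two queries differ only in the StartTime column. */
        const char *malformed = "select rowid, ctype, id, StartTime, DataTime from DataSheet;"; /*malformed*/
        const char *okayQuery = "select rowid, ctype, id, DataTime from DataSheet;";            /*OK*/
        const char *sql = okayQuery; /* set to malformed to reproduce the error */
        sqlite3 *db = NULL;
        sqlite3_stmt *stmt = NULL;
        int rowno = 0; /* rows returned so far, i.e. the 0-based index of the next step */
        int rc;

        if (argc < 2) {
            fprintf(stderr, "usage: %s <database>\n", argv[0]);
            return 1;
        }
        if (sqlite3_open_v2(argv[1], &db, SQLITE_OPEN_READONLY, NULL) != SQLITE_OK
            || sqlite3_prepare_v2(db, sql, (int)strlen(sql), &stmt, NULL) != SQLITE_OK) {
            fprintf(stderr, "open/prepare failed: %s\n", sqlite3_errmsg(db));
            sqlite3_close(db);
            return 1;
        }
        while ((rc = sqlite3_step(stmt)) == SQLITE_ROW)
            rowno++;
        if (rc == SQLITE_DONE) {
            fprintf(stderr, "SQLITE_DONE after %d rows\n", rowno);
        } else {
            /* Stop at the first error: calling sqlite3_step() again would
               automatically reset the statement and restart the scan. */
            fprintf(stderr, "rowno = %d, sqlite3 error %d\n", rowno, rc);
        }
        sqlite3_finalize(stmt);
        sqlite3_close(db);
        return rc == SQLITE_DONE ? 0 : 1;
    }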
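  - With sql = okayQuery, the scan returns 950131 rows (the last one at rowno 950130), and then sqlite3_step() returns SQLITE_DONE
  - With sql = malformed, error 11 (SQLITE_CORRUPT) is reported at rowno = 950300
  - When the query works, it traverses only 950131 rows. When it fails, it gets as far as rowno 950300 before the error, 169 rows more than the working query returns!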
- Test code: locating the time
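    #include <stdio.h>
    #include <string.h>
    #include <sqlite3.h>

    int main(int argc, char **argv)
    {
        static const char *snull = "(null)";
        const char *malformed = "select rowid, ctype, id, DataTime, StartTime from DataSheet;";
        const char *okayQuery = "select rowid, ctype, id, DataTime from DataSheet;";
        const char *sql = malformed; /* okayQuery is kept for comparison */
        sqlite3 *db = NULL;
        sqlite3_stmt *stmt = NULL;
        int rowno = 0;
        int rc;

        if (argc < 2) {
            fprintf(stderr, "usage: %s <database>\n", argv[0]);
            return 1;
        }
        if (sqlite3_open_v2(argv[1], &db, SQLITE_OPEN_READONLY, NULL) != SQLITE_OK
            || sqlite3_prepare_v2(db, sql, (int)strlen(sql), &stmt, NULL) != SQLITE_OK) {
            fprintf(stderr, "open/prepare failed: %s\n", sqlite3_errmsg(db));
            sqlite3_close(db);
            return 1;
        }
        while ((rc = sqlite3_step(stmt)) == SQLITE_ROW) {
            /* Print only the rows around the failure point found by the previous test. */
            if (rowno >= 950129 && rowno <= 950301/* 950300 */) {
                sqlite3_int64 rowid = sqlite3_column_int64(stmt, 0);
                const char *ctype = (const char *)sqlite3_column_text(stmt, 1);
                const char *id = (const char *)sqlite3_column_text(stmt, 2);
                const char *colltime = (const char *)sqlite3_column_text(stmt, 3); /* DataTime */
                fprintf(stderr, "rowno: %d, rowid: %lld, ctype: %s, id:%s, time:%s\n",
                        rowno, (long long)rowid, ctype ? ctype : snull,
                        id ? id : snull, colltime ? colltime : snull);
            }
            rowno++;
        }
        /* This version stops at the first error. The original loop kept calling
           sqlite3_step(), which automatically resets the statement after an error
           (SQLite 3.6.23.1 and later) and restarts the scan; that restart produced
           the last line of the output below (rowno: 950301). */
        if (rc != SQLITE_DONE)
            fprintf(stderr, "rowno = %d, sqlite3 error %d\n", rowno, rc);
        sqlite3_finalize(stmt);
        sqlite3_close(db);
        return rc == SQLITE_DONE ? 0 : 1;
    }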
  Output:
    rowno: 950129, rowid: 4478623, ctype: 5002, id:001548532328, time:20191220105300
    rowno: 950130, rowid: 4478624, ctype: 5002, id:001548532328, time:20191220105400
    ...
    rowno: 950297, rowid: 4478791, ctype: 5002, id:001548532342, time:20191220105800
    rowno: 950298, rowid: 4478792, ctype: 5002, id:001548532342, time:20191220105900
    rowno: 950299, rowid: 4478793, ctype: 5002, id:001548532342, time:20191220110100
    rowno = 950300, sqlite3 error 11
    rowno: 950301, rowid: 3526517, ctype: 5002, id:005410401490, time:20191214202200
Log
- 20191220.log (寫入 = write)
    #12-20 11:21:30.098: 寫入[38] 20191220-111700 ...
    #12-20 11:21:30.433: 寫入[39] 20191220-111800 ...
    #12-20 11:21:30.434: 寫入[40] 20191220-111900 ...
Checking the next day's database
- The 2019/12/21 database
  - File size: 210 MB, the same as the 2019/12/20 database
  - Check menu: OK
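Analysis
- The 2019/12/20 database has the malformed problem, but the 2019/12/21 database does not. Possible explanations:
  - Possibility 1: the database recovered by itself. This is unlikely:
    - The data retention period had not yet been reached, so the bad data was not removed by the periodic cleanup
    - The device has no repair function
  - Possibility 2: a problem with copying the database
    - Last log timestamp:
      12-20 11:24:21.057# ...
    - If the database was pulled via FTP first and the shell log was saved afterwards, the database may have been in the middle of a transaction while it was being pulled, which would cause this problem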
Conclusion
- The database was pulled via FTP while a write transaction was in progress, which caused the malformed error