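One day I needed to delete duplicate rows from a table so that only one copy of each remained, so I searched on Baidu and found the following: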
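SQL: delete duplicate data and keep only one row. Among a few thousand records there are some identical ones. How can an SQL statement delete the duplicates and keep only one of each?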
1. Find the redundant duplicate records in a table, where duplicates are determined by a single field (peopleId):
select * from people where peopleId in (select peopleId from people group by peopleId having count(peopleId) > 1)
2. Delete the redundant duplicate records, where duplicates are determined by a single field (peopleName), keeping only the record with the smallest peopleId:
delete from people where peopleName in (select peopleName from people group by peopleName having count(peopleName) > 1) and peopleId not in (select min(peopleId) from people group by peopleName having count(peopleName)>1)
3. Find the redundant duplicate records (multiple fields):
select * from vitae a where (a.peopleId,a.seq) in (select peopleId,seq from vitae group by peopleId,seq having count(*) > 1)
4. Delete the redundant duplicate records (multiple fields), keeping only the record with the smallest rowid:
delete from vitae a where (a.peopleId,a.seq) in (select peopleId,seq from vitae group by peopleId,seq having count(*) > 1) and rowid not in (select min(rowid) from vitae group by peopleId,seq having count(*)>1)
5. Find the redundant duplicate records (multiple fields), excluding the record with the smallest rowid:
select * from vitae a where (a.peopleId,a.seq) in (select peopleId,seq from vitae group by peopleId,seq having count(*) > 1) and rowid not in (select min(rowid) from vitae group by peopleId,seq having count(*)>1)
6. Remove the first character on the left of a field:
update tableName set [Title]=Right([Title],(len([Title])-1)) where Title like '村%'
7. Remove the last character on the right of a field:
update tableName set [Title]=left([Title],(len([Title])-1)) where Title like '%村'
8. Soft-delete the redundant duplicate records (multiple fields) by setting ispass to -1, excluding the record with the smallest rowid:
update vitae set ispass=-1 where (peopleId,seq) in (select peopleId,seq from vitae group by peopleId,seq having count(*) > 1) and rowid not in (select min(rowid) from vitae group by peopleId,seq having count(*)>1)
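Items 3 and 8 can be tried without an Oracle database. Below is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in. It assumes SQLite 3.15 or later, which is needed for the `(peopleId, seq) IN (...)` row-value syntax. Like Oracle, SQLite gives ordinary tables an implicit rowid. The sample rows are made up. The last query shows why the pair has to be matched as a unit: testing peopleId and seq with separate IN lists also flags rows whose combination is not duplicated.

```python
import sqlite3

# In-memory SQLite database standing in for the Oracle table in items 3-8.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vitae (peopleId INTEGER, seq INTEGER, ispass INTEGER DEFAULT 0)")
# Hypothetical sample rows; rowids are assigned 1..7 in insertion order.
conn.executemany(
    "INSERT INTO vitae (peopleId, seq) VALUES (?, ?)",
    [(1, 1), (1, 1), (1, 1), (1, 2), (2, 1), (2, 2), (2, 2)],
)

DUP_PAIRS = "SELECT peopleId, seq FROM vitae GROUP BY peopleId, seq HAVING count(*) > 1"
KEEP_ROWIDS = "SELECT min(rowid) FROM vitae GROUP BY peopleId, seq HAVING count(*) > 1"

# Item 3: every row whose (peopleId, seq) pair occurs more than once.
dupes = conn.execute(
    f"SELECT rowid, peopleId, seq FROM vitae WHERE (peopleId, seq) IN ({DUP_PAIRS}) ORDER BY rowid"
).fetchall()
print(dupes)  # [(1, 1, 1), (2, 1, 1), (3, 1, 1), (6, 2, 2), (7, 2, 2)]

# Item 8: mark every duplicate except the lowest rowid of its group.
conn.execute(
    f"UPDATE vitae SET ispass = -1 WHERE (peopleId, seq) IN ({DUP_PAIRS}) "
    f"AND rowid NOT IN ({KEEP_ROWIDS})"
)
marked = [r[0] for r in conn.execute("SELECT rowid FROM vitae WHERE ispass = -1 ORDER BY rowid")]
print(marked)  # [2, 3, 7]

# Matching each column on its own also catches (1, 2) and (2, 1), which are not duplicates.
loose = [
    r[0]
    for r in conn.execute(
        "SELECT rowid FROM vitae "
        "WHERE peopleId IN (SELECT peopleId FROM vitae GROUP BY peopleId, seq HAVING count(*) > 1) "
        "AND seq IN (SELECT seq FROM vitae GROUP BY peopleId, seq HAVING count(*) > 1) "
        f"AND rowid NOT IN ({KEEP_ROWIDS}) ORDER BY rowid"
    )
]
print(loose)  # [2, 3, 4, 5, 7]
```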
So I picked this one:
Delete the redundant duplicate records (multiple fields), keeping only the record with the smallest rowid:
delete from vitae a where (a.peopleId,a.seq) in (select peopleId,seq from vitae group by peopleId,seq having count(*) > 1) and rowid not in (select min(rowid) from vitae group by peopleId,seq having count(*)>1)
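Here is a minimal sketch of that statement (item 4) in SQLite, run through Python's sqlite3 module. It assumes SQLite 3.15 or later and uses hypothetical sample data. The table alias `a` is dropped because it is not needed. SQLite accepts a subquery on the same table a DELETE is removing rows from, so this runs as-is. MySQL rejects this form, as discussed below.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vitae (peopleId INTEGER, seq INTEGER)")
# Hypothetical sample rows; rowids are assigned 1..6 in insertion order.
conn.executemany(
    "INSERT INTO vitae VALUES (?, ?)",
    [(1, 1), (1, 1), (1, 1), (1, 2), (2, 2), (2, 2)],
)

# Item 4 without the table alias: delete each duplicated (peopleId, seq) row
# except the one with the smallest rowid in its group.
cur = conn.execute(
    """
    DELETE FROM vitae
    WHERE (peopleId, seq) IN (SELECT peopleId, seq FROM vitae
                              GROUP BY peopleId, seq HAVING count(*) > 1)
      AND rowid NOT IN (SELECT min(rowid) FROM vitae
                        GROUP BY peopleId, seq HAVING count(*) > 1)
    """
)
deleted = cur.rowcount
remaining = conn.execute("SELECT rowid, peopleId, seq FROM vitae ORDER BY rowid").fetchall()
print(deleted, remaining)  # 3 [(1, 1, 1), (4, 1, 2), (5, 2, 2)]
```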
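My requirement involves five fields, and MySQL has no Oracle-style rowid, so I used the table's id column instead. I changed the statement as follows: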
DELETE
FROM
    TEMP_registration_information
WHERE
    (
        level1options,
        level2options,
        level3options,
        level4options,
        NAME
    ) IN (
        SELECT
            level1options,
            level2options,
            level3options,
            level4options,
            NAME
        FROM
            (
                SELECT
                    level1options,
                    level2options,
                    level3options,
                    level4options,
                    NAME
                FROM
                    TEMP_registration_information
                GROUP BY
                    level1options,
                    level2options,
                    level3options,
                    level4options,
                    NAME
                HAVING
                    count(*) > 1
            ) AS tmp
    )
    AND id NOT IN (
        SELECT
            id
        FROM
            (
                SELECT
                    -- alias as id: without it the derived column is named min(id),
                    -- and the "SELECT id" above no longer refers to it
                    min(id) AS id
                FROM
                    TEMP_registration_information
                GROUP BY
                    level1options,
                    level2options,
                    level3options,
                    level4options,
                    NAME
                HAVING
                    count(*) > 1
            ) AS temp
    )
To make this work, I wrapped the subqueries in temporary (derived) tables, for example:
SELECT
    id
FROM
    (
        SELECT
            min(id) AS id
        FROM
            TEMP_registration_information
        GROUP BY
            level1options,
            level2options,
            level3options,
            level4options,
            NAME
        HAVING
            count(*) > 1
    ) AS temp
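I needed the temporary tables because running a statement like this one raises an error:
delete from vitae a where (a.peopleId,a.seq) in (select peopleId,seq from vitae group by peopleId,seq having count(*) > 1) and rowid not in (select min(rowid) from vitae group by peopleId,seq having count(*)>1)
In MySQL, a single SQL statement cannot select from a table while also modifying that same table (error 1093, "You can't specify target table ... for update in FROM clause").
MySQL materializes a derived table as a temporary table, so selecting from one gets around this restriction.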
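Here is a minimal sketch of the five-column statement, again in SQLite through Python's sqlite3 module. It assumes SQLite 3.15 or later and uses made-up sample data. The derived-table aliases are renamed `tmp` and `tmp_min` to stay clear of SQLite's `temp` schema name. The statement runs twice on fresh copies of the data: once with a bare `min(id)` in the inner query and once with `min(id) AS id`. In SQLite, without the alias the inner column is named `min(id)`, so `SELECT id` resolves to the outer table's own id. Every row then finds its own id in the list, and nothing is deleted.

```python
import sqlite3

COLS = "level1options, level2options, level3options, level4options, NAME"


def make_db():
    """Fresh in-memory table with hypothetical rows; ids are 1..6."""
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE TEMP_registration_information (id INTEGER PRIMARY KEY, {COLS})")
    rows = [("a", "b", "c", "d", "Tom")] * 3 + [("a", "b", "c", "d", "Ann")] + [("x", "y", "z", "w", "Tom")] * 2
    conn.executemany(
        f"INSERT INTO TEMP_registration_information ({COLS}) VALUES (?, ?, ?, ?, ?)", rows
    )
    return conn


def dedupe(conn, min_expr):
    """Run the five-column delete with the given inner expression; return rows deleted."""
    sql = f"""
    DELETE FROM TEMP_registration_information
    WHERE ({COLS}) IN (
        SELECT {COLS} FROM (
            SELECT {COLS} FROM TEMP_registration_information
            GROUP BY {COLS} HAVING count(*) > 1
        ) AS tmp
    )
    AND id NOT IN (
        SELECT id FROM (
            SELECT {min_expr} FROM TEMP_registration_information
            GROUP BY {COLS} HAVING count(*) > 1
        ) AS tmp_min
    )
    """
    return conn.execute(sql).rowcount


# Without an alias, "SELECT id" picks up the outer row's own id, so the NOT IN test never passes.
unaliased_deleted = dedupe(make_db(), "min(id)")

# With the alias, the extra copies (ids 2, 3 and 6) are deleted.
conn = make_db()
aliased_deleted = dedupe(conn, "min(id) AS id")
remaining_ids = [r[0] for r in conn.execute("SELECT id FROM TEMP_registration_information ORDER BY id")]
print(unaliased_deleted, aliased_deleted, remaining_ids)  # 0 3 [1, 4, 5]
```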
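But the five-column statement took far too long: after 5 or 6 minutes there was still no sign of it finishing, and it kept running.
So I tried something else and added indexes on the five columns involved, but it still ran for 5 or 6 minutes with no sign of finishing.
So I kept thinking,
and finally came up with this: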
DELETE
FROM
    TEMP_registration_information
WHERE id IN (
    SELECT
        id
    FROM
        (
            SELECT
                min(id) id
            FROM
                TEMP_registration_information
            GROUP BY
                level1options,
                level2options,
                level3options,
                level4options,
                NAME
            HAVING
                count(*) > 1
        ) AS temp
)
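On 80,000 rows, this statement took about 20 s per run.
Run it repeatedly until it deletes 0 rows, and you're done!
Each run deletes exactly one row from every group that still has duplicates: the row with the smallest id. A group of n identical rows therefore needs n - 1 deleting runs. In each group, the row that survives is the one with the largest id.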
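You can script the repeat-until-zero loop instead of rerunning the statement by hand. Here is a minimal sketch in SQLite through Python's sqlite3 module. It uses the final statement with the derived-table alias renamed to `tmp_min`, plus made-up sample data. One group has four identical rows, so it takes three deleting runs and then a final run that deletes nothing.

```python
import sqlite3

COLS = "level1options, level2options, level3options, level4options, NAME"

conn = sqlite3.connect(":memory:")
conn.execute(f"CREATE TABLE TEMP_registration_information (id INTEGER PRIMARY KEY, {COLS})")
# Hypothetical rows: ids 1-4 identical, id 5 unique, ids 6-7 identical.
rows = [("a", "b", "c", "d", "Tom")] * 4 + [("a", "b", "c", "d", "Ann")] + [("x", "y", "z", "w", "Tom")] * 2
conn.executemany(f"INSERT INTO TEMP_registration_information ({COLS}) VALUES (?, ?, ?, ?, ?)", rows)

# The final statement: delete the smallest id of every group that still has duplicates.
DELETE_ONE_PER_GROUP = f"""
DELETE FROM TEMP_registration_information
WHERE id IN (
    SELECT id FROM (
        SELECT min(id) id FROM TEMP_registration_information
        GROUP BY {COLS} HAVING count(*) > 1
    ) AS tmp_min
)
"""

passes = []  # rows deleted by each run
while True:
    deleted = conn.execute(DELETE_ONE_PER_GROUP).rowcount
    passes.append(deleted)
    if deleted == 0:  # no duplicate group left
        break
conn.commit()

remaining = [r[0] for r in conn.execute("SELECT id FROM TEMP_registration_information ORDER BY id")]
print(passes, remaining)  # [2, 1, 1, 0] [4, 5, 7]
```

The surviving ids 4, 5 and 7 are the largest id in each group.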
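Of course, running the same delete over and over is a bit clumsy. I don't write SQL professionally,
so if you know a better method, feel free to share it in the comments.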
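One possible single-pass alternative, offered as a sketch rather than a tested MySQL recipe, is to keep `min(id)` of every group, duplicated or not, and delete every other row. As in the statements above, the subquery is wrapped in a derived table, so MySQL's same-table restriction (error 1093) does not apply. The sketch runs in SQLite via Python's sqlite3 module with made-up data; its speed on MySQL with 80,000 rows has not been measured. Note that it keeps the smallest id in each group, whereas the repeated delete keeps the largest.

```python
import sqlite3

COLS = "level1options, level2options, level3options, level4options, NAME"

conn = sqlite3.connect(":memory:")
conn.execute(f"CREATE TABLE TEMP_registration_information (id INTEGER PRIMARY KEY, {COLS})")
# Hypothetical rows: ids 1-4 identical, id 5 unique, ids 6-7 identical.
rows = [("a", "b", "c", "d", "Tom")] * 4 + [("a", "b", "c", "d", "Ann")] + [("x", "y", "z", "w", "Tom")] * 2
conn.executemany(f"INSERT INTO TEMP_registration_information ({COLS}) VALUES (?, ?, ?, ?, ?)", rows)

# Keep the smallest id of every group (duplicated or not) and delete all other rows at once.
# The derived table keep_ids mirrors the MySQL workaround for reading the table being deleted from.
SINGLE_PASS = f"""
DELETE FROM TEMP_registration_information
WHERE id NOT IN (
    SELECT id FROM (
        SELECT min(id) AS id FROM TEMP_registration_information
        GROUP BY {COLS}
    ) AS keep_ids
)
"""
deleted = conn.execute(SINGLE_PASS).rowcount
conn.commit()

remaining = [r[0] for r in conn.execute("SELECT id FROM TEMP_registration_information ORDER BY id")]
print(deleted, remaining)  # 4 [1, 5, 6]
```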