Optimizing a SQL statement that deletes duplicate rows from a table

One day I needed to implement exactly this.

So I searched on Baidu

and got the following results:

 

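SQL: deleting duplicate data and keeping only one copy. Among a few thousand records there are some identical ones; how can they be removed with a SQL statement so that only one of each remains?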

1. Find the duplicate records in a table, where duplicates are determined by a single field (peopleId):

select * from people where peopleId in (select peopleId from people group by peopleId having count(peopleId) > 1)

2. Delete the surplus duplicate records in a table, where duplicates are determined by a single field (peopleName), keeping only the record with the smallest peopleId:

delete from people where peopleName in (select peopleName from people group by peopleName having count(peopleName) > 1) and peopleId not in (select min(peopleId) from people group by peopleName having count(peopleName) > 1)

3. Find the duplicate records in a table (multiple fields):

select * from vitae a where (a.peopleId,a.seq) in (select peopleId,seq from vitae group by peopleId,seq having count(*) > 1)

4. Delete the surplus duplicate records in a table (multiple fields), keeping only the record with the smallest rowid:

delete from vitae a where (a.peopleId,a.seq) in (select peopleId,seq from vitae group by peopleId,seq having count(*) > 1) and rowid not in (select min(rowid) from vitae group by peopleId,seq having count(*)>1)

5. Find the surplus duplicate records in a table (multiple fields), excluding the record with the smallest rowid:

select * from vitae a where (a.peopleId,a.seq) in (select peopleId,seq from vitae group by peopleId,seq having count(*) > 1) and rowid not in (select min(rowid) from vitae group by peopleId,seq having count(*)>1)

6. Remove the first character on the left of a field:

update tableName set [Title]=Right([Title],(len([Title])-1)) where Title like '村%'

7. Remove the last character on the right of a field:

update tableName set [Title]=left([Title],(len([Title])-1)) where Title like '%村'

8. Soft-delete the surplus duplicate records in a table (multiple fields), excluding the record with the smallest rowid:

update vitae set ispass=-1 where (peopleId,seq) in (select peopleId,seq from vitae group by peopleId,seq having count(*) > 1) and rowid not in (select min(rowid) from vitae group by peopleId,seq having count(*)>1)
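The rowid used in several of these statements is an Oracle pseudo-column; in MySQL a unique key column usually plays that role. As a rough sketch only, and assuming the vitae table has a unique numeric id column (a hypothetical column, not part of the quoted statements), the surplus rows from item 5 could be previewed in MySQL like this before deleting anything:

```sql
-- Assumption: vitae has a unique numeric column named id (hypothetical).
-- keep_id is an alias introduced only for this example.
SELECT v.*
FROM vitae AS v
JOIN (
    -- One row per duplicate (peopleId, seq) group, with the id to keep.
    SELECT peopleId, seq, MIN(id) AS keep_id
    FROM vitae
    GROUP BY peopleId, seq
    HAVING COUNT(*) > 1
) AS d
    ON  v.peopleId = d.peopleId
    AND v.seq      = d.seq
-- Everything in the group except the row that is kept.
WHERE v.id <> d.keep_id;
```

Running a SELECT like this first shows which rows a matching DELETE would remove.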

 

So I picked one of them:

Delete the surplus duplicate records in a table (multiple fields), keeping only the record with the smallest rowid:

delete from vitae a where (a.peopleId,a.seq) in (select peopleId,seq from vitae group by peopleId,seq having count(*) > 1) and rowid not in (select min(rowid) from vitae group by peopleId,seq having count(*)>1)

 

My requirement involves five fields, so I changed it as follows:

DELETE
FROM
    TEMP_registration_information
WHERE
    (
        level1options,
        level2options,
        level3options,
        level4options,
        NAME
    ) IN (
        SELECT
            level1options,
            level2options,
            level3options,
            level4options,
            NAME
        FROM
            (
                SELECT
                    level1options,
                    level2options,
                    level3options,
                    level4options,
                    NAME
                FROM
                    TEMP_registration_information
                GROUP BY
                    level1options,
                    level2options,
                    level3options,
                    level4options,
                    NAME
                HAVING
                    count(*) > 1
            ) AS tmp
    )
AND id NOT IN (
    SELECT
        id
    FROM
        (
            SELECT
                min(id) AS id -- alias so the outer SELECT id reads this column
            FROM
                TEMP_registration_information
            GROUP BY
                level1options,
                level2options,
                level3options,
                level4options,
                NAME
            HAVING
                count(*) > 1
        ) AS temp
)

Here each subquery is wrapped in a derived (temporary) table, for example:

SELECT
        id
    FROM
        (
            SELECT
                min(id) AS id
            FROM
                TEMP_registration_information
            GROUP BY
                level1options,
                level2options,
                level3options,
                level4options,
                NAME
            HAVING
                count(*) > 1
        ) AS temp

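I did this because running a statement of this form directly

delete from vitae a where (a.peopleId,a.seq) in (select peopleId,seq from vitae group by peopleId,seq having count(*) > 1) and rowid not in (select min(rowid) from vitae group by peopleId,seq having count(*)>1)

raises an error.

In MySQL, a single statement cannot modify a table and select from that same table in a subquery (error 1093, "You can't specify target table '<name>' for update in FROM clause").

Hence the extra derived table.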
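To see the restriction in isolation, here is a minimal sketch on a made-up table. The table dup_demo, its columns, and the alias keep are hypothetical and not from the original post.

```sql
-- Hypothetical toy table for illustration only.
CREATE TABLE dup_demo (
    id   INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(50)
);
INSERT INTO dup_demo (name) VALUES ('a'), ('a'), ('b');

-- Rejected with error 1093: the subquery reads the table being deleted from.
-- DELETE FROM dup_demo
-- WHERE id NOT IN (SELECT MIN(id) FROM dup_demo GROUP BY name);

-- Accepted: the aggregate subquery is wrapped in a derived table (keep),
-- which MySQL materializes before the DELETE touches dup_demo.
DELETE FROM dup_demo
WHERE id NOT IN (
    SELECT id
    FROM (SELECT MIN(id) AS id FROM dup_demo GROUP BY name) AS keep
);

-- One 'a' row and one 'b' row should remain.
SELECT * FROM dup_demo;
```

Note the AS id alias inside the derived table: without it the outer SELECT id has no column of that name to read from keep.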

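But once it was running it took far too long: after 5 or 6 minutes it showed no sign of finishing.

So I looked for another approach and tried adding indexes on the five fields involved, but the statement still ran for 5 or 6 minutes with no sign of finishing.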
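The post does not say whether this was one composite index or five single-column ones. As a sketch of one thing to try, a composite index over the grouping columns might look like the following; the index name idx_dedup_key is hypothetical, and the column types are unknown, so long text columns might need prefix lengths to fit MySQL's index key-length limits.

```sql
-- Hypothetical index name; column order matches the GROUP BY list.
CREATE INDEX idx_dedup_key
    ON TEMP_registration_information
       (level1options, level2options, level3options, level4options, NAME);

-- EXPLAIN shows whether the grouping query actually uses the index.
EXPLAIN
SELECT MIN(id) AS id
FROM TEMP_registration_information
GROUP BY
    level1options,
    level2options,
    level3options,
    level4options,
    NAME
HAVING COUNT(*) > 1;
```

If the plan still shows a full scan or a dependent subquery, the index alone is unlikely to help and the statement itself needs restructuring.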

So I kept thinking

and finally came up with this:

DELETE
FROM
  TEMP_registration_information
WHERE id  IN (
  SELECT
    id
  FROM
    (
      SELECT
        min(id) id
      FROM
        TEMP_registration_information
      GROUP BY
        level1options,
        level2options,
        level3options,
        level4options,
        NAME
      HAVING
        count(*) > 1
    ) AS temp
)

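On 80,000 rows, this statement takes about 20 s per run.

Each run removes only the row with the smallest id from every duplicate group, so run it repeatedly until the number of deleted rows is 0

and you are done! (Note that this keeps the row with the largest id in each group, not the smallest.)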
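A possible single-pass variant, which is not from the original post and is offered only as a sketch, joins the table against a derived table of duplicate groups and deletes every row except the one with the smallest id. It assumes id is unique; keep_id is an alias introduced for this example. Unlike the loop above, it keeps the smallest id in each group.

```sql
-- Sketch only: try it on a copy of the table first.
DELETE t
FROM TEMP_registration_information AS t
JOIN (
    -- One row per duplicate group, carrying the id that should survive.
    SELECT
        level1options,
        level2options,
        level3options,
        level4options,
        NAME,
        MIN(id) AS keep_id
    FROM TEMP_registration_information
    GROUP BY
        level1options,
        level2options,
        level3options,
        level4options,
        NAME
    HAVING COUNT(*) > 1
) AS d
    -- <=> is MySQL's NULL-safe equality, so NULLs match the way GROUP BY grouped them.
    ON  t.level1options <=> d.level1options
    AND t.level2options <=> d.level2options
    AND t.level3options <=> d.level3options
    AND t.level4options <=> d.level4options
    AND t.NAME          <=> d.NAME
WHERE t.id > d.keep_id;
```

Because the derived table contains an aggregate, MySQL materializes it before deleting, so the target-table restriction does not apply here.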

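Of course, this approach is a bit clumsy; I do not write SQL professionally,

so if you have a better method, feel free to share it in the comments.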

 

 
