Problem:
Write a SQL query to delete all duplicate email entries in a table named Person, keeping only unique emails based on its smallest Id.
+----+-------------------+
| Id | Email             |
+----+-------------------+
| 1  | [email protected] |
| 2  | [email protected] |
| 3  | [email protected] |
+----+-------------------+
Id is the primary key column for this table.
For example, after running your query, the above Person table should have the following rows:
+----+-------------------+
| Id | Email             |
+----+-------------------+
| 1  | [email protected] |
| 2  | [email protected] |
+----+-------------------+

Analysis:
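My first idea was to find the smallest Id for each duplicated email address and then delete every other Id belonging to that address. The code is as follows: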
# Write your MySQL query statement below
DELETE FROM Person WHERE
Email IN (SELECT Email FROM Person GROUP BY Email HAVING COUNT(Email) >1)
AND Id NOT IN (SELECT MIN(Id) FROM Person GROUP BY Email HAVING COUNT(Email) >1);
However, it failed with the following error: You can't specify target table 'Person' for update in FROM clause
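After looking it up, I learned the cause: MySQL does not allow a statement to modify a table while selecting from that same table in a subquery. The workaround is to wrap each subquery in an intermediate (derived) table. The revised code is as follows: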
# Write your MySQL query statement below
DELETE FROM Person WHERE Email IN
(SELECT t.Email FROM (SELECT Email FROM Person GROUP BY Email HAVING COUNT(Email) >1) t)
AND Id NOT IN (SELECT s.Id FROM (SELECT MIN(Id) AS Id FROM Person GROUP BY Email HAVING COUNT(Email) >1) s);
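This time the solution was accepted. Still, the code felt too cumbersome. On reflection, we can simply retrieve the smallest Id for every email address, whether or not it is duplicated, and then delete all the remaining Ids. The revised code is as follows: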
# Write your MySQL query statement below
DELETE FROM Person WHERE Id NOT IN (SELECT s.Id FROM (SELECT MIN(Id) AS Id FROM Person GROUP BY Email) s);
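To try the final query locally, the following is a minimal sketch in MySQL. The column types and the email addresses are assumptions made for illustration (the addresses in the problem statement are not reproduced here). It first previews the rows that would be deleted with a plain SELECT, which MySQL allows without the derived-table wrapper because the restriction applies only to statements that modify the table.

```sql
-- Hypothetical schema and sample data; the types and addresses are
-- placeholders chosen for illustration, not taken from the original post.
CREATE TABLE Person (
    Id    INT PRIMARY KEY,
    Email VARCHAR(255)
);

INSERT INTO Person (Id, Email) VALUES
    (1, 'a@example.com'),
    (2, 'b@example.com'),
    (3, 'a@example.com');

-- Preview the rows the DELETE would remove (with this data, only Id 3).
SELECT Id, Email
FROM Person
WHERE Id NOT IN (SELECT MIN(Id) FROM Person GROUP BY Email);

-- The final DELETE from the post: keep the smallest Id per email.
-- The derived table s avoids the "can't specify target table" error.
DELETE FROM Person
WHERE Id NOT IN (
    SELECT s.Id
    FROM (SELECT MIN(Id) AS Id FROM Person GROUP BY Email) s
);

-- Check the result: with this data, Ids 1 and 2 remain.
SELECT Id, Email FROM Person ORDER BY Id;
```

Running the preview SELECT before the DELETE is a cheap way to confirm that only the intended rows match, since the DELETE cannot be undone outside a transaction.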