怎么样才能更快的删除重复数据,以下我是已经采取过的方案。
5W多条记录 删除重复记录,保留ID最小的行。
`delete from table
where peopleId in
(select peopleId from table group by peopleId having count(peopleId) > 1)
and id not in
(select min(id) from table group by peopleId having count(peopleId)>1)`
出错:
MariaDB>1093 - You can't specify target table 'table' for update in FROM claus
不能先select出同一表中的某些值,再update这个表(在同一语句中),
于是,使用子集。
`delete from table
where peopleId in
(select peopleId from (select tmp.* from table tmp) a
group by peopleId having count(a.peopleId) > 1)
and id not in
(select min(id) from (select tmp.* from table tmp) b
group by peopleId having count(b.peopleId)>1)`
运行几十秒后都没结果,没等结果出来,开始优化
子集只需要需要的列
`delete from table
where peopleId in
(select peopleId from (select tmp.id, from table tmp) a
group by peopleId having count(a.peopleId) > 1)
and id not in
(select min(id) from (select tmp.id,tmp.peopleId from table tmp) b
group by peopleId having count(b.peopleId)>1)`
30分钟无结果
求教各位大神如何做,是速度加快呢?
我练习过一个类似的例子。直接说下几步关键的,你可以用这种方法对你的数据试验下(根据people表的pid判重,保留多条重复记录中pid最小的记录):
A、创建临时表存储有重复的记录的pid
`CREATE TEMPORARY TABLE SELECT pid FROM people
GROUP BY pid HAVING COUNT(pid) > 1;`
B、创建一个临时表存储重复列组中值_rowid最小的列的_rowid:
`CREATE TEMPORARY TABLE t2 SELECT MIN(_rowid) AS rowid FROM people
GROUP BY pid HAVING COUNT(pid) > 1;`
C、删除重复记录DELETE FROM people WHERE pid IN (SELECT pid FROM t1) AN
版权声明:本文内容由阿里云实名注册用户自发贡献,版权归原作者所有,阿里云开发者社区不拥有其著作权,亦不承担相应法律责任。具体规则请查看《阿里云开发者社区用户服务协议》和《阿里云开发者社区知识产权保护指引》。如果您发现本社区中有涉嫌抄袭的内容,填写侵权投诉表单进行举报,一经查实,本社区将立刻删除涉嫌侵权内容。