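I. Deleting fully duplicated records
Fully duplicated rows usually appear because no primary key or unique constraint was defined on the table.
Test data: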
if OBJECT_ID('duplicate_all') is not null
drop table duplicate_all
GO
create table duplicate_all
(
c1 int,
c2 int,
c3 varchar(100)
)
GO
insert into duplicate_all
select 1,100,'aaa' union all
select 1,100,'aaa' union all
select 1,100,'aaa' union all
select 1,100,'aaa' union all
select 1,100,'aaa' union all
select 2,200,'bbb' union all
select 3,300,'ccc' union all
select 4,400,'ddd' union all
select 5,500,'eee'
GO
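(1) Using a temporary table
Use DISTINCT to get a single copy of each record, delete the source rows, then load the distinct records back.
If the table is not large, you can copy all of its records out once, truncate the table, and load them back; this avoids the row-by-row logging that DELETE incurs.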
if OBJECT_ID('tempdb..#tmp') is not null
drop table #tmp
GO
select distinct * into #tmp
from duplicate_all
where c1 = 1
GO
delete duplicate_all where c1 = 1
GO
insert into duplicate_all
select * from #tmp
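The truncate variant of this approach might look like the following minimal sketch. It assumes the table is small enough to copy into tempdb, that no foreign key references duplicate_all (TRUNCATE TABLE is not allowed in that case), and that nothing else writes to the table in the meantime. The temp table name #all_rows is made up for illustration.

```sql
-- Sketch only: #all_rows is a hypothetical temp table name.
if OBJECT_ID('tempdb..#all_rows') is not null
    drop table #all_rows
GO
-- Copy one instance of every distinct row out of the source table.
select distinct * into #all_rows
from duplicate_all
GO
begin transaction
    -- TRUNCATE deallocates pages instead of logging each deleted row.
    truncate table duplicate_all
    -- Load the de-duplicated rows back.
    insert into duplicate_all
    select * from #all_rows
commit transaction
GO
```

Wrapping the truncate and the reload in one transaction is a design choice here, so that other sessions do not see the table empty between the two steps.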
(2) Using ROW_NUMBER
with tmp
as
(
select *,ROW_NUMBER() OVER(PARTITION BY c1,c2,c3 ORDER BY(getdate())) as num
from duplicate_all
where c1 = 1
)
delete tmp where num > 1
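If several tables contain fully duplicated rows, consider combining them with UNION and inserting the result into a new table with the same structure; SQL Server then removes the duplicate rows both within each table and across the tables.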
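As a rough illustration of the UNION approach, the sketch below merges two tables into a new one. The table names duplicate_all_a, duplicate_all_b and merged_all are hypothetical; both source tables are assumed to have the same (c1, c2, c3) structure as duplicate_all.

```sql
-- Hypothetical tables: duplicate_all_a and duplicate_all_b share the
-- (c1 int, c2 int, c3 varchar(100)) structure of duplicate_all.
if OBJECT_ID('merged_all') is not null
    drop table merged_all
GO
-- UNION (without ALL) returns each distinct row once, so duplicates inside
-- either table and duplicates shared by both tables are dropped.
select c1, c2, c3 into merged_all
from duplicate_all_a
union
select c1, c2, c3
from duplicate_all_b
GO
```

Using UNION ALL instead would keep every row, so the plain UNION is what does the de-duplication here.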
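II. Deleting partially duplicated records
Data that is duplicated in only some columns usually lives in a table that does have a primary key; program logic may have produced multiple rows whose other column values repeat.
Test data: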
if OBJECT_ID('duplicate_col') is not null
drop table duplicate_col
GO
create table duplicate_col
(
c1 int primary key,
c2 int,
c3 varchar(100)
)
GO
insert into duplicate_col
select 1,100,'aaa' union all
select 2,100,'aaa' union all
select 3,100,'aaa' union all
select 4,100,'aaa' union all
select 5,500,'eee'
GO
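(1) Unique index
A unique index has an option to ignore duplicate keys (IGNORE_DUP_KEY); this index option can also be used when creating a primary key or unique constraint. With it set to ON, rows that would duplicate an existing key are skipped with a warning instead of failing the whole insert.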
if OBJECT_ID('tmp') is not null
drop table tmp
GO
create table tmp
(
c1 int,
c2 int,
c3 varchar(100),
constraint UQ_01 unique(c2,c3) with(IGNORE_DUP_KEY = ON)
)
GO
insert into tmp
select * from duplicate_col
select * from tmp
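The same option can also be set on a standalone unique index rather than on a constraint. The following is a minimal sketch under that assumption; the names tmp_idx and IX_tmp_idx_c2_c3 are made up for illustration.

```sql
-- Hypothetical names: tmp_idx (table) and IX_tmp_idx_c2_c3 (index).
if OBJECT_ID('tmp_idx') is not null
    drop table tmp_idx
GO
create table tmp_idx
(
c1 int,
c2 int,
c3 varchar(100)
)
GO
create unique index IX_tmp_idx_c2_c3 on tmp_idx(c2,c3) with (IGNORE_DUP_KEY = ON)
GO
-- Rows whose (c2,c3) pair already exists in the index are skipped with a
-- warning instead of aborting the insert.
insert into tmp_idx
select * from duplicate_col
GO
select * from tmp_idx
```

With the test data above, one (100,'aaa') row and the (500,'eee') row should remain. Which of the four duplicate rows survives is not something to rely on; if a specific row must be kept, one of the methods below is a better fit.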
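(2) Deleting by primary key / unique key
Typically you keep the row with the maximum or minimum primary key (or unique key) value and delete the rest. The statements below keep only the row with the smallest c1 in each group of duplicates.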
delete from duplicate_col
where exists(select 1 from duplicate_col b where duplicate_col.c1 > b.c1 and (duplicate_col.c2 = b.c2 and duplicate_col.c3 = b.c3))
-- or
delete from duplicate_col
where c1 not in (select min(c1) from duplicate_col group by c2,c3)
To keep the Nth row of each group of duplicates instead, see article 05, "Taking certain rows from each group".
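One possible way to keep the Nth row is sketched below. This is not taken from the referenced article; it assumes "Nth" means the Nth row ordered by c1, and that groups with fewer than N rows should be left alone. The variable @n is hypothetical.

```sql
-- Sketch: keep only the @n-th row (ordered by c1) of every (c2,c3) group.
-- Groups with fewer than @n rows are left untouched (an assumption).
declare @n int = 2;
delete from duplicate_col
where c1 in
(
    select c1
    from
    (
        select c1,
               ROW_NUMBER() OVER(PARTITION BY c2,c3 ORDER BY c1) as num,
               COUNT(*) OVER(PARTITION BY c2,c3) as cnt
        from duplicate_col
    ) t
    where cnt >= @n and num <> @n
)
GO
```

Against the test data, the (100,'aaa') group has four rows, so only its second row (c1 = 2) is kept, while the single (500,'eee') row is not touched.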
(3) ROW_NUMBER
The syntax is essentially the same as for deleting fully duplicated records.
with tmp
as
(
select *,ROW_NUMBER() OVER(PARTITION BY c2,c3 ORDER BY(getdate())) as num
from duplicate_col
)
delete tmp where num > 1
select * from duplicate_col
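Deleting duplicate data with SQL and keeping only one row (many readers have reported errors with the code below, so test it thoroughly)
Using SQL statements to delete duplicates and keep only one row
Among several thousand records there are some identical ones; how can the duplicates be removed with SQL? Note that the snippets using rowid or a multi-column IN, such as (a.peopleId,a.seq) in (...), are Oracle syntax and do not run on SQL Server as written.
1. Find redundant duplicate records in a table, where duplicates are determined by a single field (peopleId)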
select * from people
where peopleId in (select peopleId from people group by peopleId having count(peopleId) > 1)
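2. Delete redundant duplicate records from a table, where duplicates are determined by a single field (peopleName in this statement), keeping only the record with the smallest peopleId
delete from people
where peopleName in (select peopleName from people group by peopleName having count(peopleName) > 1)
and peopleId not in (select min(peopleId) from people group by peopleName having count(peopleName) > 1)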
3. Find redundant duplicate records in a table (multiple fields)
select * from vitae a
where (a.peopleId,a.seq) in (select peopleId,seq from vitae group by peopleId,seq having count(*) > 1)
4. Delete redundant duplicate records from a table (multiple fields), keeping only the record with the smallest rowid
delete from vitae a
where (a.peopleId,a.seq) in (select peopleId,seq from vitae group by peopleId,seq having count(*) > 1)
and rowid not in (select min(rowid) from vitae group by peopleId,seq having count(*)>1)
5. Find redundant duplicate records in a table (multiple fields), excluding the record with the smallest rowid
select * from vitae a
where (a.peopleId,a.seq) in (select peopleId,seq from vitae group by peopleId,seq having count(*) > 1)
and rowid not in (select min(rowid) from vitae group by peopleId,seq having count(*)>1)
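For SQL Server, which has neither rowid nor multi-column IN, items 4 and 5 could be approximated as in the sketch below. It assumes vitae has a unique, non-null column named id standing in for rowid; that column is hypothetical and does not appear in the original statements.

```sql
-- Assumption: vitae has a unique, non-null column named id (hypothetical),
-- used here in place of Oracle's rowid.
-- Item 5: find every duplicate except the row with the smallest id.
select a.*
from vitae a
where exists (select 1 from vitae b
              where b.peopleId = a.peopleId
                and b.seq = a.seq
                and b.id < a.id)
GO
-- Item 4: delete those same rows, keeping the smallest id of each
-- (peopleId, seq) group.
delete a
from vitae a
where exists (select 1 from vitae b
              where b.peopleId = a.peopleId
                and b.seq = a.seq
                and b.id < a.id)
GO
```

Unlike GROUP BY, the equality comparisons here never match NULLs, so rows with a NULL peopleId or seq are not treated as duplicates of each other in this sketch.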
6. Remove the first character from the left of a field:
update tableName set [Title]=Right([Title],(len([Title])-1)) where Title like '村%'
7. Remove the last character from the right of a field:
update tableName set [Title]=left([Title],(len([Title])-1)) where Title like '%村'
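8. Soft-delete (flag rather than remove) redundant duplicate records in a table (multiple fields), excluding the record with the smallest rowid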
update vitae set ispass=-1
where peopleId in (select peopleId from vitae group by peopleId