SQL Server: Using a HashKey Computed Column to Fix Slow Queries on Wide Columns

2016-10-18

Cast

The characters in this article: an MSSQL rookie and an MSSQL veteran.

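The Problem

One day, an MSSQL rookie trudged over to the veteran with a long face and said dejectedly, "I've run into a problem lately. A big problem. Yes, a really big problem." The veteran unhurriedly pushed up his -20-diopter glasses and drawled, "Let's hear it."

"I have a table with 1 million rows and a column that is 7,500 bytes wide. Unfortunately, I now need to query the table by the value of that column, and the most infuriating part is that SQL Server won't let me build an index on it. So my query is painfully slow and the application just times out. What do I do? What do I do?"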

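Analysis

The veteran listened, smoothed his old feathers, and laid it out point by point: "Of course the query is slow. It would be strange if it were fast. Think about it: a column 7,500 bytes wide obviously cannot be indexed, because SQL Server limits the index key to 900 bytes (nonclustered index keys were raised to 1,700 bytes in SQL Server 2016, which is still far short of 7,500). With 1 million rows and no usable index, every lookup scans the whole table, I/O shoots up immediately, and a performance bottleneck is exactly what you should expect."

"So how do I fix it?" The rookie was burning with impatience.

The veteran went on: "Do you know how a Hash Join works? A Hash Join computes a hash value from the join columns of the two tables and then uses those hash values to do the join, right?"

"I know how a Hash Join works, but what does that have to do with this problem?" The rookie could hardly wait.

"We can borrow the same idea. First we create a computed column that stores the hash of the wide column, and then we build an index on that hash. At query time we use the hash to find the candidate rows. In other words, a row can only match if its hash matches, so the index narrows the search to a handful of rows whose wide column we then compare directly," the veteran explained, as if teaching a child.

"Oh? Ohh?" The rookie still only half understood. The veteran read his mind and said smugly, "Come on, let's look at a demo together."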

The Solution

So the veteran dashed off a test demo:

Create the test table

use tempdb
go

--Create Test table
if OBJECT_ID('dbo.test_for_hashkey','U') is not null
    drop table dbo.test_for_hashkey
GO
create table dbo.test_for_hashkey
(
    id int identity(1,1) primary key
    ,SearchKeyword varchar(7500) null
);
/*
An index directly on SearchKeyword is not workable: the index key is limited to 900 bytes
(1700 bytes for nonclustered indexes from SQL Server 2016), so 7500-byte values cannot be indexed.

create index ix_DBA_SearchKeyword
ON dbo.test_for_hashkey(SearchKeyword);
GO
*/

Load 1 million rows

--1 million records data init
SET NOCOUNT ON
declare
    @loop int
    ,@do int
    ,@SearchKeyword varchar(7500)
;

select
    @loop = 1000000
    ,@do = 0
;

while @do < @loop
begin
    set
        @SearchKeyword = REPLICATE(newid(),220)
    ;
    insert into dbo.test_for_hashkey
    select @SearchKeyword
    ;
    set @do = @do + 1
end
go
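
The row-by-row loop above is simple but slow for 1 million inserts. As a rough alternative, the same kind of test data could be generated with one set-based statement. This is only a sketch under assumptions that are not in the original demo: the `@rows` variable is a hypothetical knob, and the cross join of `sys.all_objects` with itself is assumed to yield at least 1 million rows (typically true, since the view usually holds a couple of thousand rows). A single statement of this size also writes several gigabytes in one transaction, so a smaller `@rows` may be preferable on a modest test box.

```sql
USE tempdb;
GO
SET NOCOUNT ON;

-- Hypothetical parameter (not in the original demo): lower it for a quicker test.
DECLARE @rows int = 1000000;

-- Use a cross join of a catalog view with itself purely as a row source,
-- and keep only the first @rows rows.
INSERT INTO dbo.test_for_hashkey (SearchKeyword)
SELECT TOP (@rows)
    -- NEWID() is evaluated once per row. 36 characters x 220 = 7920 characters;
    -- the explicit CAST truncates that to 7500, as the variable assignment
    -- does in the loop version.
    CAST(REPLICATE(CAST(NEWID() AS varchar(36)), 220) AS varchar(7500))
FROM sys.all_objects AS a
CROSS JOIN sys.all_objects AS b;
GO
```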

Performance of the rookie's query

--performance testing at the very first time for the regular query
declare
    @SearchKeyword varchar(7500)
;
select TOP 1
    @SearchKeyword = SearchKeyword
FROM dbo.test_for_hashkey WITH(NOLOCK)
where id = 59987;

SET STATISTICS TIME ON
SET STATISTICS IO ON
select *
FROM dbo.test_for_hashkey WITH(NOLOCK) 
where SearchKeyword = @SearchKeyword
;

/* cold cache
Table 'test_for_hashkey'. Scan count 5, logical reads 1003732, physical reads 6792, read-ahead reads 987055, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 2870 ms,  elapsed time = 6213 ms.
*/

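The numbers in the comment confirm what the veteran said about the rookie's query: I/O is very heavy, with about 1 million logical reads (1,003,732), 6,792 physical reads, and 987,055 read-ahead reads. The 2,870 ms of CPU time and 6,213 ms of elapsed time do not look too bad only because my test environment uses SSD storage.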
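
The figures above are labeled "cold cache". The original demo does not show how the cache was emptied, so the following is an assumption about one common way to reproduce such a measurement on a test server. It flushes the buffer pool for the whole instance and requires elevated permissions, so it is not something to run in production.

```sql
-- Test servers only: this affects every database on the instance.
USE tempdb;
GO
CHECKPOINT;             -- write dirty pages of the current database to disk so they become clean
DBCC DROPCLEANBUFFERS;  -- remove clean pages from the buffer pool
GO
```

Running the query right after this should show physical and read-ahead reads again, while a second run without the reset would mostly show logical reads only.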

The veteran's optimization: first add a computed column (remember to mark it PERSISTED), then create an index on that computed column.

--and now, it's time for us to do something to boost the query
ALTER TABLE dbo.test_for_hashkey
ADD SearchKeyword_hashkey AS checksum(SearchKeyword) PERSISTED
;
GO
CREATE INDEX IX_SearchKeyword_hashkey ON dbo.test_for_hashkey(SearchKeyword_hashkey);
GO
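
Before testing, it can be reassuring to confirm that the computed column really is persisted and that the index covers it. The catalog queries below are a sketch of one way to check; they are not part of the original demo, and the column aliases are illustrative.

```sql
USE tempdb;
GO
-- Computed columns on the test table, with their definition and persisted flag.
SELECT cc.name        AS column_name,
       cc.definition  AS column_definition,
       cc.is_persisted,
       t.name         AS data_type
FROM sys.computed_columns AS cc
JOIN sys.types AS t
    ON t.user_type_id = cc.user_type_id
WHERE cc.object_id = OBJECT_ID('dbo.test_for_hashkey');

-- Indexes on the test table and the columns they contain.
SELECT i.name AS index_name,
       c.name AS column_name
FROM sys.indexes AS i
JOIN sys.index_columns AS ic
    ON ic.object_id = i.object_id AND ic.index_id = i.index_id
JOIN sys.columns AS c
    ON c.object_id = ic.object_id AND c.column_id = ic.column_id
WHERE i.object_id = OBJECT_ID('dbo.test_for_hashkey');
GO
```

If everything was created as in the demo, the first result should list SearchKeyword_hashkey with is_persisted = 1 and an int data type, which is what CHECKSUM returns.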

Verifying the veteran's optimization

--test again to observe the performance metrics
declare
    @SearchKeyword varchar(7500)
    , @SearchKeyword_hashkey int
    ;
select TOP 1
    @SearchKeyword_hashkey = CHECKSUM(SearchKeyword)
    , @SearchKeyword = SearchKeyword
FROM dbo.test_for_hashkey WITH(NOLOCK)
where id = 59987;

select *
FROM dbo.test_for_hashkey WITH(NOLOCK) 
where SearchKeyword_hashkey = @SearchKeyword_hashkey
--to filter out hash collisions, also compare the original column
and SearchKeyword = @SearchKeyword
;
/*
Table 'test_for_hashkey'. Scan count 1, logical reads 7, physical reads 1, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.

*/

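The numbers in the comment show that the veteran's optimization really delivers: logical reads drop to 7 and physical reads drop to 1, while CPU time and elapsed time both report 0 ms, that is, below what the timer can measure. The query went from seconds to effectively instant.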

The execution plan for the optimized query also confirms that it uses the new index:
[Figure Hash01: execution plan of the optimized query]
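
To inspect the plan yourself, one option is to ask SQL Server to return the actual plan together with the results. The sketch below repeats the lookup from the test above and wraps it in SET STATISTICS XML; the id value 59987 is simply the sample row used throughout the demo.

```sql
USE tempdb;
GO
DECLARE
    @SearchKeyword varchar(7500)
    , @SearchKeyword_hashkey int
    ;
-- Pick a sample value and its checksum, as in the test above.
SELECT
    @SearchKeyword_hashkey = CHECKSUM(SearchKeyword)
    , @SearchKeyword = SearchKeyword
FROM dbo.test_for_hashkey
WHERE id = 59987;

-- Return the actual execution plan as an extra XML result set.
SET STATISTICS XML ON;

SELECT *
FROM dbo.test_for_hashkey
WHERE SearchKeyword_hashkey = @SearchKeyword_hashkey
  AND SearchKeyword = @SearchKeyword;

SET STATISTICS XML OFF;
GO
```

In the returned plan, an Index Seek on IX_SearchKeyword_hashkey followed by a Key Lookup would indicate that the hash index is doing the work.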

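Caveats

After seeing the result, the rookie could barely contain himself: "Awesome! A veteran really is a veteran. Please accept my kneeling worship. I'm at your service for life."

The veteran patted the rookie on the head and said earnestly, "Don't celebrate too soon. This method works very well, but there are two things to watch out for."

1. Different strings can produce the same hash value. To guard against such collisions, the WHERE clause should also compare the original column: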

--to filter out hash collisions, also compare the original column
and SearchKeyword = @SearchKeyword

2. This method only works for exact matches on the whole string. It does not help with partial or fully fuzzy matching.
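
Both caveats can be checked against the test table. The sketch below is not part of the original demo. The first query lists checksum values shared by different strings, which shows whether collisions actually occur in your data; it reads the whole table, so expect it to be slow. The second shows a prefix search, where `@prefix` is a hypothetical value: the checksum of a prefix has no useful relationship to the checksum of the full string, so the hash index cannot serve this predicate and the query falls back to a scan.

```sql
USE tempdb;
GO
-- Caveat 1: checksum values that map to more than one distinct string.
-- MIN <> MAX within a group means the group holds at least two different values.
SELECT SearchKeyword_hashkey,
       COUNT(*) AS row_count
FROM dbo.test_for_hashkey
GROUP BY SearchKeyword_hashkey
HAVING MIN(SearchKeyword) <> MAX(SearchKeyword);

-- Caveat 2: a prefix match cannot use the hash column.
DECLARE @prefix varchar(100) = 'ABC';  -- hypothetical search prefix

SELECT id
FROM dbo.test_for_hashkey
WHERE SearchKeyword LIKE @prefix + '%';
GO
```

An empty result from the first query means no collisions in the current data, but the extra equality check in the WHERE clause is still worth keeping, since a 32-bit checksum gives no guarantee for future rows.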
