[翻译]:SQL死锁-阻塞探测

简介: 原文:[翻译]:SQL死锁-阻塞探测到了这篇,才是真正动手解决问题的时候,有了死锁之后就要分析死锁的原因,具体就是需要定位到具体的SQL语句上。那么如何发现产生死锁的问题本质呢?下面这篇讲的非常细了,还提到了不少实用的SQL,但对我个人来讲,前半部分基本就够用,可以指出死锁的原因,至于后面那些有兴趣可以多研究研究。
原文: [翻译]:SQL死锁-阻塞探测

到了这篇,才是真正动手解决问题的时候,有了死锁之后就要分析死锁的原因,具体就是需要定位到具体的SQL语句上。那么如何发现产生死锁的问题本质呢?下面这篇讲的非常细了,还提到了不少实用的SQL,但对我个人来讲,前半部分基本就够用,可以指出死锁的原因,至于后面那些有兴趣可以多研究研究。

As we already know, usually blocking happens due non-optimized queries. But how to detect queries that need to be optimized? Of course, in case of severe issues, we can analyze the queries that timeouts on the client side, but it gives us only the worst cases – with default settings it would show the queries that ran for more than 30 seconds.

现在我们已经知道阻塞是因为未经过优化的查询造成的,但是如何去探测我们有哪些语句是需要优化的呢?当然,在非常严重的情况下,我们可以分析在客户端执行超时的查询,但它只提醒在默认设置下的超时。

So, what can we do? Keeping various monitoring software out of scope, there are 2 simple methods to do the troubleshooting. First one is Blocking Process Report and second one is DMV. Let’s start with the first one.

我们能做什么呢?这里有两种解决方法:第一是Blocking process report ,第二是利用DMV。我们先看简单点的Blocking Process report。

注:我个人也比较喜欢这个,没有专业的SQL语句,提供图形化的操作界面。

Blocking process report provides you nice XML with information about sessions involved in the blocking. But first, you have to set Blocked Process Threshold with the following commands:

要查看Blocking process report呢,我们需要有一个的权限,需要拥有Sysadmin的权限,然后再设置一些系统配置。Blocking process report提供了非常友好的信息,里面包含有和阻塞相关的会话信息。


EXECUTE sp_configure 'show advanced options', 1
GO
RECONFIGURE
GO
EXECUTE sp_configure 'blocked process threshold', 5 -- in seconds.
GO
RECONFIGURE
GO
EXECUTE sp_configure 'show advanced options', 0
GO
RECONFIGURE
GO

Of course, you should think what threshold you want to use and balance between a lot or very few events reported. I’d suggest to start with threshold of 5 seconds – if you have processes that are blocked for more than 5 seconds there is the good chance that deadlock monitor is constantly running. There are 2 things to keep in mind. First – SQL does the good job trying to monitor the blocking and generate blocked processes report, but it does not guarantee that it will do it all the time. E.g. it does the check when it has available resources to do so. Secondary, if you have process blocked for a long time, SQL generates report on every try. I.e. if you have threshold equal to 5 seconds, you will get the different reports for the same blocked condition after 5 seconds, 10 seconds and so on.

 

After you set that threshold, the next step is to run SQL Server Profiler and setup the trace with only 1 event: “Blocked process report”. You can do it in UI:

But the better way to run server side trace. You can export trace definition and run the script to do that.

I’m not going to focus on the details how to set it up – you can find information in Books Online. So let’s see what information that trace produces. Let’s run the blocking script from my previous post. In one session let’s run the update in uncommitted transaction (X lock on the row):

In another session let’s run select that introduces table scan:

Blocked process trace generates the report with following XML.

It has 2 elements: blocked-process and blocking-process. Simple case has just 2 processes involved but in real life it could be more complicated – Process A can block Process B and same time be blocked by Process C. That leads to the multiple events in the trace.

Let’s take a quick look at the details. First of all, you can see the status (red lines). Blocked process has the status of “Suspended”. Blocking process status is more interesting. “Sleeping” status indicates that process is waiting for the next command. Most likely it’s the sign of incorrect transaction handling on the client. Think about situation when client starts transaction when user opens the form, update data when user does some changes and commit or rollback it when user clicks on SAVE or CANCEL button.

Next, blocking report shows quite a few details about processes itself (Green lines). SPID, login, host, client app, etc. It also includes the information (Blue lines) about transaction isolation level, lock mode requested as well as point to resource that is locked. If you need, you can trace it down to specific row although I found that it’s rarely needed.

But most interesting are execution stacks and buffers (Pink blocks). For Ad-hoc sql you can see actual SQL involved in the locking. The situation is a little bit more complicated when stored procedures are involved. Let’s take a look. First, let’s create the simple procedure that replaces this update statement:

Next, let’s run this SP and select statement again. Make sure you commit or rollback transaction from the previous example first. Here is the blocking report:

As you can see in the blocking report, it does not show you what statement caused the blocking. In order to get it, you can use SQLHandle from the top statement in the execution stack and sys.dm_exec_sql_text data management function. Let’s see that:

You can either count the line based on the trace value or find the actual statement based on offsets. It worth to mention that this could return different statement that the one that acquired the lock.

That method has one major drawback – statement needs to be in the cache in order at the time when you call sys.dm_exec_sql_text function. Alternative and better method to obtain blocked process report is using extended events (SQL Server 2008+). While it returns the same data, event is triggered at the time of the blocking, so you can analyze/query the system at the time of the blocking. But it’s far more complex to setup and out of the scope for now.

After you find the statements involved in the blocking, you can analyze while blocking occurs. As I mentioned before, the chance is that there are scans involved. You can use management studio and see execution plan there. Alternatively I found that those queries are quite often become one of the biggest IO consumers in the system, so you’ll see them if you analyze sys.dm_query_exec_stats data management view. You can use the script below. If query is there – just click on the plan and you’re all set.


SELECT TOP 50
SUBSTRING(qt.TEXT, (qs.statement_start_offset/2)+1,
((
CASE qs.statement_end_offset
WHEN -1 THEN DATALENGTH(qt.TEXT)
ELSE qs.statement_end_offset
END - qs.statement_start_offset)/2)+1),
qp.query_plan,
qs.execution_count,
(qs.total_logical_reads + qs.total_logical_writes) / qs.execution_count as [Avg IO],
qs.total_logical_reads, qs.last_logical_reads,
qs.total_logical_writes, qs.last_logical_writes,
qs.total_worker_time,
qs.last_worker_time,
qs.total_elapsed_time/1000 total_elapsed_time_in_ms,
qs.last_elapsed_time/1000 last_elapsed_time_in_ms,
qs.last_execution_time

FROM
sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) qt
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) qp
ORDER BY
[Avg IO] DESC

Blocking process report is extremely useful when you need to collect information about blocking. Although, if you need to look what happens in the system right now, you can use sys.dm_tran_locks data management view. I’ll show you 2 scripts below.

The first one gives you the list of the locks system has right now:

select
TL1.resource_type
,DB_NAME(TL1.resource_database_id) as [DB Name]
,CASE TL1.resource_type
WHEN 'OBJECT' THEN OBJECT_NAME(TL1.resource_associated_entity_id, TL1.resource_database_id)
WHEN 'DATABASE' THEN 'DB'
ELSE
CASE
WHEN TL1.resource_database_id = DB_ID()
THEN
(
select OBJECT_NAME(object_id, TL1.resource_database_id)
from sys.partitions
where hobt_id = TL1.resource_associated_entity_id
)
ELSE
'(Run under DB context)'
END
END as ObjectName
,TL1.resource_description
,TL1.request_session_id
,TL1.request_mode
,TL1.request_status
,WT.wait_duration_ms as [Wait Duration (ms)]
,(
select
SUBSTRING(
S.Text,
(ER.statement_start_offset / 2) + 1,
((
CASE
ER.statement_end_offset
WHEN -1
THEN DATALENGTH(S.text)
ELSE ER.statement_end_offset
END - ER.statement_start_offset) / 2) + 1)
from
sys.dm_exec_requests ER
cross apply sys.dm_exec_sql_text(ER.sql_handle) S
where
TL1.request_session_id = ER.session_id
) as [Query]
from
sys.dm_tran_locks as TL1 left outer join sys.dm_os_waiting_tasks WT on
TL1.lock_owner_address = WT.resource_address and TL1.request_status = 'WAIT'
where
TL1.request_session_id <> @@SPID
order by
TL1.request_session_id

The second one is slightly modified version that shows you only blocking and blocked processes.


/*
Shows blocked and blocking processes. Even if it works across all database, ObjectName
populates for current database only. Could be modified with dynamic SQL if needed

Be careful with Query text for BLOCKING session. This represents currently active
request for this specific session id which could be different than query which produced locks
It also could be NULL if there are no active requests for this session
*/
select
TL1.resource_type
,DB_NAME(TL1.resource_database_id) as [DB Name]
,CASE TL1.resource_type
WHEN 'OBJECT' THEN OBJECT_NAME(TL1.resource_associated_entity_id, TL1.resource_database_id)
WHEN 'DATABASE' THEN 'DB'
ELSE
CASE
WHEN TL1.resource_database_id = DB_ID()
THEN
(
select OBJECT_NAME(object_id, TL1.resource_database_id)
from sys.partitions
where hobt_id = TL1.resource_associated_entity_id
)
ELSE
'(Run under DB context)'
END
END as ObjectName
,TL1.resource_description
,TL1.request_session_id
,TL1.request_mode
,TL1.request_status
,WT.wait_duration_ms as [Wait Duration (ms)]
,(
select
SUBSTRING(
S.Text,
(ER.statement_start_offset / 2) + 1,
((
CASE
ER.statement_end_offset
WHEN -1
THEN DATALENGTH(S.text)
ELSE ER.statement_end_offset
END - ER.statement_start_offset) / 2) + 1)
from
sys.dm_exec_requests ER
cross apply sys.dm_exec_sql_text(ER.sql_handle) S
where
TL1.request_session_id = ER.session_id
) as [Query]
from
sys.dm_tran_locks as TL1 join sys.dm_tran_locks TL2 on
TL1.resource_associated_entity_id = TL2.resource_associated_entity_id
left outer join sys.dm_os_waiting_tasks WT on
TL1.lock_owner_address = WT.resource_address and TL1.request_status = 'WAIT'

where
TL1.request_status <> TL2.request_status and
(
TL1.resource_description = TL2.resource_description OR
(TL1.resource_description is null and TL2.resource_description is null)
)

So those are 2 techniques that can help you with the troubleshooting. Again, I think it’s not as powerful as extended events approach but same time those are much simpler to accomplish.

目录
相关文章
|
SQL 存储
在 SQL Server 中查找活动连接和死锁
在SQL Server中有几种方法可以找到活动的 SQL 连接。让我们看看一些使用 T-SQL 查询的简单快捷的方法。
175 0
|
SQL 数据库
31 张图 | 一文搞懂 SQL事务+隔离级别+阻塞+死锁
31 张图 | 一文搞懂 SQL事务+隔离级别+阻塞+死锁
201 0
31 张图 | 一文搞懂 SQL事务+隔离级别+阻塞+死锁
|
SQL Oracle NoSQL
JDK的sql设计不合理导致的驱动类初始化死锁问题
JDK的sql设计不合理导致的驱动类初始化死锁问题
JDK的sql设计不合理导致的驱动类初始化死锁问题
|
SQL 存储 消息中间件
一个线上SQL死锁异常分析:深入了解事务和锁
引发死锁的原因是什么?如何避免?本文详细介绍了和死锁有关的知识点,通过深入分析MySQL事务和锁的机制,结合案例背景,找到了问题的所在,并梳理了解决方案,详解其原理。希望对同学们有所启发。
一个线上SQL死锁异常分析:深入了解事务和锁
|
SQL 数据库
PCB SQL SERVER 数据库阻塞进程关系以思维导图方式呈现的实现方法
PCB SQL SERVER 数据库阻塞进程关系以思维导图方式呈现的实现方法 最近公司数据库同步机制常发生阻塞,时不时的导致PCB工程系统卡死现象,只有找到阻塞源头并处理掉,才以消除阻塞,但数据库中查看会话阻塞是通过二维表方式展示的父子会话进程ID的,是很难清楚的展示各会话进程ID的关系图的。
787 0
|
SQL 测试技术 C#
SQL Server死锁中的会话隔离级别为序列化(Serializable)实验测试
SQL Server死锁中的会话隔离级别为序列化(Serializable)实验测试最近在分析SQL Server的死锁时,发现一个比较有意思的现象,发现死锁当中一个会话的隔离级别为序列化(Serializable),这个是让人比较奇怪的地方,我们知道SQL Server数据库的默认隔离级别为已提交读(READ COMMITTED),除非人为设置事务隔离级别(TRANSACTION ISOLATION LEVEL),否则事务隔离级别会使用数据库的默认隔离级别。
21741 0
|
SQL 数据格式 XML
SQL SERVER 查看近期死锁
SQL SERVER 查看近期死锁在项目运行的过程中,死锁不可能完全避免,但要尽可能减少死锁的出现, 产生死锁的原因主要是: 1,系统资源不足。 2,进程运行推进的顺序不合适。 3,资源分配不当等。 产生死锁的四个必要条件: 互斥条件:一个资源每次只能被一个进程使用,即在一段时间内某资源仅为一个进程所占有。
2344 0
|
SQL 监控 Go
SQL Server 死锁的告警监控
原文:SQL Server 死锁的告警监控 今天这篇文章总结一下如何监控SQL Server的死锁,其实以前写过MS SQL 监控错误日志的告警信息,这篇文章着重介绍如何监控数据库的死锁,当然这篇文章不分析死锁产生的原因、以及如何解决死锁。
1080 0
|
SQL 调度
SQL Server中SELECT会真的阻塞SELECT吗?
原文:SQL Server中SELECT会真的阻塞SELECT吗? 在SQL Server中,我们知道一个SELECT语句执行过程中只会申请一些意向共享锁(IS) 与共享锁(S), 例如我使用SQL Profile跟踪会话86执行SELECT * FROM dbo.TEST WHERE OBJECT_ID =1 这个查询语句,其申请、释放的锁资源的过程如下所示:     而且从最常见的锁模式的兼容性表,我们可以看到IS锁与S锁都是兼容的,也就是说SELECT查询是不会阻塞SELECT查询的。
1083 0
|
SQL 存储 Oracle
为什么数据库有时候不能定位阻塞(Blocker)源头的SQL语句
原文:为什么数据库有时候不能定位阻塞(Blocker)源头的SQL语句     在SQL Server数据库或OACLE数据库当中,通常一个会话持有某个资源的锁,而另一个会话在请求这个资源,就会出现阻塞(blocking)。
998 0