MSSQL - 应用案例 - Event Notification + Service Broker构建死锁自动收集系统

本文涉及的产品
云数据库 RDS SQL Server,独享型 2核4GB
云数据库 RDS MySQL Serverless,0.5-2RCU 50GB
简介: --- title: MSSQL - 应用案例 - Event Notification + Service Broker构建死锁自动收集系统 author: 风移 --- # 摘要 这篇文章介绍SQL Server的一个典型的应用案例,即如何利用Event Notification与Service Broker技术相结合来实现死锁信息自动收集系统。通过这个系统,我们可以全面把控SQL

title: MSSQL - 应用案例 - Event Notification + Service Broker构建死锁自动收集系统

author: 风移

摘要

这篇文章介绍SQL Server的一个典型的应用案例,即如何利用Event Notification与Service Broker技术相结合来实现死锁信息自动收集系统。通过这个系统,我们可以全面把控SQL Server数据库环境中所有实例上发生的死锁详细信息,供我们后期分析和解决死锁场景。

死锁自动收集系统需求分析

当 SQL Server 中某组资源的两个或多个线程或进程之间存在循环的依赖关系时,但因互相申请被其他进程所占用,而不会释放的资源处于的一种永久等待状态,将会发生死锁。SQL Server服务自动死锁检查进程默认每5分钟跑一次,当死锁发生时,会选择一个代价较小的进程做为死锁牺牲品,以此来避免死锁导致更大范围的影响。被选择做为死锁牺牲品的进程会报告如下错误:

Msg 1205, Level 13, State 51, Line 8
Transaction (Process ID 54) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.

如果进程间发生了死锁,对于用户业务系统,乃至整个SQL Server服务健康状况影响很大,轻者系统反应缓慢,服务假死;重者服务挂起,拒绝请求。那么,我们有没有一种方法可以完全自动、无人工干预的方式异步收集SQL Server系统死锁信息并远程保留死锁相关信息呢?这些信息包括但不仅限于:
死锁发生在哪些进程之间
各个进程执行的语句块是什么?死锁时,各个进程在执行哪条语句?
死锁的资源是什么?死锁发生在哪个数据库?哪张表?哪个数据页?哪个索引上?
死锁发生的具体时间点,包含语句块开始时间、语句执行时间等
用户进程使用的登录用户是什么?客户端驱动是什么?
......
如此的无人值守的自动死锁收集系统,就是我们今天要介绍的应用案例分享:利用SQL Server的Event Notification与Service Broker建立自动死锁信息收集系统。

Service Broker和Event Notification简介

在死锁自动收集系统介绍开始之前,先简要介绍下SQL Server Service Broker和Event Notification技术。

Service Broker简介

Service Broker是微软至SQL Server 2005开始集成到数据库引擎中的消息通讯组件,为 SQL Server提供队列和可靠的消息传递的能力,可以用来构建基于异步消息通讯为基础的应用程序。Service Broker既可用于单个 SQL Server 实例的应用程序,也可用于在多个实例间进行消息分发工作的应用程序。Service Broker使用TCP/IP端口在实例间交换消息,所包含的功能有助于防止未经授权的网络访问,并可以对通过网络发送的消息进行加密,以此来保证数据安全性。多实例之间使用Service Broker进行异步消息通讯的结构图如下所示(图片来自微软的官方文档):
01.png

Event Notification简介

Event Notification的中文名称叫事件通知,执行事件通知可对各种Transact-SQL数据定义语言(DDL)语句和SQL跟踪事件做出响应,采取的响应方式是将这些事件的相关信息发送到 Service Broker 服务。事件通知可以用来执行以下操作:
记录和检索发生在数据库上的更改或活动。
执行操作以异步方式而不是同步方式响应事件。
可以将事件通知用作替代DDL 触发器和SQL跟踪的编程方法。事件通知的信息媒介是以xml数据类型的信息传递给Service Broker服务,它提供了有关事件的发生时间、受影响的数据库对象、涉及的 Transact-SQL 批处理语句等详细信息。对于SQL Server死锁而言,可以使用Event Notification来跟踪死锁事件,来获取DEADLOCK_GRAPH XML信息,然后通过异步消息组件Service Broker发送到远端的Deadlock Center上的Service Broker队列,完成死锁信息收集到死锁中央服务。

死锁收集系统架构图

在介绍完Service Broker和Event Notification以后,我们来看看死锁手机系统的整体架构图。在这个系统中,存在两种类型角色:我们定义为死锁客户端(Deadlock Client)和死锁中央服务(Deadlock Center)。死锁客户端发生死锁后,首先会将Deadlock Graph XML通过Service Broker发送给死锁中央服务,死锁中央服务获取到Service Broker消息以后,解析这个XML就可以拿到客户端的死锁相关信息,最后存放到本地日志表中,供终端客户查询和分析使用。最终的死锁收集系统架构图如下所示:
02.png

详细的死锁信息收集过程介绍如下:死锁客户端通过本地SQL Server的Event Notification捕获发生在该实例上的Deadlock事件,并在死锁发生以后将Deadlock Graph XML数据存放到Event Notification绑定的队列中,然后通过绑定在该队列上的存储过程自动触发将Deadlock Graph XML通过Service Broker异步消息通讯的方式发送到死锁中央服务。中央服务在接收到Service Broker消息以后,首先放入Deadlock Center Service Broker队列中,该队列绑定了消息自动处理存储过程,用来解析Deadlock Graph XML信息,并将死锁相关的详细信息存入到Deadlock Center的Log Table中。最后,终端用户可以直接对Log Table来查询和分析所有Deadlock Client上发生的死锁信息。通过这系列的过程,最终达到了死锁信息的自动远程存储、收集,以提供后期死锁场景还原和复盘,达到死锁信息可追溯,及时监控,及时发现的目的。

Service Broker配置

系统架构设计完毕后,接下来是系统的配置和搭建过程,首先看看Service Broker的配置。这个配置还是相对比较繁琐的,包含了以下步骤:
创建Service Broker数据库(假设数据库名为DDLCenter)并开启Service Broker选项
创建Service Broker队列的激活存储过程和相关表对象
创建Master数据库下的Master Key
创建传输层本地和远程证书
创建基于证书的用户登录
创建Service Broker端口并授权用户连接
创建DDLCenter数据库下的Master Key
创建会话层本地及远程证书
创建Service Broker组件所需要的对象,包括:Message Type、Contact、Queue、Service、Remote Service Binding、Route

Deadlock Client Server

以下的配置请在Deadlock Client SQL Server实例上操作。
创建DDLCenter数据库并开启Service Broker选项

-- Run script on client server to gather deadlock graph xml
USE master
GO
-- Create Database
IF DB_ID('DDLCenter') IS NULL
    CREATE DATABASE [DDLCenter];
GO
-- Change datbase to simple recovery model
ALTER DATABASE [DDLCenter] SET RECOVERY SIMPLE WITH NO_WAIT
GO
-- Enable Service Broker
ALTER DATABASE [DDLCenter] SET ENABLE_BROKER,TRUSTWORTHY ON
GO
-- Change database Owner to sa
ALTER AUTHORIZATION ON DATABASE::DDLCenter TO [sa]
GO

三个表和两个存储过程
表[DDLCollector].[Deadlock_Traced_Records]:从Event Notification队里接收的消息会记录到该表中。
表[DDLCollector].[Send_Records]:Deadlock Client成功发送Service Broker消息记录
表[DDLCollector].[Error_Records]:记录发生异常情况时的信息。
存储过程[DDLCollector].[UP_ProcessDeadlockEventMsg]:Deadlock Client绑定到队里的激活存储过程,一旦队列中有消息进入,这个存储过程会被自动调用。
存储过程[DDLCollector].[UP_SendDeadlockMsg]:Deadlock Client发送异步消息给Deadlock Center,这个存储过程会被上面的激活存储过程调用。

-- Run on Client Instance
USE [DDLCenter]
GO
-- Create Schema
IF NOT EXISTS(
    SELECT TOP 1 *
    FROM sys.schemas
    WHERE name = 'DDLCollector'
)
BEGIN
    EXEC('CREATE SCHEMA DDLCollector');
END
GO

-- Create table to log Traced Deadlock Records
IF OBJECT_ID('DDLCollector.Deadlock_Traced_Records', 'U') IS NOT NULL
    DROP TABLE [DDLCollector].[Deadlock_Traced_Records]
GO

CREATE TABLE [DDLCollector].[Deadlock_Traced_Records](
    [RowId] [BIGINT] IDENTITY(1,1) NOT NULL,
    [Processed_Msg] [xml] NULL,
    [Processed_Msg_CheckSum] INT,
    [Record_Time] [datetime] NOT NULL 
        CONSTRAINT DF_Deadlock_Traced_Records_Record_Time DEFAULT(GETDATE()),
    CONSTRAINT PK_Deadlock_Traced_Records_RowId PRIMARY KEY
    (RowId ASC)
) ON [PRIMARY]
GO

-- Create table to record deadlock graph xml sent successfully log
IF OBJECT_ID('DDLCollector.Send_Records', 'U') IS NOT NULL
    DROP TABLE [DDLCollector].[Send_Records]
GO

CREATE TABLE [DDLCollector].[Send_Records](
    [RowId] [BIGINT] IDENTITY(1,1) NOT NULL,
    [Send_Msg] [xml] NULL,
    [Send_Msg_CheckSum] INT,
    [Record_Time] [datetime] NOT NULL 
        CONSTRAINT DF_Send_Records_Record_Time DEFAULT(GETDATE()),
    CONSTRAINT PK_Send_Records_RowId PRIMARY KEY
    (RowId ASC)
) ON [PRIMARY]
GO

-- Create table to record error info when exception occurs
IF OBJECT_ID('DDLCollector.Error_Records', 'U') IS NOT NULL
    DROP TABLE [DDLCollector].[Error_Records]
GO

CREATE TABLE [DDLCollector].[Error_Records](
    [RowId] [int] IDENTITY(1,1) NOT NULL,
    [Msg_Body] [xml] NULL,
    [Conversation_handle] [uniqueidentifier] NULL,
    [Message_Type] SYSNAME NULL,
    [Service_Name] SYSNAME NULL,
    [Contact_Name] SYSNAME NULL,
    [Record_Time] [datetime] NOT NULL
        CONSTRAINT DF_Error_Records_Record_Time DEFAULT(GETDATE()),
    [Error_Details] [nvarchar](4000) NULL,
    CONSTRAINT PK_Error_Records_RowId PRIMARY KEY
    (RowId ASC)
) ON [PRIMARY]
GO


USE [DDLCenter]
GO

-- Create Store Procedure to Send Deadlock Graph xml to Center Server
IF OBJECT_ID('DDLCollector.UP_SendDeadlockMsg', 'P') IS NOT NULL
    DROP PROC [DDLCollector].[UP_SendDeadlockMsg]
GO

CREATE PROCEDURE [DDLCollector].[UP_SendDeadlockMsg](
    @DeadlockMsg XML
)
AS  
BEGIN      
    SET NOCOUNT ON; 

    DECLARE 
        @handle UNIQUEIDENTIFIER
        ,@Proc_Name SYSNAME
        ,@Error_Details VARCHAR(2000)
    ;

    -- get the store procedure name
    SELECT 
        @Proc_Name = ISNULL(QUOTENAME(SCHEMA_NAME(SCHEMA_ID)) 
        + '.' 
        + QUOTENAME(OBJECT_NAME(@@PROCID)),'')
    FROM sys.procedures
    WHERE OBJECT_ID = @@PROCID
    ;
    
    BEGIN TRY
        
        -- Begin Dialog
        BEGIN DIALOG CONVERSATION @handle
        FROM SERVICE [http://soa/deadlock/service/ClientService]
        TO Service 'http://soa/deadlock/service/CenterService'
        ON CONTRACT [http://soa/deadlock/contract/CheckContract]
        ;

        -- Send deadlock graph xml as the message to Center Server
        SEND ON CONVERSATION @handle
        MESSAGE TYPE [http://soa/deadlock/MsgType/Request](@DeadlockMsg);

        -- Log it successfully
        INSERT INTO [DDLCollector].[Send_Records]([Send_Msg], [Send_Msg_CheckSum])   
        VALUES( @DeadlockMsg, CHECKSUM(CAST(@DeadlockMsg as NVARCHAR(MAX))))
    END TRY
    BEGIN CATCH
        
        -- Record the error info when exception occurs
        SET   @Error_Details=
                ' Error Number: ' + CAST(ERROR_NUMBER() AS VARCHAR(10)) +
                ' Error Message : ' + ERROR_MESSAGE() +
                ' Error Severity: ' + CAST(ERROR_SEVERITY() AS VARCHAR(10)) +
                ' Error State: ' + CAST(ERROR_STATE() AS VARCHAR(10)) +
                ' Error Line: ' + CAST(ERROR_LINE() AS VARCHAR(10)) +
                ' Exception Proc: ' + @Proc_Name
        ;    
        
        -- record into table
        INSERT INTO [DDLCollector].[Error_Records]([Msg_Body], [Conversation_handle], [Message_Type], [Service_Name], [Contact_Name], [Error_Details])         
        VALUES(@DeadlockMsg, @handle, 'http://soa/deadlock/MsgType/Request', 'http://soa/deadlock/service/ClientService', 'http://soa/deadlock/contract/CheckContract', @Error_Details); 

    END CATCH
END
GO

-- Create Store Procedure for Queue: when extend event notification queue message
-- this store procedure will be called.
IF OBJECT_ID('DDLCollector.UP_ProcessDeadlockEventMsg', 'P') IS NOT NULL
    DROP PROC [DDLCollector].[UP_ProcessDeadlockEventMsg]
GO

CREATE PROCEDURE [DDLCollector].[UP_ProcessDeadlockEventMsg]
AS
/*

SELECT * FROM [DDLCollector].[Deadlock_Traced_Records]
SELECT * FROM [DDLCollector].[Send_Records]

SELECT * FROM [DDLCollector].[Error_Records]

*/
BEGIN      
    SET NOCOUNT ON;   
    DECLARE 
        @handle UNIQUEIDENTIFIER
        , @Message_Type SYSNAME
        , @Service_Name SYSNAME
        , @Contact_Name SYSNAME
        , @Error_Details VARCHAR(2000)
        , @Message_Body XML
        , @Proc_Name SYSNAME
    ;

    -- Store Procedure Name
    SELECT 
        @Proc_Name = ISNULL(QUOTENAME(SCHEMA_NAME(SCHEMA_ID)) 
        + '.' 
        + QUOTENAME(OBJECT_NAME(@@PROCID)),'')
    FROM sys.procedures
    WHERE OBJECT_ID = @@PROCID
    ;

    BEGIN TRY
    
    -- Receive message from queue
    WAITFOR(RECEIVE TOP(1)        
            @handle = conversation_handle
            , @Message_Type = message_type_name
            , @Service_Name = service_name
            , @Contact_Name = service_contract_name
            , @Message_Body = message_body        
            FROM dbo.[http://soa/deadlock/queue/ClientQueue]),Timeout 500
    ;
    
    -- just return if there is no message needed to process        
    IF(@@Rowcount=0)      
        BEGIN   
            RETURN        
        END
    -- Get data from message queue
    ELSE IF @Message_Type = 'http://schemas.microsoft.com/SQL/Notifications/EventNotification'      
        BEGIN               
            -- Record message log first
            INSERT INTO  [DDLCollector].[Deadlock_Traced_Records](Processed_Msg, [Processed_Msg_CheckSum])         
            VALUES(@Message_Body, CHECKSUM(CAST(@Message_Body as NVARCHAR(MAX))))
        
            -- BE NOTED HERE: PLEASE DO'T END CONVERSATION, OR ELSE EXCEPTION WILL BE THROWN OUTPUT
            /*
            Error: 17001, Severity: 16, State: 1.
            Failure to send an event notification instance of type 'DEADLOCK_GRAPH' on conversation handle '{67419386-7C34-E711-A709-001C42099969}'. Error Code = '8429'.
            Error: 17005, Severity: 16, State: 1.
            Event notification 'DeadLockNotificationEvent' in database 'master' dropped due to send time service broker errors. Check to ensure the conversation handle, service broker contract, and service specified in the event notification are active.  
            */
            --END CONVERSATION @handle

            --Here call another Store Procedure to send deadlock graph info to center server
            EXEC [DDLCollector].[UP_SendDeadlockMsg] @Message_Body;
        END
    --End Diaglog Message Type, that means we should end this conversation      
    ELSE IF @Message_Type = N'http://schemas.microsoft.com/SQL/ServiceBroker/EndDialog'        
        BEGIN         
            END CONVERSATION @handle;     
        END
    -- Konwn Service Broker Errors by System.   
    ELSE IF @Message_Type = N'http://schemas.microsoft.com/SQL/ServiceBroker/Error'        
        BEGIN         
            END CONVERSATION @handle       
        
            INSERT INTO [DDLCollector].[Error_Records]([Msg_Body], [Conversation_handle], [Message_Type], [Service_Name], [Contact_Name], [Error_Details])         
            VALUES(@Message_Body, @handle, @Message_Type, @Service_Name, @Contact_Name, ' Exception Store Procedure: ' + @Proc_Name);               
        END       
    ELSE
        -- unknown Message Types.        
        BEGIN         
            END CONVERSATION @handle

            INSERT INTO [DDLCollector].[Error_Records]([Msg_Body], [Conversation_handle], [Message_Type], [Service_Name], [Contact_Name], [Error_Details])          
            VALUES(@Message_Body, @handle, @Message_Type, @Service_Name, @Contact_Name, ' Received unexpected message type when executing Store Procedure: ' + @Proc_Name);

            -- unexpected message type         
            RAISERROR (N' Received unknown message type: %s', 16, 1, @Message_Type) WITH LOG;        
        END      
    END TRY      
    BEGIN CATCH       
    BEGIN        
        SET   @Error_Details=
                ' Error Number: ' + CAST(ERROR_NUMBER() AS VARCHAR(10)) +
                ' Error Details : ' + ERROR_MESSAGE() +
                ' Error Severity: ' + CAST(ERROR_SEVERITY() AS VARCHAR(10)) +
                ' Error State: ' + CAST(ERROR_STATE() AS VARCHAR(10)) +
                ' Error Line: ' + CAST(ERROR_LINE() AS VARCHAR(10)) + 
                ' Exception Proc: ' + @Proc_Name
        ;    
        
        INSERT INTO [DDLCollector].[Error_Records]([Msg_Body], [Conversation_handle], [Message_Type], [Service_Name], [Contact_Name], [Error_Details])         
        VALUES(@Message_Body, @handle, @Message_Type, @Service_Name, @Contact_Name, @Error_Details); 
    END      
    END CATCH  
END  
GO

创建Master库下Master Key

USE master
GO
-- If the master key is not available, create it. 
IF NOT EXISTS (SELECT * 
                FROM sys.symmetric_keys
                WHERE name LIKE '%MS_DatabaseMasterKey%') 
BEGIN
    CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'ClientMasterKey*'; 
END 
GO

创建传输层本地证书并备份到本地文件系统
这里请注意证书的开始生效时间要略微早于当前时间,并设置合适的证书过期日期,我这里是设置的过期日期为9999年12月30号。

USE master
GO
-- Crete Transport Layer Certification
CREATE CERTIFICATE TrpCert_ClientLocal
AUTHORIZATION dbo
WITH SUBJECT = 'TrpCert_ClientLocal',
START_DATE = '05/07/2017',
EXPIRY_DATE = '12/30/9999'
GO

-- then backup it up to local path
-- and after that copy it to Center server
BACKUP CERTIFICATE TrpCert_ClientLocal
TO FILE = 'C:\Temp\TrpCert_ClientLocal.cer';
GO

创建传输层远程证书
这里的证书是通过证书文件来创建的,这个证书文件来自于远程通讯的另一端Deadlock Center SQL Server的证书文件的一份拷贝。

USE master
GO
-- Create certification came from Center Server.
CREATE    CERTIFICATE TrpCert_RemoteCenter 
FROM FILE = 'C:\Temp\TrpCert_RemoteCenter.cer'
GO

创建基于证书文件的用户登录
这里也可以创建带密码的常规用户登录,但是为了规避安全风险,这里最好创建基于证书文件的用户登录。

USE master
GO
-- Create user login
IF NOT EXISTS(SELECT * 
            FROM sys.syslogins 
            WHERE name='SSBDbo')
BEGIN
    CREATE LOGIN SSBDbo FROM CERTIFICATE TrpCert_ClientLocal;
END
GO

创建Service Broker TCP/IP通讯端口并授权用户连接权限
这里需要注意的是,端口授权的证书一定本地实例创建的证书,而不是来自于远程服务器的那个证书。比如代码中的AUTHENTICATION = CERTIFICATE TrpCert_ClientLocal部分。

USE master
GO          
--Creaet Tcp endpoint for SSB comunication and grant connect to users.      
CREATE ENDPOINT EP_SSB_ClientLocal
STATE = STARTED 
AS TCP 
( 
     LISTENER_PORT = 4022 
) 
FOR SERVICE_BROKER (AUTHENTICATION = CERTIFICATE TrpCert_ClientLocal,  ENCRYPTION = REQUIRED 
) 
GO 

-- Grant Connect on Endpoint to User SSBDbo
GRANT CONNECT ON ENDPOINT::EP_SSB_ClientLocal TO SSBDbo 
GO

创建DDLCenter数据库Master Key

-- Now, let's go inside to conversation database
USE DDLCenter
GO

-- Create Master Key
IF NOT EXISTS (SELECT * 
                FROM sys.symmetric_keys 
                WHERE name LIKE '%MS_DatabaseMasterKey%')
BEGIN        
    CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'DDLCenterMasterKey*';
END
GO

创建会话层本地证书

USE DDLCenter
GO
-- Create conversation layer certification
CREATE CERTIFICATE DlgCert_ClientLocal
AUTHORIZATION dbo
WITH SUBJECT = 'DlgCert_ClientLocal',
START_DATE = '05/07/2017',
EXPIRY_DATE = '12/30/9999'
GO

-- backup it up to local path
-- and then copy it to remote Center server
BACKUP CERTIFICATE DlgCert_ClientLocal
TO FILE = 'C:\Temp\DlgCert_ClientLocal.cer';
GO

创建DDLCenter用户,不需要和任何用户登录匹配

USE DDLCenter
GO
-- Create User for login under conversation database
IF NOT EXISTS(
    SELECT TOP 1 *
    FROM sys.database_principals
    WHERE name = 'SSBDbo'
)
BEGIN
    CREATE USER SSBDbo WITHOUT LOGIN;
END
GO

创建会话层远程证书,这个证书文件来自Deadlock Center SQL Server备份

USE DDLCenter
GO
-- Create converstaion layer certification came from remote Center server.
CREATE    CERTIFICATE DlgCert_RemoteCenter
AUTHORIZATION SSBDbo
FROM FILE='C:\Temp\DlgCert_RemoteCenter.cer'
GO

GRANT CONNECT TO SSBDbo;

创建Service Broker组件对象
Deadlock Client与Deadlock Center在创建Service Broker组件对象时存在差异:第一个差异是创建Service的时候,需要包含Event Notification的Contract,名称为
http://schemas.microsoft.com/SQL/Notifications/PostEventNotification;第二个差异是需要多创建一个指向本地服务的路由http://soa/deadlock/route/LocalRoute

USE DDLCenter
GO

-- Create Message Type
CREATE MESSAGE TYPE [http://soa/deadlock/MsgType/Request]
       VALIDATION = WELL_FORMED_XML;
CREATE MESSAGE TYPE [http://soa/deadlock/MsgType/Response]
       VALIDATION = WELL_FORMED_XML;
GO

-- Create Contact
CREATE CONTRACT [http://soa/deadlock/contract/CheckContract](
    [http://soa/deadlock/MsgType/Request] SENT BY INITIATOR,
    [http://soa/deadlock/MsgType/Response] SENT BY TARGET
);
GO

-- Create Queue
CREATE QUEUE dbo.[http://soa/deadlock/queue/ClientQueue] 
WITH STATUS = ON, RETENTION = OFF
, ACTIVATION (STATUS = ON , 
                PROCEDURE_NAME = [DDLCollector].[UP_ProcessDeadlockEventMsg] , 
                MAX_QUEUE_READERS = 2 , 
                EXECUTE AS N'dbo') 
GO

-- Create Service
-- Here is very import, we have to create service for both contacts
-- to get extend event notification and SSB work.
CREATE SERVICE [http://soa/deadlock/service/ClientService]
ON QUEUE [http://soa/deadlock/queue/ClientQueue]
(
  [http://soa/deadlock/contract/CheckContract],
  [http://schemas.microsoft.com/SQL/Notifications/PostEventNotification]
);
GO

-- Grant Send on service
GRANT SEND ON SERVICE::[http://soa/deadlock/service/ClientService] to SSBDbo;
GO

-- Create Remote Service Bingding
CREATE REMOTE SERVICE BINDING [http://soa/deadlock/RSB/CenterRSB]
TO SERVICE 'http://soa/deadlock/service/CenterService' 
WITH  USER = [SSBDbo],
ANONYMOUS=Off
GO

-- Create Route
CREATE ROUTE [http://soa/deadlock/route/CenterRoute]
WITH SERVICE_NAME = 'http://soa/deadlock/service/CenterService',
ADDRESS = 'TCP://10.211.55.3:4024';
GO

-- Create route for the DeadlockNotificationSvc
CREATE ROUTE [http://soa/deadlock/route/LocalRoute]
WITH SERVICE_NAME = 'http://soa/deadlock/service/ClientService',
ADDRESS = 'LOCAL';
GO

Deadlock Center Server

创建DDLCenter数据库并开启Service Broker选项

-- Run script on center server to receive client deadlock xml
USE master
GO
-- Create Database
IF DB_ID('DDLCenter') IS NULL
    CREATE DATABASE [DDLCenter];
GO
-- Change datbase to simple recovery model
ALTER DATABASE [DDLCenter] SET RECOVERY SIMPLE WITH NO_WAIT
GO
-- Enable Service Broker
ALTER DATABASE [DDLCenter] SET ENABLE_BROKER,TRUSTWORTHY ON
GO
-- Change database Owner to sa
ALTER AUTHORIZATION ON DATABASE::DDLCenter TO [sa]
GO

三张表和两个存储过程
表[DDLCollector].[Collect_Records]:Deadlock Center成功接收到的Service Broker消息。
表[DDLCollector].[Error_Records]:记录发生异常情况的详细信息。
表[DDLCollector].[Deadlock_Info]:记录所有Deadlock Client端发生的Deadlock详细信息。
存储过程[DDLCollector].[UP_ProcessDeadlockGraphEventMsg]:Deadlock Center上绑定到队列的激活存储过程,一旦队列中有消息进入,这个存储过程会被自动调用。
存储过程[DDLCollector].[UP_ParseDeadlockGraphEventMsg]:Deadlock Center上解析Deadlock Graph XML的存储过程对象,这个存储过程会被上面的激活存储过程调用来解析XML,然后放入表[DDLCollector].[Deadlock_Info]中。

USE [DDLCenter]
GO

-- Create Schema
IF NOT EXISTS(
    SELECT TOP 1 *
    FROM sys.schemas
    WHERE name = 'DDLCollector'
)
BEGIN
    EXEC('CREATE SCHEMA DDLCollector');
END
GO

-- Create table to log the received message
IF OBJECT_ID('DDLCollector.Collect_Records', 'U') IS NOT NULL
    DROP TABLE [DDLCollector].[Collect_Records]
GO

CREATE TABLE [DDLCollector].[Collect_Records](
    [RowId] [BIGINT] IDENTITY(1,1) NOT NULL,
    [Deadlock_Graph_Msg] [xml] NULL,
    [Deadlock_Graph_Msg_CheckSum] INT,
    [Record_Time] [datetime] NOT NULL 
        CONSTRAINT DF_Collect_Records_Record_Time DEFAULT(GETDATE()),
    CONSTRAINT PK_Collect_Records_RowId PRIMARY KEY
    (RowId ASC)
) ON [PRIMARY]
GO

-- create table to record the exception when error occurs
IF OBJECT_ID('DDLCollector.Error_Records', 'U') IS NOT NULL
    DROP TABLE [DDLCollector].[Error_Records]
GO

CREATE TABLE [DDLCollector].[Error_Records](
    [RowId] [int] IDENTITY(1,1) NOT NULL,
    [Msg_Body] [xml] NULL,
    [Conversation_handle] [uniqueidentifier] NULL,
    [Message_Type] SYSNAME NULL,
    [Service_Name] SYSNAME NULL,
    [Contact_Name] SYSNAME NULL,
    [Record_Time] [datetime] NOT NULL
        CONSTRAINT DF_Error_Records_Record_Time DEFAULT(GETDATE()),
    [Error_Details] [nvarchar](4000) NULL,
    CONSTRAINT PK_Error_Records_RowId PRIMARY KEY
    (RowId ASC)
) ON [PRIMARY]
GO

-- create business table to record deadlock analysised info
IF OBJECT_ID('DDLCollector.Deadlock_Info', 'U') IS NOT NULL
    DROP TABLE [DDLCollector].[Deadlock_Info]
GO
CREATE TABLE [DDLCollector].[Deadlock_Info](
    RowId INT IDENTITY(1,1) NOT NULL
    ,SQLInstance sysname NULL
    ,SPid INT NULL
    ,is_Vitim BIT NULL
    ,DeadlockGraph XML NULL
    ,DeadlockGraphCheckSum INT NULL
    ,lasttranstarted DATETIME NULL
    ,lastbatchstarted DATETIME NULL
    ,lastbatchcompleted DATETIME NULL
    ,procname SYSNAME NULL 
    ,Code NVARCHAR(max) NULL
    ,LockMode sysname NULL
    ,Indexname sysname NULL
    ,KeylockObject sysname NULL
    ,IndexLockMode sysname NULL
    ,Inputbuf NVARCHAR(max) NULL
    ,LoginName sysname NULL
    ,Clientapp sysname NULL
    ,Action varchar(1000) NULL
    ,status varchar(10) NULL
    ,[Record_Time] [datetime] NOT NULL
        CONSTRAINT DF_Deadlock_Info_Record_Time DEFAULT(GETDATE()),
    CONSTRAINT PK_Deadlock_Info_RowId PRIMARY KEY
    (RowId ASC)
)
GO



USE [DDLCenter]
GO

-- Create store procedure to analysis deadlock graph xml
-- and log into business table
IF OBJECT_ID('DDLCollector.UP_ParseDeadlockGraphEventMsg', 'P') IS NOT NULL
    DROP PROC [DDLCollector].[UP_ParseDeadlockGraphEventMsg]
GO

CREATE PROCEDURE [DDLCollector].[UP_ParseDeadlockGraphEventMsg](
    @DeadlockGraph_Msg XML
)
AS  
BEGIN      
    SET NOCOUNT ON; 

    ;WITH deadlock
    AS
    (
        SELECT
            OwnerID = T.C.value('@id', 'varchar(50)')
            ,SPid = T.C.value('(./@spid)[1]','int')
            ,status = T.C.value('(./@status)[1]','varchar(10)')
            ,Victim = case 
                        when T.C.value('@id', 'varchar(50)') = T.C.value('./../../@victim','varchar(50)') then 1 
                        else 0 end
            ,LockMode = T.C.value('@lockMode', 'sysname')
            ,Inputbuf = T.C.value('(./inputbuf/text())[1]','nvarchar(max)')
            ,Code = T.C.value('(./executionStack/frame/text())[1]','nvarchar(max)')
            ,SPName = T.C.value('(./executionStack/frame/@procname)[1]','sysname')
            ,Hostname = T.C.value('(./@hostname)[1]','sysname')
            ,Clientapp = T.C.value('(./@clientapp)[1]','varchar(1000)')
            ,lasttranstarted = T.C.value('(./@lasttranstarted)[1]','datetime')
            ,lastbatchstarted = T.C.value('(./@lastbatchstarted)[1]','datetime')
            ,lastbatchcompleted = T.C.value('(./@lastbatchcompleted)[1]','datetime')
            ,LoginName = T.C.value('@loginname', 'sysname')
            ,Action = T.C.value('(./@transactionname)[1]','varchar(1000)')
        FROM @DeadlockGraph_Msg.nodes('EVENT_INSTANCE/TextData/deadlock-list/deadlock/process-list/process') AS T(C)
    )
    ,
    keylock
    AS
    (
        SELECT
            OwnerID = T.C.value('./owner[1]/@id', 'varchar(50)')
            ,KeylockObject = T.C.value('./../@objectname', 'sysname')
            ,Indexname = T.C.value('./../@indexname', 'sysname')
            ,IndexLockMode = T.C.value('./../@mode', 'sysname')
        FROM @DeadlockGraph_Msg.nodes('EVENT_INSTANCE/TextData/deadlock-list/deadlock/resource-list/keylock/owner-list') AS T(C)
    )
    SELECT
        SQLInstance = A.Hostname 
        ,A.SPid
        ,is_Vitim = A.Victim
        ,DeadlockGraph = @DeadlockGraph_Msg.query('EVENT_INSTANCE/TextData/deadlock-list')
        ,DeadlockGraphCheckSum = CHECKSUM(CAST(@DeadlockGraph_Msg AS NVARCHAR(MAX)))
        ,A.lasttranstarted
        ,A.lastbatchstarted
        ,A.lastbatchcompleted
        ,A.SPName
        ,A.Code
        ,A.LockMode
        ,B.Indexname
        ,B.KeylockObject
        ,B.IndexLockMode
        ,A.Inputbuf
        ,A.LoginName
        ,A.Clientapp
        ,A.Action
        ,status
        ,[Record_Time] = GETDATE()
    FROM deadlock AS A
            LEFT JOIN keylock AS B
            ON A.OwnerID = B.OwnerID
    ORDER BY A.SPid, A.Victim
    ;
END
GO

-- Create store Procedure for Center server service queue to process deadlock xml
-- when message sending from client server.
IF OBJECT_ID('DDLCollector.UP_ProcessDeadlockGraphEventMsg', 'P') IS NOT NULL
    DROP PROC [DDLCollector].[UP_ProcessDeadlockGraphEventMsg]
GO

CREATE PROCEDURE [DDLCollector].[UP_ProcessDeadlockGraphEventMsg]
AS
/*
EXEC [DDLCollector].[UP_ProcessDeadlockGraphEventMsg]

SELECT * FROM [DDLCollector].[Collect_Records]

SELECT * FROM [DDLCollector].[Error_Records]

SELECT * FROM [DDLCollector].[Deadlock_Info]
*/
BEGIN      
    SET NOCOUNT ON;   
    DECLARE 
        @handle UNIQUEIDENTIFIER
        , @Message_Type SYSNAME
        , @Service_Name SYSNAME
        , @Contact_Name SYSNAME
        , @Error_Details VARCHAR(2000)
        , @Message_Body XML
        , @Proc_Name SYSNAME
    ;

    -- Store Procedure name
    SELECT 
        @Proc_Name = ISNULL(QUOTENAME(SCHEMA_NAME(SCHEMA_ID)) 
        + '.' 
        + QUOTENAME(OBJECT_NAME(@@PROCID)),'')
    FROM sys.procedures
    WHERE OBJECT_ID = @@PROCID
    ;

    BEGIN TRY
        
    -- Receive deadlock message from service queue
    WAITFOR(RECEIVE TOP(1)        
            @handle = conversation_handle
            , @Message_Type = message_type_name
            , @Service_Name = service_name
            , @Contact_Name = service_contract_name
            , @Message_Body = message_body        
            FROM dbo.[http://soa/deadlock/queue/CenterQueue]),Timeout 500
    ;
            
    IF(@@Rowcount=0)      
        BEGIN   
            RETURN        
        END
    -- Message type is the very correct one
    ELSE IF @Message_Type = N'http://soa/deadlock/MsgType/Request'        
        BEGIN               
            -- Record message log first
            INSERT INTO  [DDLCollector].[Collect_Records](Deadlock_Graph_Msg, [Deadlock_Graph_Msg_CheckSum])          
            VALUES(@Message_Body, CHECKSUM(cast(@Message_Body as NVARCHAR(MAX))))
        
            END CONVERSATION @handle

            --Here call another Store Procedure to process our message to record deadlock relation info
            INSERT INTO [DDLCollector].[Deadlock_Info]
            EXEC [DDLCollector].[UP_ParseDeadlockGraphEventMsg] @Message_Body;
        END
    --End Diaglog Message Type, that means we should end this conversation      
    ELSE IF @Message_Type = N'http://schemas.microsoft.com/SQL/ServiceBroker/EndDialog'        
        BEGIN         
            END CONVERSATION @handle;        
        END
    -- Konwn Service Broker Errors by System.   
    ELSE IF @Message_Type = N'http://schemas.microsoft.com/SQL/ServiceBroker/Error'        
        BEGIN         
            END CONVERSATION @handle       
        
            INSERT INTO [DDLCollector].[Error_Records]([Msg_Body], [Conversation_handle], [Message_Type], [Service_Name], [Contact_Name], [Error_Details])          
            VALUES(@Message_Body, @handle, @Message_Type, @Service_Name, @Contact_Name, ' Exception Store Procedure: ' + @Proc_Name);               
        END       
    ELSE
        -- unknown Message Types.        
        BEGIN         
            END CONVERSATION @handle

            INSERT INTO [DDLCollector].[Error_Records]([Msg_Body], [Conversation_handle], [Message_Type], [Service_Name], [Contact_Name], [Error_Details])          
            VALUES(@Message_Body, @handle, @Message_Type, @Service_Name, @Contact_Name, ' Received unexpected message type when executing Store Procedure: ' + @Proc_Name);

            -- unexpected message type         
            RAISERROR (N' Received unexpected message type: %s', 16, 1, @Message_Type) WITH LOG;        
        END      
    END TRY      
    BEGIN CATCH       
    BEGIN
        -- record exception record       
        SET   @Error_Details=
                ' Error Number: ' + CAST(ERROR_NUMBER() AS VARCHAR(10)) +
                ' Error Message : ' + ERROR_MESSAGE() +
                ' Error Severity: ' + CAST(ERROR_SEVERITY() AS VARCHAR(10)) +
                ' Error State: ' + CAST(ERROR_STATE() AS VARCHAR(10)) +
                ' Error Line: ' + CAST(ERROR_LINE() AS VARCHAR(10)) + 
                ' Exception Proc: ' + @Proc_Name
        ;    
        
        INSERT INTO [DDLCollector].[Error_Records]([Msg_Body], [Conversation_handle], [Message_Type], [Service_Name], [Contact_Name], [Error_Details])          
        VALUES(@Message_Body, @handle, @Message_Type, @Service_Name, @Contact_Name, @Error_Details); 
    END      
    END CATCH  
END  
GO

创建Master库下Master Key

USE master
GO
-- If the master key is not available, create it. 
IF NOT EXISTS (SELECT * 
                FROM sys.symmetric_keys
                WHERE name LIKE '%MS_DatabaseMasterKey%') 
BEGIN
    CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'CenterMasterKey*'; 
END 
GO 

创建传输层本地证书并备份到本地文件系统

USE master
GO
-- Crete Transport Layer Certification
CREATE CERTIFICATE TrpCert_RemoteCenter
AUTHORIZATION dbo
WITH SUBJECT = 'TrpCert_RemoteCenter',
START_DATE = '05/07/2017',
EXPIRY_DATE = '12/30/9999'
GO

-- then backup it up to local path
-- and after that copy it to Client server
BACKUP CERTIFICATE TrpCert_RemoteCenter
TO FILE = 'C:\Temp\TrpCert_RemoteCenter.cer';
GO

创建传输层远程证书,这个证书文件来至于Deadlock Client SQL Server

USE master
GO
-- Create certification came from client Server.
CREATE    CERTIFICATE TrpCert_ClientLocal 
FROM FILE = 'C:\Temp\TrpCert_ClientLocal.cer'
GO

创建基于证书文件的用户登录

USE master
GO
-- Create user login
IF NOT EXISTS(SELECT * 
            FROM sys.syslogins 
            WHERE name='SSBDbo')
BEGIN
    CREATE LOGIN SSBDbo FROM CERTIFICATE TrpCert_RemoteCenter;
END
GO

创建Service Broker TCP/IP通讯端口并授权用户连接权限

USE master
GO
-- Creaet Tcp endpoint for SSB comunication and grant connect to users.      
CREATE ENDPOINT EP_SSB_RemoteCenter
STATE = STARTED 
AS TCP 
( 
     LISTENER_PORT = 4024
) 
FOR SERVICE_BROKER (AUTHENTICATION = CERTIFICATE TrpCert_RemoteCenter,  ENCRYPTION = REQUIRED 
) 
GO 

-- Grant Connect on Endpoint to User SSBDbo
GRANT CONNECT ON ENDPOINT::EP_SSB_RemoteCenter TO SSBDbo 
GO

创建DDLCenter数据库Master Key

-- Now, let's go inside to conversation database
USE DDLCenter
GO

-- Create Master Key
IF NOT EXISTS (SELECT * 
                FROM sys.symmetric_keys 
                WHERE name LIKE '%MS_DatabaseMasterKey%')
BEGIN        
    CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'DDLCenterMasterKey*';
END
GO

创建会话层本地证书

USE DDLCenter
GO
-- Create conversation layer certification
CREATE CERTIFICATE DlgCert_RemoteCenter
AUTHORIZATION dbo
WITH SUBJECT = 'DlgCert_RemoteCenter',
START_DATE = '05/07/2017',
EXPIRY_DATE = '12/30/9999'
GO

-- backup it up to local path
-- and then copy it to remote client server
BACKUP CERTIFICATE DlgCert_RemoteCenter
TO FILE = 'C:\Temp\DlgCert_RemoteCenter.cer';
GO

创建DDLCenter用户,不需要和任何用户登录匹配

USE DDLCenter
GO
-- Create User for login under conversation database
IF NOT EXISTS(
    SELECT TOP 1 *
    FROM sys.database_principals
    WHERE name = 'SSBDbo'
)
BEGIN
    --CREATE USER SSBDbo FOR LOGIN SSBDbo;
    CREATE USER SSBDbo WITHOUT LOGIN;
END
GO

创建会话层远程证书,这个证书文件来自Deadlock Center SQL Server备份

USE DDLCenter
GO
-- Create converstaion layer certification came from remote client server.
CREATE    CERTIFICATE DlgCert_ClientLocal
AUTHORIZATION SSBDbo
FROM FILE='C:\Temp\DlgCert_ClientLocal.cer'
GO

GRANT CONNECT TO SSBDbo;

创建Service Broker组件对象

USE DDLCenter
GO

-- Create Message Type
CREATE MESSAGE TYPE [http://soa/deadlock/MsgType/Request]
       VALIDATION = WELL_FORMED_XML;
CREATE MESSAGE TYPE [http://soa/deadlock/MsgType/Response]
       VALIDATION = WELL_FORMED_XML;
GO

-- Create Contact
CREATE CONTRACT [http://soa/deadlock/contract/CheckContract](
    [http://soa/deadlock/MsgType/Request] SENT BY INITIATOR,
    [http://soa/deadlock/MsgType/Response] SENT BY TARGET
);
GO

-- Create Queue
CREATE QUEUE [dbo].[http://soa/deadlock/queue/CenterQueue] 
WITH STATUS = ON , RETENTION = OFF
, ACTIVATION (STATUS = ON , 
                PROCEDURE_NAME = [DDLCollector].[UP_ProcessDeadlockGraphEventMsg] , 
                MAX_QUEUE_READERS = 3 , 
                EXECUTE AS N'dbo') 
GO

-- Create Service
CREATE SERVICE [http://soa/deadlock/service/CenterService]
ON QUEUE [http://soa/deadlock/queue/CenterQueue]
(
  [http://soa/deadlock/contract/CheckContract]
);
GO

-- Grant Send on service to User SSBDbo
GRANT SEND ON SERVICE::[http://soa/deadlock/service/CenterService] to SSBDbo;
GO

-- Create Remote Service Bingding
CREATE REMOTE SERVICE BINDING [http://soa/deadlock/RSB/ClientRSB]
TO SERVICE 'http://soa/deadlock/service/ClientService' 
WITH  USER = SSBDbo,
ANONYMOUS=Off
GO

-- Create Route
CREATE ROUTE [http://soa/deadlock/route/ClientRoute]
WITH SERVICE_NAME = 'http://soa/deadlock/service/ClientService',
ADDRESS = 'TCP://10.211.55.3:4022';
GO

Event Notification配置

Event Notification只需要在Deadlock Client Server创建即可,因为只需要在Deadlock Client上跟踪死锁事件。在为Deadlock Client 配置Service Broker章节,我们已经为Event Notification创建了队列、服务和路由。因此,在这里我们只需要创建Event Notification对象即可。方法参见如下的代码:

USE DDLCenter
GO

-- Create Event Notification for the deadlock_graph event.
IF EXISTS(
    SELECT * FROM sys.server_event_notifications  
    WHERE name = 'DeadLockNotificationEvent'
)
BEGIN
    DROP EVENT NOTIFICATION DeadLockNotificationEvent
    ON SERVER;
END
GO


CREATE EVENT NOTIFICATION DeadLockNotificationEvent
ON SERVER
WITH FAN_IN
FOR DEADLOCK_GRAPH
TO SERVICE 
'http://soa/deadlock/service/ClientService', 
'current database'
GO

模拟死锁

至此为止,所有对象和准备工作已经准备完成,万事俱备只欠东风,让我们在Deadlock Client实例上模拟死锁场景。首先,我们在Test数据库下创建两个测试表,表名分别为:dbo.test_deadlock1和dbo.test_deadlock2,代码如下:

IF DB_ID('Test') IS NULL
    CREATE DATABASE Test;
GO

USE Test
GO

-- create two test tables
IF OBJECT_ID('dbo.test_deadlock1','u') IS NOT NULL
    DROP TABLE dbo.test_deadlock1
GO

CREATE TABLE dbo.test_deadlock1(
id INT IDENTITY(1,1) not null PRIMARY KEY
,name VARCHAR(20) null
);

IF OBJECT_ID('dbo.test_deadlock2','u') IS NOT NULL
    DROP TABLE dbo.test_deadlock2
GO

CREATE TABLE dbo.test_deadlock2(
id INT IDENTITY(1,1) not null PRIMARY KEY
,name VARCHAR(20) null
);

INSERT INTO dbo.test_deadlock1
SELECT 'AA'
UNION ALL
SELECT 'BB';


INSERT INTO dbo.test_deadlock2
SELECT 'AA'
UNION ALL
SELECT 'BB';
GO

接下来,我们使用SSMS打开一个新的连接,我们假设叫session 1,执行如下语句:

--session 1
USE Test
GO

BEGIN TRAN 
UPDATE dbo.test_deadlock1
SET name = 'CC'
WHERE id = 1
;
WAITFOR DELAY '00:00:05'

UPDATE dbo.test_deadlock2
SET name = 'CC'
WHERE id = 1
;
ROLLBACK

紧接着,我们使用SSMS打开第二个连接,假设叫Session 2,执行下面的语句:

--session 2
USE Test
GO

BEGIN TRAN 
UPDATE dbo.test_deadlock2
SET name = 'CC'
WHERE id = 1
;

UPDATE dbo.test_deadlock1
SET name = 'CC'
WHERE id = 1
;
COMMIT

等待一会儿功夫以后,死锁发生,并且Session 2做为了死锁的牺牲品,我们会在Session 2的SSMS信息窗口中看到如下的死锁信息:

Msg 1205, Level 13, State 51, Line 8
Transaction (Process ID 60) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.

用户查询死锁信息

根据上面的模拟死锁小节,说明死锁已经真真切切的发生了,那么,死锁信息到底有没有被捕获到呢?如果终端用户想要查看和分析所有客户端的死锁信息,只需要连接Deadlock Center SQL Server,执行下面的语句:

-- Run on Deadlock Center Server
USE DDLCenter
GO

SELECT * FROM [DDLCollector].[Deadlock_Info]

由于结果集宽度太宽,人为将查询结果分两段截图,第一段结果集展示如下:
03.png

第二段结果集截图如下:
04.png

从这个结果集,我们可以清楚的看到Deadlock Client发生死锁的详细信息,包含:
死锁发生的Deadlock Client实例名称:CHERISH-PC
被死锁进程号60,死锁进程57号
死锁相关进程的事务开始时间,最后一个Batch开始执行时间和完成时间
死锁进程执行的代码和Batch语句
死锁发生时锁的类型
表和索引名称
死锁相关进程的登录用户
......
等等。

踩过的坑

当Deadlock Client 上SQL Server发生两次或者两次以上的Deadlock事件以后,自建的Event Notification对象(名为:DeadLockNotificationEvent)会被SQL Server系统自动删除,从而导致整个死锁收集系统无法工作。

表象

SQL Server在错误日志中会抛出如下4个错误信息:两个错误编号为17004,一个编号为17001的错误,最后是一个编号为17005错误,其中17005明确说明了,Event Notification对象被删除了。如下:

Error: 17004, Severity: 16, State: 1.
Event notification conversation on dialog handle '{4A6A0FBD-7A34-E711-A709-001C42099969}' closed without an error.
Error: 17004, Severity: 16, State: 1.
Event notification conversation on dialog handle '{476A0FBD-7A34-E711-A709-001C42099969}' closed without an error.
Error: 17001, Severity: 16, State: 1.
Failure to send an event notification instance of type 'DEADLOCK_GRAPH' on conversation handle '{F711A404-7934-E711-A709-001C42099969}'. Error Code = '8429'.
Error: 17005, Severity: 16, State: 1.
Event notification 'DeadLockNotificationEvent' in database 'master' dropped due to send time service broker errors. Check to ensure the conversation handle, service broker contract, and service specified in the event notification are active.

错误日志截图如下:
05.png

问题分析

从错误提示信息due to send time service broker errors来看,最开始花了很长时间来排查Service Broker方面的问题,在长达数小时的问题排查无果后,静下心来仔细想想:如果是Service Broker有问题的话,我们不可能完成第一、第二条死锁信息的收集,所以问题应该与Service Broker没有直接关系。于是,注意到了错误提示信息的后半部分Check to ensure the conversation handle, service broker contract, and service specified in the event notification are active,再次以可以成功收集两条deadlock错误信息为由,排除Contact和Service的问题可能性,所以最有可能出问题的地方猜测应该是conversation handle,继续排查与conversation handle相关操作的地方,发现存储过程[DDLCollector].[UP_ProcessDeadlockEventMsg]的中的代码:

...
ELSE IF @Message_Type = 'http://schemas.microsoft.com/SQL/Notifications/EventNotification'      
        BEGIN               
            -- Record message log first
            INSERT INTO  [DDLCollector].[Deadlock_Traced_Records](Processed_Msg, [Processed_Msg_CheckSum])         
            VALUES(@Message_Body, CHECKSUM(CAST(@Message_Body as NVARCHAR(MAX))))
        
            END CONVERSATION @handle

            --Here call another Store Procedure to send deadlock graph info to center server
            EXEC [DDLCollector].[UP_SendDeadlockMsg] @Message_Body;
        END
...

这个逻辑分支不应该有End Conversation的操作,因为这里是与Event Notification相关的Message Type操作,而不是Service Broker相关的Message Type操作。

解决问题

问题分析清楚了,解决方法就非常简单了,注释掉这条语句END CONVERSATION @handle后,重新创建存储过程。再多次模拟死锁操作,再也没有出现Event Notification被系统自动删除的情况了,说明这个问题已经被彻底解决,坑已经被填上了。
解决问题的代码修改和注释如下截图,以此纪念下踩过的这个坑:
06.png

福利发放

以下是关于SQL Server死锁相关的系列文章,可以帮助我们全面了解、分析和解决死锁问题,其中第一个是这篇文章的视频演示。
使用Event Notification和Service Broker自动跟踪SQL Server Deadlock的Youku视频:使用事件通知和Service Broker自动跟踪SQL Server死锁
使用DBCC捕获死锁:RDS SQL Server死锁(Deadlock)系列之一使用DBCC捕获死锁
使用Profiler捕获死锁:RDS SQL Server死锁(Deadlock)系列之二使用Profiler捕获死锁
自动部署Profiler捕获死锁:RDS SQL Server死锁(Deadlock)系列之三自动部署Profiler捕获死锁
利用Service Broker事件通知捕获死锁:RDS SQL Server死锁(Deadlock)系列之四利用Service Broker事件通知捕获死锁
利用Extended Events获取死锁信息:RDS SQL Server死锁(Deadlock)系列之五利用Extended Events获取死锁信息

最后总结

这篇文章是一个完整的SQL Server死锁收集系统典型案例介绍,你甚至可以很轻松简单的将这个方案应用到你的产品环境,来收集产品环境所有SQL Server实例发生死锁的详细信息,并根据该系统收集到的场景来改进和改善死锁发生的概率,从而降低死应用发生异常错误的可能性。因此这篇文章有着非常重要的现实价值和意义。

目录
相关文章
|
5月前
|
Dubbo 应用服务中间件 API
启动dubbo消费端过程提示No provider available for the service的问题定位与解决
启动dubbo消费端过程提示No provider available for the service的问题定位与解决
42 0
|
10月前
|
消息中间件 网络安全 RocketMQ
这个错误可能是由于RocketMQ Proxy的配置或者运行状态出现了问题
这个错误可能是由于RocketMQ Proxy的配置或者运行状态出现了问题
146 1
|
SQL 运维 Oracle
【大数据开发运维解决方案】记一次同事不慎用root起动weblogic以及启动日志卡在The server started in RUNNING mode 问题解决过程
最近因为单位换了新版本HD集群,有一些业务数据存在于hive数据库中。而有一些Smartbi的报表数据源是连接的华为HD Hive,因为变更了集群,需要将SmartBi的数据源改为新集群的。我将Kerberos认证凭据和新版本Hive jdbc驱动以及新的jdbc连接串给了同事,也将实施文档给了同事,但是同事在操作完成后,Smarbi节点无法正常起来(后台日志卡在:The server started in RUNNING mode,Server state changed to RUNNING),要么起来了就是无法联通Hive。
【大数据开发运维解决方案】记一次同事不慎用root起动weblogic以及启动日志卡在The server started in RUNNING mode 问题解决过程
|
C# 容器
分布式服务器框架之Server.Common库中实现 秒级定时器
定时器模块是服务器中的常用组件,本文带你实现一个具有基本功能的定时器模块要想设计一个定时器模块,一般包含两部分,一个是定时器对象(ServerTimer),另一个管理定时器对象的管理者(TimerManager)也叫定时器容器;定时器使用了C#内System库里面的Timer定时器作为定时器的主驱动
分布式服务器框架之Server.Common库中实现 秒级定时器
|
Web App开发 缓存 JavaScript
service work 关于更新用户本地缓存的方案
service work + cache 可以用来对网站的一些资源进行本地缓存,甚至可以实现离线访问的功能(如果你的网站是纯静态的)。
142 0
 service work 关于更新用户本地缓存的方案
|
API Android开发
Service中是如何产生ANR的?
Service中是如何产生ANR的?
200 0
Service中是如何产生ANR的?
使用 cf push 部署应用到 SAP BTP 后启动失败 start unsuccessful 该如何处理
使用 cf push 部署应用到 SAP BTP 后启动失败 start unsuccessful 该如何处理
103 0
使用 cf push 部署应用到 SAP BTP 后启动失败 start unsuccessful 该如何处理
|
Windows 开发工具 IDE
【终极解决方案】为应用程序池“XXX”提供服务的进程在与 Windows Process Activation Service 通信时出现严重错误。该进程 ID 为“XXXX”。数据字段包含错误号。
原文:【终极解决方案】为应用程序池“XXX”提供服务的进程在与 Windows Process Activation Service 通信时出现严重错误。该进程 ID 为“XXXX”。数据字段包含错误号。
3439 0
|
SQL 监控 数据格式
使用SQL Server 扩展事件来创建死锁的时间跟踪
原文:使用SQL Server 扩展事件来创建死锁的时间跟踪   我们通过SQL Server 2012图形界面来部署一个扩展事件跟踪会话。然后可以生成SQL脚本,在2008或2008 R2版本下运行类似的跟踪。
1467 0