利用pg_stat_activity做日常检查及异常SQL诊断

简介: AnalyticDB for PostgreSQL(原HybridDB for PostgreSQL,以下简称ADB for PG)作为高性能分析型数据库,可以支持用户对其业务数据进行实时分析,能够让企业敏锐感知市场动态,做出必要决策。

AnalyticDB for PostgreSQL(原HybridDB for PostgreSQL,以下简称ADB for PG)作为高性能分析型数据库,可以支持用户对其业务数据进行实时分析,能够让企业敏锐感知市场动态,做出必要决策。本文从ADB for PG用户角度出发,阐述如何借助pg_stat_activity这一系统视图实现实例的基本健康检查。

pg_stat_activity定义

postgres=#  \d+ pg_stat_activity;
                       View "pg_catalog.pg_stat_activity"
      Column      |           Type           | Modifiers | Storage  | Description
------------------+--------------------------+-----------+----------+-------------
 datid            | oid                      |           | plain    | OID of the database this backend is connected to
 datname          | name                     |           | plain    | Name of the database this backend is connected to
 procpid          | integer                  |           | plain    | Process ID of this backend
 sess_id          | integer                  |           | plain    | 
 usesysid         | oid                      |           | plain    | OID of the user logged into this backend
 usename          | name                     |           | plain    | Name of the user logged into this backend
 current_query    | text                     |           | extended | 
 waiting          | boolean                  |           | plain    | True if this backend is currently waiting on a lock
 query_start      | timestamp with time zone |           | plain    | Time when the currently active query was started
 backend_start    | timestamp with time zone |           | plain    | Time when this process was started, i.e., when the client connected to the server
 client_addr      | inet                     |           | plain    |
 client_port      | integer                  |           | plain    |
 application_name | text                     |           | extended |
 xact_start       | timestamp with time zone |           | plain    |
 waiting_reason   | text                     |           | extended | 
View definition:
 SELECT s.datid, d.datname, s.procpid, s.sess_id, s.usesysid, u.rolname AS usename, s.current_query, s.waiting, s.query_start, s.backend_start, s.client_addr, s.client_port, s.application_name, s.xact_start, s.waiting_reason
   FROM pg_database d, pg_stat_get_activity(NULL::integer) s(datid, procpid, usesysid, application_name, current_query, waiting, xact_start, query_start, backend_start, client_addr, client_port, sess_id, waiting_reason), pg_authid u
  WHERE s.datid = d.oid AND s.usesysid = u.oid;

这是当前ADB for PG的版本输出,其中Description列根据官方文档手动补充。

连接信息

想确认某个用户是否连接到当前DB上:

postgres=# SELECT datname,usename FROM pg_stat_activity WHERE usename = 'joe';
 datname  | usename
----------+---------
 postgres | joe
(1 row)

进一步确认当前所有的连接信息,包括哪些机器在连接:

postgres=# SELECT datname,usename,client_addr,client_port FROM pg_stat_activity ;
datname  |  usename  |  client_addr   | client_port
----------+-----------+----------------+-------------
 postgres | joe       |  xx.xx.xx.xx   |       60621
 postgres | gpmon     |  xx.xx.xx.xx   |       60312
(9 rows)

通过上述信息就能确认当前的连接用户和对应的连接机器。

SQL运行信息

当前DB上运行的SQL也可以通过pg_stat_activity来获取,首先需要确认参数track_activities = on已经设置。该参数会默认设置好,只需要确认即可:

postgres=# show track_activities;
 track_activities
------------------
 on
(1 row)

获取当前用户执行SQL信息:

postgres=# SELECT datname,usename,current_query FROM pg_stat_activity ;
 datname  | usename  |                        current_query
----------+----------+--------------------------------------------------------------
 postgres | postgres | SELECT datname,usename,current_query FROM pg_stat_activity ;
 postgres | joe      | <IDLE>
(2 rows)

只看当前正在运行的SQL信息:

SELECT datname,usename,current_query
   FROM pg_stat_activity
   WHERE current_query != '<IDLE>' ;

查看耗时较长的查询:

select
       current_timestamp - query_start as runtime,
       datname,
       usename,
       current_query
    from pg_stat_activity
    where current_query != '<IDLE>'
    order by 1 desc;

例如输出:

     runtime     |    datname     | usename  |                                current_query
-----------------+----------------+----------+------------------------------------------------------------------------------
 00:00:34.248426 | tpch_1000x_col | postgres | select
                                             :         l_returnflag,
                                             :         l_linestatus,
                                             :         sum(l_quantity) as sum_qty,
                                             :         sum(l_extendedprice) as sum_base_price,
                                             :         sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
                                             :         sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
                                             :         avg(l_quantity) as avg_qty,
                                             :         avg(l_extendedprice) as avg_price,
                                             :         avg(l_discount) as avg_disc,
                                             :         count(*) as count_order
                                             : from
                                             :         public.lineitem
                                             : where
                                             :         l_shipdate <= date '1998-12-01' - interval '93' day
                                             : group by
                                             :         l_returnflag,
                                             :         l_linestatus
                                             : order by
                                             :         l_returnflag,
                                             :         l_linestatus;
 00:00:00        | postgres       | postgres | select
                                             :        current_timestamp - query_start as runtime,
                                             :        datname,
                                             :        usename,
                                             :        current_query
                                             :     from pg_stat_activity
                                             :     where current_query != '<IDLE>'
                                             :     order by 1 desc;
(2 rows)

可以看到第一个查询耗时较久,已经运行了34s还没有结束。

异常SQL诊断及修复

如果一个SQL运行很长时间没有结果,需要检查该SQL还在运行中还是已经被block了:

SELECT datname,usename,current_query
   FROM pg_stat_activity
   WHERE waiting;

需要注意的是这个输出只能获取当前因为lock而被block的SQL,因为其他原因被block的SQL这里获取不到。绝大多数情况下SQL都是因为lock而被block,但也会有一些其他情况例如等待i/o、定时器等。

如果上述SQL有结果输出说明有SQL被lock阻塞,进一步明确相互block的SQL信息:

SELECT
       w.current_query as waiting_query,
       w.procpid as w_pid,
       w.usename as w_user,
       l.current_query as locking_query,
       l.procpid as l_pid,
       l.usename as l_user,
       t.schemaname || '.' || t.relname as tablename
    from pg_stat_activity w
    join pg_locks l1 on w.procpid = l1.pid and not l1.granted
    join pg_locks l2 on l1.relation = l2.relation and l2.granted
    join pg_stat_activity l on  l2.pid = l.procpid
    join pg_stat_user_tables t on l1.relation = t.relid
    where w.waiting;

通过这个SQL的输出信息就能确认相互block的SQL和对应的执行pid。在明确了SQL的阻塞信息后,可以通过cancel或者kill query的方式进行恢复。

通过cancel取消一个正在运行的query:

SELECT pg_cancel_backend(pid)

需要在一个运行query的session中执行,如果session本身就是idle的,执行不起作用。另外取消这个query需要花费一定的时间来做清理和事务的回滚。

使用pg_terminate_backend来清理idle session,也可以用来终止query:

SELECT pg_terminate_backend(pid);

该用户的连接会被断开。尽量避免在正在运行query的进程pid上执行。

需要注意的是文中提到操作需要用户有superuser的权限。

目录
相关文章
|
22天前
|
SQL
启动mysq异常The server quit without updating PID file [FAILED]sql/data/***.pi根本解决方案
启动mysq异常The server quit without updating PID file [FAILED]sql/data/***.pi根本解决方案
16 0
|
6月前
|
SQL 关系型数据库 MySQL
【MySQL异常解决】MySQL执行SQL文件出现【Unknown collation ‘utf8mb4_0900_ai_ci‘】的解决方案
【MySQL异常解决】MySQL执行SQL文件出现【Unknown collation ‘utf8mb4_0900_ai_ci‘】的解决方案
210 0
|
6月前
|
SQL Java Spring
【MybatisPlus异常】The SQL execution time is too large, please optimize
【MybatisPlus异常】The SQL execution time is too large, please optimize
179 0
【MybatisPlus异常】The SQL execution time is too large, please optimize
|
6月前
|
Java 关系型数据库 MySQL
【Java异常】java.sql.SQLExcetion:Cannot convert value “0000-00-00 00:00:00” from column 9 to TIMESTAMP
【Java异常】java.sql.SQLExcetion:Cannot convert value “0000-00-00 00:00:00” from column 9 to TIMESTAMP
45 0
|
3月前
|
SQL 监控 关系型数据库
解密SQL性能异常事件及阿里云数据库的性能调优实践
作为开发者想必都知道数据库是现代应用的核心组件之一,而且在当今互联网时代之下,SQL查询的性能直接影响系统的整体性能,它的性能对于系统的稳定性和响应速度至关重要。那么本文就来讨论一下SQL性能异常的排查和优化方法,包括我个人印象深刻的SQL性能异常事件,以及分享一下使用阿里云数据库产品/工具进行SQL性能调优的经验和心得体会。
85 1
解密SQL性能异常事件及阿里云数据库的性能调优实践
|
3月前
|
SQL 监控 关系型数据库
常见的SQL优化和排查性能异常秘籍
常见的SQL优化和排查性能异常秘籍
34 1
|
5月前
|
SQL 关系型数据库 MySQL
My SQL下载安装配置检查
My SQL下载安装配置检查
45 0
|
6月前
|
SQL 关系型数据库 MySQL
【SQL异常】启动MySQL报错:ERROR 2003 (HY000): Can‘t connect to MySQL server on ‘localhost‘ (10061)
【SQL异常】启动MySQL报错:ERROR 2003 (HY000): Can‘t connect to MySQL server on ‘localhost‘ (10061)
83 1
|
6月前
|
SQL 关系型数据库 MySQL
【SQL异常】启动MySQL时发生系统错误的解决方法(net start mysql 发生系统错误 5。 拒绝访问)
【SQL异常】启动MySQL时发生系统错误的解决方法(net start mysql 发生系统错误 5。 拒绝访问)
133 1