pt-tools系统:pt-kill 实战

  1. 云栖社区>
  2. 博客>
  3. 正文

pt-tools系统:pt-kill 实战

兰春 2018-11-23 20:48:46 浏览553
展开阅读全文

列出几种常用场景,并进行分析实战测试

前提

* pt-kill必须获得的权限

1. PROCESS ,  SUPER 
2. 否则,你只能看到打印出kill id,但是实际上并没有被kill掉  

特殊、 打印出执行时间超过3秒的connection,仅仅打印,不kill

  • 每2秒循环一次,超过10秒就退出pt-kill程序
pt-kill  --host xx.xxx.126.164 --port 3306 --user pt_kill --ask-pass  --print --ignore-self --busy-time=3 --interval 2 --run-time=10

重点注意: 这里的--busy-time=3,指的是Command=Query的连接,其他的并不会被匹配哦 , 所以一般情况下删除的都是比较安全的用户thread
重点注意2: ddl,dml,select,都是属于Command=Query

+------+---------------+----------------------+--------------+------------------+--------+---------------------------------------------------------------+-------------------------------------------+
| Id   | User          | Host                 | db           | Command          | Time   | State                                                         | Info                                      |
+------+---------------+----------------------+--------------+------------------+--------+---------------------------------------------------------------+-------------------------------------------+
|    3 | repl          | xx.xxx.126.166:60528 | NULL         | Binlog Dump GTID | 492777 | Master has sent all binlog to slave; waiting for more updates | NULL                                      |
|    4 | repl          | xx.xxx.126.165:48604 | NULL         | Binlog Dump GTID | 492765 | Master has sent all binlog to slave; waiting for more updates | NULL                                      |
|  502 | job_heartbeat | xx.xxx.2.217:34626   | heartbeat_db | Sleep            |      0 |                                                               | NULL                                      |
| 1053 | dbadmin       | localhost            | heartbeat_db | Query            |      1 | altering table                                                | alter table heartbeat add column ts2 date |
| 1055 | pt_kill       | xx.xxx.126.166:63167 | NULL         | Sleep            |      1 |                                                               | NULL                                      |
| 1056 | dbadmin       | localhost            | NULL         | Query            |      0 | starting                                                      | show processlist                          |
+------+---------------+----------------------+--------------+------------------+--------+---------------------------------------------------------------+-------------------------------------------+


+------+---------------+----------------------+--------------+------------------+--------+---------------------------------------------------------------+-------------------------------------------------+
| Id   | User          | Host                 | db           | Command          | Time   | State                                                         | Info                                            |
+------+---------------+----------------------+--------------+------------------+--------+---------------------------------------------------------------+-------------------------------------------------+
|    3 | repl          | xx.xxx.126.166:60528 | NULL         | Binlog Dump GTID | 492716 | Master has sent all binlog to slave; waiting for more updates | NULL                                            |
|    4 | repl          | xx.xxx.126.165:48604 | NULL         | Binlog Dump GTID | 492704 | Master has sent all binlog to slave; waiting for more updates | NULL                                            |
|  502 | job_heartbeat | xx.xxx.2.217:34626   | heartbeat_db | Sleep            |      0 |                                                               | NULL                                            |
| 1053 | dbadmin       | localhost            | heartbeat_db | Query            |      3 | updating                                                      | update heartbeat set ts = '2018-09-06 00:07:58' |
| 1055 | pt_kill       | xx.xxx.126.166:63167 | NULL         | Sleep            |      0 |                                                               | NULL                                            |
| 1056 | dbadmin       | localhost            | NULL         | Query            |      0 | starting                                                      | show processlist                                |
+------+---------------+----------------------+--------------+------------------+--------+---------------------------------------------------------------+-------------------------------------------------+

+------+---------------+----------------------+--------------+------------------+--------+---------------------------------------------------------------+-------------------+
| Id   | User          | Host                 | db           | Command          | Time   | State                                                         | Info              |
+------+---------------+----------------------+--------------+------------------+--------+---------------------------------------------------------------+-------------------+
|    3 | repl          | xx.xxx.126.166:60528 | NULL         | Binlog Dump GTID | 492613 | Master has sent all binlog to slave; waiting for more updates | NULL              |
|    4 | repl          | xx.xxx.126.165:48604 | NULL         | Binlog Dump GTID | 492601 | Master has sent all binlog to slave; waiting for more updates | NULL              |
|  502 | job_heartbeat | xx.xxx.2.217:34626   | heartbeat_db | Sleep            |      0 |                                                               | NULL              |
| 1053 | dbadmin       | localhost            | heartbeat_db | Query            |      3 | User sleep                                                    | select 1,sleep(4) |
| 1055 | pt_kill       | xx.xxx.126.166:63167 | NULL         | Sleep            |      1 |                                                               | NULL              |
| 1056 | dbadmin       | localhost            | NULL         | Query            |      0 | starting                                                      | show processlist  |
+------+---------------+----------------------+--------------+------------------+--------+---------------------------------------------------------------+-------------------+

+------+---------------+----------------------+--------------+------------------+--------+---------------------------------------------------------------+------------------+
| Id   | User          | Host                 | db           | Command          | Time   | State                                                         | Info             |
+------+---------------+----------------------+--------------+------------------+--------+---------------------------------------------------------------+------------------+
|    3 | repl          | xx.xxx.126.166:60528 | NULL         | Binlog Dump GTID | 492740 | Master has sent all binlog to slave; waiting for more updates | NULL             |
|    4 | repl          | xx.xxx.126.165:48604 | NULL         | Binlog Dump GTID | 492728 | Master has sent all binlog to slave; waiting for more updates | NULL             |
|  502 | job_heartbeat | xx.xxx.2.217:34626   | heartbeat_db | Sleep            |      0 |                                                               | NULL             |
| 1053 | dbadmin       | localhost            | heartbeat_db | Query            |      4 | starting                                                      | rollback         |
| 1055 | pt_kill       | xx.xxx.126.166:63167 | NULL         | Sleep            |      0 |                                                               | NULL             |
| 1056 | dbadmin       | localhost            | NULL         | Query            |      0 | starting                                                      | show processlist |
+------+---------------+----------------------+--------------+------------------+--------+---------------------------------------------------------------+------------------+

一、打印出sleep时间超过3秒的connection,仅仅打印,不kill

  • 每2秒循环一次,无限循环下去
pt-kill  --host xx.xxx.126.164 --port 3306 --user pt_kill --ask-pass  --print --ignore-self --idle-time=3 --interval 2
  • 每2秒循环一次,超过10秒就退出pt-kill程序
pt-kill  --host xx.xxx.126.164 --port 3306 --user pt_kill --ask-pass  --print --ignore-self --idle-time=3 --interval 2 --run-time=10

二、kill掉query语句中带有sleep关键字(不区分大小写)的connection, 且Time超过3秒

pt-kill  --host xx.xxx.126.164 --port 3306 --user pt_kill --ask-pass  --print --ignore-self  --interval 2 --match-info "(?i-xsm:(sleep))" --busy-time=3 --kill --victims all

三、kill掉非系统用户的select开头,且执行时间超过3秒的 connection

pt-kill  --host xx.xxx.126.164 --port 3306 --user pt_kill --ask-pass  --ignore-self --print  --interval 2 --match-info "(?i-xsm:^(select))"  --ignore-user="root|repl"  --busy-time=3 --kill --victims all

四、kill掉非系统用户的select,update,delete开头,且执行时间超过3秒的 connection

pt-kill  --host xx.xxx.126.164 --port 3306 --user pt_kill --ask-pass  --ignore-self --print  --interval 2 --match-info "(?i-xsm:^(select))|(?i-xsm:^(update))|(?i-xsm:^(delete))"  --ignore-user="root|repl"  --busy-time=3 --kill --victims all

五、kill掉指定特征的query语句

  • kill掉非系统用户,且query语句中同时包含heartbeat 和 where ,且heartbeat在前where在后,且执行时间超过3秒的 connection

pt-kill  --host xx.xxx.126.164 --port 3306 --user pt_kill --ask-pass  --ignore-self --print  --interval 2 --match-info ".*heartbeat.*where.*"  --ignore-user="root|repl"  --busy-time=3 --kill --victims all


# 2018-11-15T10:38:03 KILL 1053 (Query 27 sec) select *,sleep(10) from heartbeat  where id < '1000000000'
# 2018-11-15T10:38:05 KILL 1053 (Query 29 sec) select *,sleep(10) from heartbeat  where id < '1000000000'
# 2018-11-15T10:38:07 KILL 1053 (Query 31 sec) select *,sleep(10) from heartbeat  where id < '1000000000'
# 2018-11-15T10:38:09 KILL 1053 (Query 33 sec) select *,sleep(10) from heartbeat  where id < '1000000000'

六、kill掉非系统库的select开头,且执行时间超过3秒的 connection


pt-kill  --host xx.xxx.126.164 --port 3306 --user pt_kill --ask-pass  --ignore-self --print  --interval 2 --match-info "(?i-xsm:^(select))"  --ignore-db="mysql|information_schema"   --ignore-user="root|repl"  --busy-time=3 --kill --victims all

七、kill掉非系统用户的select,update,delete开头,且执行时间超过3秒,且不是被locked住 的 connection

pt-kill  --host xx.xxx.126.164 --port 3306 --user pt_kill --ask-pass  --ignore-self --print  --interval 2 --match-info "(?i-xsm:^(select))|(?i-xsm:^(update))|(?i-xsm:^(delete))"  --ignore-state="Locked"   --ignore-user="root|repl"  --busy-time=3  --kill  --victims all

八、kill掉非系统用户,指定state(Locked、login、Updating、Sorting for order等状态),且执行时间超过3秒 的 connection

  • 指定Locked的connection删除掉
pt-kill  --host xx.xxx.126.164 --port 3306 --user pt_kill --ask-pass  --ignore-self --print  --interval 2 --match-info "(?i-xsm:^(select))|(?i-xsm:^(update))|(?i-xsm:^(delete))"  --match-state="Locked"   --ignore-user="root|repl"  --busy-time=3 --kill --victims all

九、kill掉非系统用户,指定Command(Query、Sleep、Binlog Dump、Connect等状态),且执行时间超过3秒 的 connection

  • kill掉指定Connect的command connection
pt-kill  --host xx.xxx.126.164 --port 3306 --user pt_kill --ask-pass  --ignore-self --print  --interval 2 --match-info "(?i-xsm:^(select))|(?i-xsm:^(update))|(?i-xsm:^(delete))"  --match-command="Connect"   --ignore-user="root|repl"  --busy-time=3 --kill --victims all

十、kill掉指定来源host ip ,且select开头的,且执行时间超过3s的connection

pt-kill  --host xx.xxx.126.164 --port 3306 --user pt_kill --ask-pass  --ignore-self --print  --interval 2 --match-info "(?i-xsm:^(select))"  --ignore-host="x.x.x.x"   --ignore-user="root|repl"  --busy-time=3 --kill --victims all

十一、kill掉非系统用户,Command=Sleep,且空闲时间为3s的connection


pt-kill  --host xx.xxx.126.164 --port 3306 --user pt_kill --ask-pass  --ignore-self --print  --interval 2  --ignore-user="root|repl"  --idle-time=3 --kill --victims all

十二、kill掉非系统用户,指定特征的query,在后台运行,并打印日志

pt-kill  --host xx.xxx.126.164 --port 3306 --user pt_kill --ask-pass  --ignore-self --print  --interval 2 --match-info ".*heartbeat.*where.*"  --ignore-user="root|repl"  --busy-time=3 --daemonize --log='/root/kill.log' --kill --victims all

十三、--victims的用法

  • 背景
pid:1103 , 是最早开启事务的空闲进程 T1
pid:1095 , 是第二早开启事务的ddl进程 T2
pid:502  , 是最后一个开启事务的dml进程 T3

事务顺序是: T1 锁住了 T2,T3,  T2锁住了T3 , T3被T1,T2锁住

+------+---------------+----------------------+--------------+------------------+--------+---------------------------------------------------------------+---------------------------------------------------------+
| Id   | User          | Host                 | db           | Command          | Time   | State                                                         | Info                                                    |
+------+---------------+----------------------+--------------+------------------+--------+---------------------------------------------------------------+---------------------------------------------------------+
|    3 | repl          | xx.xxx.126.166:60528 | NULL         | Binlog Dump GTID | 507504 | Master has sent all binlog to slave; waiting for more updates | NULL                                                    |
|    4 | repl          | xx.xxx.126.165:48604 | NULL         | Binlog Dump GTID | 507492 | Master has sent all binlog to slave; waiting for more updates | NULL                                                    |
|  502 | job_heartbeat | xx.xxx.2.217:34626   | heartbeat_db | Query            |    328 | Waiting for table metadata lock                               | insert into heartbeat(ts) values('2018-11-15 13:43:50') |
| 1095 | dbadxxx       | localhost            | heartbeat_db | Query            |    329 | Waiting for table metadata lock                               | alter table heartbeat add column ts3 date               |
| 1103 | dbadxxx       | localhost            | heartbeat_db | Sleep            |    341 |                                                               | NULL                                                    |
| 1104 | dbadxxx       | localhost            | NULL         | Query            |      0 | starting                                                      | show processlist                                        |
+------+---------------+----------------------+--------------+------------------+--------+---------------------------------------------------------------+---------------------------------------------------------+
6 rows in set (0.00 sec)
  • --match-command="Query|Sleep" --victims oldest --busy-time=3

只kill最老的command为Query|Sleep的最老的链接

pt-kill --host xx.xxx.126.164 --port 3306 --user pt_kill --ask-pass --ignore-self --print --interval 2 --ignore-user="root|repl"   --match-command="Query|Sleep"      --busy-time=3  --victims oldest

Enter MySQL password:
# 2018-11-15T13:49:07 KILL 1103 (Sleep 330 sec) NULL
# 2018-11-15T13:49:09 KILL 1103 (Sleep 332 sec) NULL
  • --match-command="Query|Sleep" --victims all --busy-time=3

kill 所有command="Query|Sleep" 的所有链接

Enter MySQL password:
# 2018-11-15T13:48:41 KILL 1103 (Sleep 304 sec) NULL
# 2018-11-15T13:48:41 KILL 1095 (Query 292 sec) alter table heartbeat add column ts3 date
# 2018-11-15T13:48:41 KILL 502 (Query 291 sec) insert into heartbeat(ts) values('2018-11-15 13:43:50')
# 2018-11-15T13:48:41 KILL 1104 (Sleep 262 sec) NULL

  • --match-command="Query" --victims all --busy-time=3

kill 所有 command="Query"(默认不填也就是Query)的所有链接

Enter MySQL password:
# 2018-11-15T13:46:59 KILL 1095 (Query 190 sec) alter table heartbeat add column ts3 date
# 2018-11-15T13:46:59 KILL 502 (Query 189 sec) insert into heartbeat(ts) values('2018-11-15 13:43:50')
  • --match-command="Query" --victims oldest --busy-time=3

kill 所有 command="Query"(默认不填也就是Query)的最老的链接

Enter MySQL password:
# 2018-11-15T13:59:01 KILL 1095 (Query 912 sec) alter table heartbeat add column ts3 date

十四、kill掉非系统用户,指定库的所有链接

举例一、kill掉非root和repl账号,使用open_db或xf_loupan_db的所有链接

 pt-kill --host xx.xxx.126.164 --port 3306 --user pt_kill --ask-pass --ignore-self --print --interval 2 --ignore-user="root|repl"  --match-db="open_db|xf_loupan_db"  --victims all --kill

dbadmin:xf_loupan_db> select *,sleep(100) from loupan_grade_photo;
ERROR 2013 (HY000): Lost connection to MySQL server during query
dbadmin:xf_loupan_db> select *,sleep(100) from loupan_grade_photo;
ERROR 2006 (HY000): MySQL server has gone away
No connection. Trying to reconnect...
Connection id:    245
Current database: xf_loupan_db

ERROR 2013 (HY000): Lost connection to MySQL server during query


Enter MySQL password:
# 2018-12-26T12:34:46 KILL 243 (Query 7 sec) select *,sleep(100) from loupan_grade_photo
# 2018-12-26T12:35:00 KILL 245 (Query 1 sec) select *,sleep(100) from loupan_grade_photo

特别注意: 这里面只针对db这一列是open_db或xf_loupan_db才会有效果,如果这个用户拥有多个库的权限
那么他在xx库去 select * from xf_loupan_db.loupan_grade_photo 是不会生效的
所以,要保证这个库账号只能对应一个库,才会有效果

比如这样是没有效果的:

root:(none)> show processlist;
+-----+---------+----------------------+------+---------+------+------------+----------------------------------------------------------+
| Id  | User    | Host                 | db   | Command | Time | State      | Info                                                     |
+-----+---------+----------------------+------+---------+------+------------+----------------------------------------------------------+
| 248 | root    | localhost            | NULL | Query   |    0 | starting   | show processlist                                         |
| 252 | pt_kill | xx.xxx.126.166:33088 | NULL | Sleep   |    1 |            | NULL                                                     |
| 257 | dbadmin | localhost            | sys  | Query   |   19 | User sleep | select *,sleep(100) from xf_loupan_db.loupan_grade_photo |
+-----+---------+----------------------+------+---------+------+------------+----------------------------------------------------------+
3 rows in set (0.00 sec)

十五、kill掉指定用户的所有链接

一般我们什么时候使用它呢? 一般是在迁移某个DB,需要将这个DB上的账号链接全部清理掉,防止长连接


举例:kill掉所有lc_rx 和 rc_ronly 两个账号的所有链接

pt-kill --host xx.xxx.126.164 --port 3306 --user pt_kill --ask-pass --ignore-self --print --interval 2 --match-user="lc_rx|rc_ronly"  --victims all --kill



网友评论

登录后评论
0/500
评论
兰春
+ 关注