使用pt-heartbeat监控主从复制延迟-阿里云开发者社区

使用pt-heartbeat监控主从复制延迟

2014-12-01 1129

版权

本文内容由阿里云实名注册用户自发贡献，版权归原作者所有，阿里云开发者社区不拥有其著作权，亦不承担相应法律责任。具体规则请查看《阿里云开发者社区用户服务协议》和《阿里云开发者社区知识产权保护指引》。如果您发现本社区中有涉嫌抄袭的内容，填写侵权投诉表单进行举报，一经查实，本社区将立刻删除涉嫌侵权内容。

本文涉及的产品

云数据库 RDS MySQL Serverless，0.5-2RCU 50GB

简介： MySQL主从复制是MySQL 高可用架构中重要的组成部分，该技术可以用于实现负载均衡，高可用和故障切换，以及提供备份等等。对于主从复制的监控，仅仅依赖于MySQL自身提供的show slave status并不可靠。

MySQL主从复制是MySQL 高可用架构中重要的组成部分，该技术可以用于实现负载均衡，高可用和故障切换，以及提供备份等等。对于主从复制的监控，仅仅依赖于MySQL自身提供的show slave status并不可靠。pt-heartbeat是主从复制延迟监控的不错选择，本文描述了主从复制情形下的延迟监控并给出相应示例。

pt-heartbeat为percona-toolkit工具包中的一个，因此使用前需要先安装percona-toolkit，请参考：percona-toolkit的安装及简介

1、pt-heartbeat的作用
pt-heartbeat measures replication lag on a MySQL or PostgreSQL server. You can use it to update a master or monitor a replica. If possible, MySQL connection options are read from your .my.cnf file. For more details, please use the --help option, or try 'perldoc /usr/bin/pt-heartbeat' for complete documentation.

pt-heartbeat is a two-part MySQL and PostgreSQL replication delay monitoring system that measures delay by looking at actual replicated data. This avoids reliance on the replication mechanism itself, which is unreliable. (For example, SHOW SLAVE STATUS on MySQL).

2、pt-heartbeat的原理
    The first part is an --update instance of pt-heartbeat that connects to a master and updates a timestamp (“heartbeat record”) every --interval seconds. Since the heartbeat table may contain records from multiple masters (see “MULTI-SLAVE HIERARCHY”), the server’s ID (@@server_id) is used to identify records.
    主库上存在一个用于检查延迟的表heartbeat，可手动或自动创建
    pt-heartbeat使用--update参数连接到主库上并持续(根据设定的--interval参数)使用一个时间戳更新到表heartbeat

    The second part is a --monitor or --check instance of pt-heartbeat that connects to a slave, examines the replicated heartbeat record from its immediate master or the specified --master-server-id, and computes the difference from the current system time. If replication between the slave and the master is delayed or broken, the computed difference will be greater than zero and otentially increase if --monitor is specified.
    pt-heartbeat使用--monitor 或--check连接到从库，检查从主库同步过来的时间戳，并与当前系统时间戳进行比对产生一个差值，
    该值则用于判断延迟。(注，前提条件是主库与从库应保持时间同步)

You must either manually create the heartbeat table on the master or use --create-table. See --create-table for the proper heartbeat table structure. The MEMORY storage engine is suggested, but not re-quired of course, for MySQL.
The heartbeat table must contain a heartbeat row. By default, a heartbeat row is inserted if it doesn’t exist. This feature can be disabled with the --[no]insert-heartbeat-row option in case the database user does not have INSERT privileges.

    pt-heartbeat depends only on the heartbeat record being replicated to the slave, so it works regardless of the replication mechanism (built-in replication, a system such as Continuent Tungsten, etc). It works at any depth in the replication hierarchy; for example, it will reliably report how far a slave lags its master’s master’s master. And if replication is   stopped, it will continue to work and report (accurately!) that the slave is falling further and further behind the master.
  pt-heartbeat has a maximum resolution of 0.01 second. The clocks on the master and slave servers must be closely synchronized via NTP. By default, --update checks happen on the edge of the second (e.g. 00:01) and --monitor checks happen halfway between seconds (e.g. 00:01.5). As long as the servers’ clocks are closely synchronized and replication events are propagating in less than half a second, pt-heartbeat will report zero seconds of delay.
    pt-heartbeat will try to reconnect if the connection has an error, but will not retry if it can’t get a connection when it first starts.
    The --dbi-driver option lets you use pt-heartbeat to monitor PostgreSQL as well. It is reported to work well  with Slony-1 replication.

3、获取pt-heartbeat帮助信息
a、获取帮助信息
  [root@DBMASTER01 ~]# pt-heartbeat #直接输入pt-heartbeat可获得一个简要描述，使用pt-heartbeat --help获得一个完整帮助信息
  Usage: pt-heartbeat [OPTIONS] [DSN] --update|--monitor|--check|--stop

  Errors in command-line arguments:
    * Specify at least one of --stop, --update, --monitor or --check
    * --database must be specified

b、几个重要的参数
  Specify at least one of --stop, --update, --monitor, or --check. #至少指定一个
  --update, --monitor, and --check are mutually exclusive.         #互斥参数
  --daemonize and --check are mutually exclusive.                  #互斥参数

  --check
  Check slave delay once and exit. If you also specify --recurse, the tool will try to discover slave’s of the
  given slave and check and print their lag, too. The hostname or IP and port for each slave is printed before its
  delay. --recurse only works with MySQL.

  --daemonize
  Fork to the background and detach from the shell. POSIX operating systems only.

  --frames
  type: string; default: 1m,5m,15m
  Timeframes for averages.
  Specifies the timeframes over which to calculate moving averages when --monitor is given. Specify as a
  comma-separated list of numbers with suffixes. The suf?x can be s for seconds, m for minutes, h for hours, or d
  for days. The size of the largest frame determines the maximum memory usage, as up to the specified number
  of per-second samples are kept in memory to calculate the averages. You can specify as many timeframes as
  you like.

  --monitor
  Monitor slave delay continuously.
  Specifies that pt-heartbeat should check the slave’s delay every second and report to STDOUT (or if --file
  is given, to the file instead). The output is the current delay followed by moving averages over the timeframe
  given in --frames. For example,
  5s [ 0.25s, 0.05s, 0.02s ]

  --stop
  Stop running instances by creating the sentinel file.

  --update
  Update a master’s heartbeat.

4、演示使用pt-heartbeat

a、首先添加表
[root@DBMASTER01 ~]# pt-heartbeat --user=root --password=xxx -S /tmp/mysql.sock -D test \
> --master-server-id=11 --create-table --update 

MASTER> select * from heartbeat;
+----------------------------+-----------+------------------+-----------+-----------------------+---------------------+
| ts                         | server_id | file             | position  | relay_master_log_file | exec_master_log_pos |
+----------------------------+-----------+------------------+-----------+-----------------------+---------------------+
| 2014-12-01T09:48:14.003020 |        11 | mysql-bin.000390 | 677136957 | mysql-bin.000179      |                 120 |
+----------------------------+-----------+------------------+-----------+-----------------------+---------------------+

b、更新主库上的heartbeat
[root@DBMASTER01 ~]# pt-heartbeat --user=root --password=xxx -S /tmp/mysql.sock -D test \
> --master-server-id=11 --update &
[1] 31249

c、从库上监控延迟
[root@DBBAK01 ~]# pt-heartbeat --user=root --password=xxx -S /tmp/mysql.sock -D test \
> --master-server-id=11 --monitor --print-master-server-id
1.00s [  0.02s,  0.00s,  0.00s ] 11  #实时延迟，1分钟延迟，5分钟延迟，15分钟延迟
1.00s [  0.03s,  0.01s,  0.00s ] 11  # Author : Leshami
1.00s [  0.05s,  0.01s,  0.00s ] 11  # Blog   : http://blog.csdn.net/leshami
1.00s [  0.07s,  0.01s,  0.00s ] 11
1.00s [  0.08s,  0.02s,  0.01s ] 11
1.00s [  0.10s,  0.02s,  0.01s ] 11
1.00s [  0.12s,  0.02s,  0.01s ] 11
1.00s [  0.13s,  0.03s,  0.01s ] 11

d、其他操作示例
#将主库上的update使用守护进程方式调度
[root@DBMASTER01 ~]# pt-heartbeat --user=root --password=xxx -S /tmp/mysql.sock -D test \
> --master-server-id=11 --update --daemonize

#修改主库上的更新间隔为2s 
[root@DBMASTER01 ~]# pt-heartbeat --user=root --password=xxx -S /tmp/mysql.sock -D test \
> --master-server-id=11 --update --daemonize --interval=2

#停止主库上的pt-heartbeat守护进程
[root@DBMASTER01 ~]# pt-heartbeat --stop
Successfully created file /tmp/pt-heartbeat-sentinel
[root@DBMASTER01 ~]# rm -rf /tmp/pt-heartbeat-sentinel

#单次查看从库上的延迟情况
[robin@DBBAK01 ~]$ pt-heartbeat --user=root --password=xxx -S /tmp/mysql.sock -D test \
> --master-server-id=11 --check 
1.00

#使用守护进程监控从库并输出日志
[root@DBBAK01 ~]#  pt-heartbeat --user=root --password=xxx -S /tmp/mysql.sock -D test \
--master-server-id=11 --monitor --print-master-server-id --daemonize --log=/tmp/slave-lag.log

使用pt-heartbeat监控主从复制延迟

热门文章

最新文章

相关电子书

相关实验场景