pg_rewind is slated to be included in PostgreSQL 9.5.

When a database grows large, say into the terabytes, and you run a PostgreSQL primary-standby HA setup with streaming replication or log shipping, the old primary may need to be resynchronized after a failover. Over a WAN, network problems can also leave a standby so far behind the primary that it needs to be resynchronized.

For a small database, resynchronization is easy: either a full copy or an incremental rsync will do. For a very large database, however, rsync becomes extremely slow and consumes a great deal of I/O on the host.

There is a proposal in the PostgreSQL community to add incremental synchronization based on WAL parsing to core; it is still under development. In the meantime there is already a tool called pg_rewind that does incremental synchronization: it parses the WAL and copies only the data blocks that changed. It supports only 9.3 and later, because it relies on data page checksums.

The theory of operation is as follows:
Theory of operation
-------------------

The basic idea is to copy everything from the new cluster to old, except for the blocks that we know to be the same.

1. Scan the WAL log of the old cluster, starting from the point where the new cluster's timeline history forked off from the old cluster. For each WAL record, make a note of the data blocks that are touched. This yields a list of all the data blocks that were changed in the old cluster, after the new cluster forked off.

2. Copy all those changed blocks from the new master to the old master.

3. Copy all other files like clog, conf files etc. from the new cluster to old. Everything except the relation files.

4. Apply the WAL from the new master, starting from the checkpoint created at failover. (pg_rewind doesn't actually apply the WAL, it just creates a backup label file indicating that when PostgreSQL is started, it will start replay from that checkpoint and apply all the required WAL.)
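The changed-block bookkeeping in steps 1 and 2 can be sketched in shell. This is only a toy model, not pg_rewind's implementation: the WAL-record lines below are made up for illustration (pg_rewind parses real WAL records in C).

```shell
#!/bin/sh
# Toy model of steps 1-2 above: given a stream of WAL records, each touching
# a (relfilenode, block) pair, derive the unique set of changed blocks that
# must be copied back from the new primary. The record format is invented.
wal_records='
16384/24576 blk 7
16384/24576 blk 7
16384/24576 blk 12
16385/24580 blk 0
'
# Keep only (relation, block), deduplicate: that is the copy list of step 2.
changed=$(printf '%s\n' "$wal_records" | awk 'NF { print $1, $3 }' | sort -u)
printf '%s\n' "$changed"
```

The real tool additionally records the fork point from the timeline history and replays WAL from the failover checkpoint afterwards (step 4).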
For a detailed introduction, see this article:

Below we put pg_rewind's capabilities to the test. The test covers the case where, after the standby is promoted, both the old primary and the new primary receive further changes. The WAL produced by those changes must of course still be available: if WAL generated on the old primary has already been moved to the archive directory, copy it back into pg_xlog manually first. Changes made on the promoted standby are not a problem, because they can be fetched via recovery.conf.
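Copying archived segments back into pg_xlog can be scripted along these lines. The directory layout here is simulated with temporary directories so the logic runs without a real cluster; on a real system, `arch` would be your archive directory and `xlog` would be `$PGDATA/pg_xlog` (both names are assumptions of this sketch).

```shell
#!/bin/sh
# Sketch: restore archived WAL segments that pg_xlog no longer has, before
# running pg_rewind on the old primary. Simulated with temp directories.
set -e
arch=$(mktemp -d)   # stands in for the archive directory
xlog=$(mktemp -d)   # stands in for $PGDATA/pg_xlog
# Pretend two segments were archived; one was already recycled out of pg_xlog.
touch "$arch/000000010000000000000003" "$arch/000000010000000000000004"
touch "$xlog/000000010000000000000004"
# Copy back only the segments missing from pg_xlog.
for f in "$arch"/*; do
  seg=$(basename "$f")
  [ -e "$xlog/$seg" ] || cp "$f" "$xlog/$seg"
done
ls "$xlog" | sort
```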
Download PostgreSQL 9.3:
http://git.postgresql.org/gitweb/?p=postgresql.git;a=shortlog;h=refs/heads/REL9_3_STABLE

wget "http://git.postgresql.org/gitweb/?p=postgresql.git;a=snapshot;h=b5a20ab3e0310103ff11337faeed3c521f5eb917;sf=tgz"
Install PostgreSQL 9.3
tar -zxvf postgresql-b5a20ab.tar.gz
cd postgresql-b5a20ab
./configure --prefix=/opt/pgsql9.3beta2 --with-pgport=1999 --with-perl --with-tcl --with-python --with-openssl --with-pam --without-ldap --with-libxml --with-libxslt --enable-thread-safety --with-wal-blocksize=16 && gmake && gmake install
cd contrib/
gmake && gmake install
Download pg_rewind:

Install pg_rewind
unzip pg_rewind-master.zip
mv pg_rewind-master postgresql-b5a20ab/contrib/
cd postgresql-b5a20ab/contrib/pg_rewind-master
[root@db-172-16-3-33 pg_rewind-master]# export PATH=/opt/pgsql9.3beta2/bin:$PATH
[root@db-172-16-3-33 pg_rewind-master]# which pg_config
/opt/pgsql9.3beta2/bin/pg_config
[root@db-172-16-3-33 pg_rewind-master]# gmake clean
[root@db-172-16-3-33 pg_rewind-master]# gmake
[root@db-172-16-3-33 pg_rewind-master]# gmake install
[root@db-172-16-3-33 pg_rewind-master]# which pg_rewind
/opt/pgsql9.3beta2/bin/pg_rewind
[root@db-172-16-3-33 pg_rewind-master]# pg_rewind --help
pg_rewind resynchronizes a cluster with another copy of the cluster.

Usage:
  pg_rewind [OPTION]...

Options:
  -D, --target-pgdata=DIRECTORY
                 existing data directory to modify
  --source-pgdata=DIRECTORY
                 source data directory to sync with
  --source-server=CONNSTR
                 source server to sync with
  -v             write a lot of progress messages
  -n, --dry-run  stop before modifying anything
  -V, --version  output version information, then exit
  -?, --help     show this help, then exit

Report bugs to <xxx>.
Initialize the database with data page checksums
pg93@db-172-16-3-33-> initdb -D $PGDATA -E UTF8 --locale=C -W -U postgres -k
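Since pg_rewind depends on checksums, it is worth verifying that `initdb -k` really took effect. The check below greps the "Data page checksum version" field; the sample line is canned so it can run without a live cluster. On a real cluster you would feed it `pg_controldata "$PGDATA"` instead.

```shell
#!/bin/sh
# Check that data page checksums are enabled: "Data page checksum version"
# in pg_controldata output must be >= 1 (0 means disabled). The sample line
# is a canned stand-in for real pg_controldata output.
sample='Data page checksum version:           1'
ver=$(printf '%s\n' "$sample" | awk -F: '/Data page checksum version/ { gsub(/ /, "", $2); print $2 }')
if [ "$ver" -ge 1 ]; then
  echo "checksums enabled (version $ver)"
else
  echo "checksums disabled; pg_rewind cannot be used" >&2
fi
```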
Configure the primary node
vi pg_hba.conf
host replication postgres 172.16.3.0/24 md5
host all all 0.0.0.0/0 md5

vi postgresql.conf
listen_addresses = '0.0.0.0'
port = 1999
max_connections = 100
superuser_reserved_connections = 3
unix_socket_directories = '.'
unix_socket_permissions = 0700
shared_buffers = 1024MB
maintenance_work_mem = 512MB
shared_preload_libraries = 'pg_stat_statements'
vacuum_cost_delay = 10
vacuum_cost_limit = 10000
bgwriter_delay = 10ms
wal_level = hot_standby
synchronous_commit = off
wal_sync_method = fdatasync
wal_buffers = 16384kB
wal_writer_delay = 10ms
checkpoint_segments = 32
archive_mode = on
archive_command = 'test ! -f /pgdata/digoal/1921/data03/pg93/pg_arch/%f && cp %p /pgdata/digoal/1921/data03/pg93/pg_arch/%f'
max_wal_senders = 32
wal_keep_segments = 128
hot_standby = on
max_standby_archive_delay = 300s
max_standby_streaming_delay = 300s
wal_receiver_status_interval = 1s
hot_standby_feedback = on
random_page_cost = 1.5
effective_cache_size = 128000MB
log_destination = 'csvlog'
logging_collector = on
log_directory = 'pg_log'
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
log_file_mode = 0600
log_truncate_on_rotation = on
log_rotation_age = 1d
log_rotation_size = 10MB
log_checkpoints = on
log_connections = on
log_disconnections = on
log_error_verbosity = verbose
log_lock_waits = on
log_statement = 'ddl'
log_timezone = 'PRC'
autovacuum = on
log_autovacuum_min_duration = 0
datestyle = 'iso, mdy'
timezone = 'PRC'
lc_messages = 'C'
lc_monetary = 'C'
lc_numeric = 'C'
lc_time = 'C'
default_text_search_config = 'pg_catalog.english'
pg_stat_statements.max = 1000
pg_stat_statements.track = all

pg93@db-172-16-3-33-> cp $PGHOME/share/recovery.conf.sample $PGDATA/
pg93@db-172-16-3-33-> mv $PGDATA/recovery.conf.sample $PGDATA/recovery.done

vi recovery.done
recovery_target_timeline = 'latest'
standby_mode = on
primary_conninfo = 'host=172.16.3.39 port=1999 user=postgres keepalives_idle=60'
Start the primary database
pg93@db-172-16-3-33-> pg_ctl start
server starting
pg93@db-172-16-3-33-> LOG:  00000: loaded library "pg_stat_statements"
LOCATION:  load_libraries, miscinit.c:1296
Create the standby (172.16.3.39)

Install PostgreSQL

Omitted; same as on the primary node.

Install pg_rewind

Omitted; same as on the primary node.

Create the $PGDATA directory
mkdir -p $PGDATA
chown pg93:pg93 $PGDATA
chmod 700 $PGDATA
su - pg93
Copy the primary database
root@db-172-16-3-39-> su - pg93
pg93@db-172-16-3-39-> vi ~/.pgpass
172.16.3.33:1999:*:postgres:postgres
pg93@db-172-16-3-39-> chmod 400 ~/.pgpass
pg93@db-172-16-3-39-> pg_basebackup -D $PGDATA -F p -x -P -v -h 172.16.3.33 -p 1999 -U postgres
WARNING:  skipping special file "./.s.PGSQL.1999"
transaction log start point: 0/2000028 on timeline 1
WARNING:  skipping special file "./.s.PGSQL.1999"
36575/36575 kB (100%), 1/1 tablespace
transaction log end point: 0/20000F0
pg_basebackup: base backup completed
pg93@db-172-16-3-39-> cd $PGDATA
pg93@db-172-16-3-39-> mv recovery.done recovery.conf
pg93@db-172-16-3-39-> vi recovery.conf
recovery_target_timeline = 'latest'
standby_mode = on
primary_conninfo = 'host=172.16.3.33 port=1999 user=postgres keepalives_idle=60'
Start the standby database
pg93@db-172-16-3-39-> pg_ctl start
server starting
pg93@db-172-16-3-39-> LOG:  00000: loaded library "pg_stat_statements"
LOCATION:  load_libraries, miscinit.c:1296
Control file information on the primary node:
pg93@db-172-16-3-33-> pg_controldata
pg_control version number:            937
Catalog version number:               201306121
Database system identifier:           5908450106616519131
Database cluster state:               in production
pg_control last modified:             Mon 05 Aug 2013 10:31:56 AM CST
Latest checkpoint location:           0/30007E8
Prior checkpoint location:            0/3000710
Latest checkpoint's REDO location:    0/30007A8
Latest checkpoint's REDO WAL file:    000000010000000000000003
Latest checkpoint's TimeLineID:       1
Latest checkpoint's PrevTimeLineID:   1
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID:          0/1685
Latest checkpoint's NextOID:          24576
Latest checkpoint's NextMultiXactId:  1
Latest checkpoint's NextMultiOffset:  0
Latest checkpoint's oldestXID:        1674
Latest checkpoint's oldestXID's DB:   1
Latest checkpoint's oldestActiveXID:  1684
Latest checkpoint's oldestMultiXid:   1
Latest checkpoint's oldestMulti's DB: 1
Time of latest checkpoint:            Mon 05 Aug 2013 10:31:56 AM CST
Fake LSN counter for unlogged rels:   0/1
Minimum recovery ending location:     0/0
Min recovery ending loc's timeline:   0
Backup start location:                0/0
Backup end location:                  0/0
End-of-backup record required:        no
Current wal_level setting:            hot_standby
Current max_connections setting:      100
Current max_prepared_xacts setting:   0
Current max_locks_per_xact setting:   64
Maximum data alignment:               8
Database block size:                  8192
Blocks per segment of large relation: 131072
WAL block size:                       16384
Bytes per WAL segment:                16777216
Maximum length of identifiers:        64
Maximum columns in an index:          32
Maximum size of a TOAST chunk:        1996
Date/time type storage:               64-bit integers
Float4 argument passing:              by value
Float8 argument passing:              by value
Data page checksum version:           1
Control file information on the standby node:
pg93@db-172-16-3-39-> pg_controldata
pg_control version number:            937
Catalog version number:               201306121
Database system identifier:           5908450106616519131
Database cluster state:               in archive recovery
pg_control last modified:             Mon 05 Aug 2013 10:31:39 AM CST
Latest checkpoint location:           0/2000060
Prior checkpoint location:            0/2000060
Latest checkpoint's REDO location:    0/2000028
Latest checkpoint's REDO WAL file:    000000010000000000000002
Latest checkpoint's TimeLineID:       1
Latest checkpoint's PrevTimeLineID:   1
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID:          0/1684
Latest checkpoint's NextOID:          12815
Latest checkpoint's NextMultiXactId:  1
Latest checkpoint's NextMultiOffset:  0
Latest checkpoint's oldestXID:        1674
Latest checkpoint's oldestXID's DB:   1
Latest checkpoint's oldestActiveXID:  1684
Latest checkpoint's oldestMultiXid:   1
Latest checkpoint's oldestMulti's DB: 1
Time of latest checkpoint:            Mon 05 Aug 2013 10:28:59 AM CST
Fake LSN counter for unlogged rels:   0/1
Minimum recovery ending location:     0/20000F0
Min recovery ending loc's timeline:   1
Backup start location:                0/0
Backup end location:                  0/0
End-of-backup record required:        no
Current wal_level setting:            hot_standby
Current max_connections setting:      100
Current max_prepared_xacts setting:   0
Current max_locks_per_xact setting:   64
Maximum data alignment:               8
Database block size:                  8192
Blocks per segment of large relation: 131072
WAL block size:                       16384
Bytes per WAL segment:                16777216
Maximum length of identifiers:        64
Maximum columns in an index:          32
Maximum size of a TOAST chunk:        1996
Date/time type storage:               64-bit integers
Float4 argument passing:              by value
Float8 argument passing:              by value
Data page checksum version:           1
Test:

1. Primary node
pg93@db-172-16-3-33-> psql
psql (9.3beta2)
Type "help" for help.

digoal=# create table test (id int primary key, info text, crt_time timestamp);
CREATE TABLE
digoal=# create or replace function func() returns void as $$
digoal$# declare
digoal$#   v_id int;
digoal$# begin
digoal$#   v_id := round(5000000*random());
digoal$#   update test set info=md5(random()::text),crt_time=clock_timestamp() where id=v_id;
digoal$#   if found then
digoal$#     return;
digoal$#   else
digoal$#     insert into test values (v_id,md5(random()::text),clock_timestamp());
digoal$#   end if;
digoal$#   return;
digoal$# end;
digoal$# $$ language plpgsql strict;
CREATE FUNCTION
digoal=# select func();
 func 
------
 
(1 row)

digoal=# select * from test ;
   id    |               info               |          crt_time          
---------+----------------------------------+----------------------------
 3554644 | c5aabfa68774a7bd9a623819537475c6 | 2013-08-05 10:39:49.304063
(1 row)

digoal=# select func();
 func 
------
 
(1 row)

digoal=# select * from test ;
   id    |               info               |          crt_time          
---------+----------------------------------+----------------------------
 3554644 | c5aabfa68774a7bd9a623819537475c6 | 2013-08-05 10:39:49.304063
 2856072 | ec17bc98163a1ac0cbcdeadd0b151607 | 2013-08-05 10:39:54.324455
(2 rows)
Data modification test
vi test.sql
select func();

pg93@db-172-16-3-33-> pgbench -M prepared -f ./test.sql -r -n -h $PGDATA -p 1999 -U postgres -c 16 -j 4 -T 10 digoal
Client 13 aborted in state 0: ERROR:  duplicate key value violates unique constraint "test_pkey"
DETAIL:  Key (id)=(3717357) already exists.
CONTEXT:  SQL statement "insert into test values (v_id,md5(random()::text),clock_timestamp())"
PL/pgSQL function func() line 10 at SQL statement
transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 16
number of threads: 4
duration: 10 s
number of transactions actually processed: 419517
tps = 41926.489602 (including connections establishing)
tps = 42031.118850 (excluding connections establishing)
statement latencies in milliseconds:
        0.376348        select func();
Data consistency check

Primary node
pg93@db-172-16-3-33-> psql
psql (9.3beta2)
Type "help" for help.

digoal=# select sum(hashtext(test.*::text)) from test;
      sum      
---------------
 -215513112678
(1 row)

digoal=# select count(*) from test ;
 count  
--------
 402434
(1 row)
Standby node
pg93@db-172-16-3-39-> psql
psql (9.3beta2)
Type "help" for help.

digoal=# select sum(hashtext(test.*::text)) from test;
      sum      
---------------
 -215513112678
(1 row)

digoal=# select count(*) from test ;
 count  
--------
 402434
(1 row)
2. Standby node
Promote:

pg93@db-172-16-3-39-> pg_ctl promote
server promoting
Data modification test
vi test.sql
select func();

pg93@db-172-16-3-39-> pgbench -M prepared -f ./test.sql -r -n -h $PGDATA -p 1999 -U postgres -c 16 -j 4 -T 10 digoal
transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 16
number of threads: 4
duration: 10 s
number of transactions actually processed: 378395
tps = 37814.175846 (including connections establishing)
tps = 37866.507340 (excluding connections establishing)
statement latencies in milliseconds:
        0.419977        select func();
Standby node data:
pg93@db-172-16-3-39-> psql
psql (9.3beta2)
Type "help" for help.

digoal=# select sum(hashtext(test.*::text)) from test;
     sum      
--------------
 380706298298
(1 row)

digoal=# select count(*) from test ;
 count  
--------
 737925
(1 row)
3. Primary node

Continue making changes on the primary node
pg93@db-172-16-3-33-> pgbench -M prepared -f ./test.sql -r -n -h $PGDATA -p 1999 -U postgres -c 16 -j 4 -T 10 digoal
transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 16
number of threads: 4
duration: 10 s
number of transactions actually processed: 399093
tps = 39862.553184 (including connections establishing)
tps = 39960.089273 (excluding connections establishing)
statement latencies in milliseconds:
        0.398488        select func();
Primary node data:
pg93@db-172-16-3-33-> psql
psql (9.3beta2)
Type "help" for help.

digoal=# select sum(hashtext(test.*::text)) from test;
     sum      
--------------
 127807805610
(1 row)

digoal=# select count(*) from test ;
 count  
--------
 755238
(1 row)
4. Convert the old primary into a standby. Normally you would resynchronize $PGDATA from the new primary (172.16.3.39) with rsync, or take a fresh pg_basebackup.

Here we use pg_rewind instead.

Stop the old primary (172.16.3.33)
pg93@db-172-16-3-33-> pg_ctl stop -m fast
waiting for server to shut down..... done
server stopped
Configure .pgpass
su - pg93
pg93@db-172-16-3-33-> vi .pgpass
172.16.3.39:1999:*:postgres:postgres
pg93@db-172-16-3-33-> chmod 400 .pgpass
Resynchronize
pg93@db-172-16-3-33-> pg_rewind -D $PGDATA --source-server='host=172.16.3.39 port=1999 dbname=digoal' -v
connected to remote server
unexpected control file size 0, expected 8192
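pg_rewind expects to read `global/pg_control` as a single 8192-byte file, so a quick sanity check before running it is to verify the file's size. The check is simulated here with a scratch file instead of a real $PGDATA.

```shell
#!/bin/sh
# Sanity check: pg_control should be exactly 8192 bytes, matching the
# "expected 8192" in the pg_rewind error above. Simulated with a temp file.
set -e
ctl=$(mktemp)
dd if=/dev/zero of="$ctl" bs=8192 count=1 2>/dev/null   # fake pg_control
size=$(wc -c < "$ctl")
if [ "$size" -eq 8192 ]; then
  echo "pg_control size OK"
else
  echo "unexpected pg_control size $size" >&2
fi
rm -f "$ctl"
```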
After switching timelines several times, the following error appeared
pg93@db-172-16-3-33-> pg_rewind -D $PGDATA --source-server='host=172.16.3.39 port=1999 dbname=digoal' -v
connected to remote server
fetched file "global/pg_control", length 8192
fetched file "pg_xlog/00000006.history", length 213
could not find common ancestor of the source and target cluster's timelines
To debug this, a line printing the timeline information was added to pg_rewind.c
/*
 * Trace the history backwards, until we hit the target timeline.
 *
 * TODO: This assumes that there are no timeline switches on the target
 * cluster after the fork.
 */
for (i = nentries - 1; i >= 0; i--)
{
    TimeLineHistoryEntry *entry = &sourceHistory[i];

    fprintf(stdout, "nen:%d,srctli:%d,targettli:%d\n", i, entry->tli, targettli);
After recompiling pg_rewind, it turns out the code in timeline.c is buggy: it parses the history file incorrectly, as shown below:
pg93@db-172-16-3-39-> pg_rewind -D $PGDATA --source-server='host=172.16.3.33 port=1999 dbname=digoal' -v
connected to remote server
fetched file "global/pg_control", length 8192
fetched file "pg_xlog/0000000B.history", length 419
nen:1,srctli:11,targettli:10
nen:0,srctli:1,targettli:10
could not find common ancestor of the source and target cluster's timelines
The problems above were worked around as follows.

Edit the history file on the new primary, moving its last line up to the top. But right after that, the earlier error comes back: "unexpected control file size 0, expected 8192", as shown below.

Starting the database then fails:

This error requires deleting the backup_label file on the new standby (the old primary).

Starting the database fails again:

This is because the history file was modified earlier; change it back. (Change it back on both hosts, to avoid problems later.)
pg93@db-172-16-3-33-> cd $PGDATA/pg_xlog
pg93@db-172-16-3-33-> vi 0000000B.history
10      1/6000000       no recovery target specified
1       0/92DCDD8       no recovery target specified
2       0/1CB86338      no recovery target specified
3       0/36E68A20      no recovery target specified
4       0/569ADB88      no recovery target specified
5       0/762CF5D8      no recovery target specified
6       0/9F67C920      no recovery target specified
7       0/A0000090      no recovery target specified
8       1/3F535A0       no recovery target specified
9       1/4000090       no recovery target specified
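The common-ancestor search that pg_rewind performs on these history files can be sketched as follows. The history below is the corrected (ascending-order) content of 0000000B.history from this test, with the target cluster sitting on timeline 10: walking the source history, the entry whose timeline matches the target gives the LSN at which the source forked off, matching the "Last common WAL position: 1/6000000 on timeline 10" seen later.

```shell
#!/bin/sh
# Sketch of finding the fork point from a timeline history file:
# each line is "<tli> <switch-lsn> <reason>"; the line for the target's
# timeline records where the newer timeline branched off it.
source_history='1 0/92DCDD8
2 0/1CB86338
3 0/36E68A20
4 0/569ADB88
5 0/762CF5D8
6 0/9F67C920
7 0/A0000090
8 1/3F535A0
9 1/4000090
10 1/6000000'
target_tli=10
fork=$(printf '%s\n' "$source_history" | awk -v t="$target_tli" '$1 == t { lsn = $2 } END { print lsn }')
echo "fork point: $fork on timeline $target_tli"
```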
Re-run pg_rewind
pg93@db-172-16-3-39-> pg_rewind -D $PGDATA --source-server='host=172.16.3.33 port=1999 dbname=digoal' -v
connected to remote server
fetched file "global/pg_control", length 8192
fetched file "pg_xlog/0000000B.history", length 419
nen:1,srctli:11,targettli:10
nen:0,srctli:10,targettli:10
Last common WAL position: 1/6000000 on timeline 10
Last common checkpoint at 1/6000000 on timeline 10
error reading xlog record: record with zero length at 1/6000090
.s.PGSQL.1999 (COPY)
backup_label.old (COPY)
recovery.done (COPY)
pg_ident.conf (COPY)
postmaster.opts (COPY)
postgresql.conf (COPY)
pg_hba.conf (COPY)
.s.PGSQL.1999.lock (COPY)
pg_log/postgresql-2013-08-05_112157.csv (COPY)
pg_log/postgresql-2013-08-05_123414.csv (COPY)
pg_log/postgresql-2013-08-05_101818.log (COPY)
pg_log/postgresql-2013-08-05_134452.log (COPY)
pg_log/postgresql-2013-08-05_112001.csv (COPY)
pg_log/postgresql-2013-08-05_134452.csv (COPY)
pg_log/postgresql-2013-08-05_111642.csv (COPY)
pg_log/postgresql-2013-08-05_110518.csv (COPY)
pg_log/postgresql-2013-08-05_134655.csv (COPY)
pg_log/postgresql-2013-08-05_131517.csv (COPY)
pg_log/postgresql-2013-08-05_103139.csv (COPY)
pg_log/postgresql-2013-08-05_110518.log (COPY)
pg_log/postgresql-2013-08-05_112902.csv (COPY)
pg_log/postgresql-2013-08-05_112001.log (COPY)
pg_log/postgresql-2013-08-05_134523.csv (COPY)
pg_log/postgresql-2013-08-05_134523.log (COPY)
pg_log/postgresql-2013-08-05_104358.csv (COPY)
pg_log/postgresql-2013-08-05_112902.log (COPY)
pg_log/postgresql-2013-08-05_131517.log (COPY)
pg_log/postgresql-2013-08-05_130021.csv (COPY)
pg_log/postgresql-2013-08-05_104358.log (COPY)
pg_log/postgresql-2013-08-05_103139.log (COPY)
pg_log/postgresql-2013-08-05_101818.csv (COPY)
pg_log/postgresql-2013-08-05_113036.csv (COPY)
pg_log/postgresql-2013-08-05_123414.log (COPY)
pg_log/postgresql-2013-08-05_123855.csv (COPY)
pg_log/postgresql-2013-08-05_112157.log (COPY)
pg_log/postgresql-2013-08-05_134655.log (COPY)
pg_log/postgresql-2013-08-05_130021.log (COPY)
pg_log/postgresql-2013-08-05_113036.log (COPY)
pg_log/postgresql-2013-08-05_131316.csv (COPY)
pg_log/postgresql-2013-08-05_111642.log (COPY)
pg_log/postgresql-2013-08-05_131316.log (COPY)
pg_log/postgresql-2013-08-05_123855.log (COPY)
pg_log/postgresql-2013-08-05_134444.csv (COPY)
pg_log/postgresql-2013-08-05_134444.log (COPY)
pg_subtrans/0047 (COPY)
pg_notify/0000 (COPY)
global/12696_vm (COPY)
global/12700_vm (COPY)
global/12707_fsm (COPY)
global/12557_fsm (COPY)
global/pg_internal.init (COPY)
global/12700_fsm (COPY)
global/pg_control (COPY)
global/12711_fsm (COPY)
global/12696_fsm (COPY)
global/12557_vm (COPY)
global/pg_filenode.map (COPY)
global/12707_vm (COPY)
global/12711_vm (COPY)
pg_xlog/00000008.history (COPY)
pg_xlog/0000000B0000000100000006 (COPY)
pg_xlog/00000009.history (COPY)
pg_xlog/0000000A.history (COPY)
pg_xlog/0000000B0000000100000005 (COPY)
pg_xlog/0000000A0000000100000005 (COPY)
pg_xlog/0000000B.history (COPY)
pg_clog/0001 (COPY)
pg_clog/0000 (COPY)
pg_clog/0002 (COPY)
pg_clog/0003 (COPY)
pg_clog/0004 (COPY)
pg_stat_tmp/db_16384.stat (COPY)
pg_stat_tmp/global.stat (COPY)
pg_stat_tmp/db_0.stat (COPY)
pg_multixact/members/0000 (COPY)
pg_multixact/offsets/0000 (COPY)
base/12814/12641_vm (COPY)
base/12814/12639_fsm (COPY)
base/12814/12547_fsm (COPY)
base/12814/12620_fsm (COPY)
.......... (omitted)
base/12809/12625_fsm (COPY)
base/12809/12569_fsm (COPY)
base/12809/12639_vm (COPY)
base/12809/12779_vm (COPY)
base/12809/12717_fsm (COPY)
base/12809/12799_vm (COPY)
base/12809/12600_vm (COPY)
base/12809/12612_fsm (COPY)
base/12809/12616_fsm (COPY)
base/12809/12553_fsm (COPY)
base/12809/12608_vm (COPY)
base/12809/12784_fsm (COPY)
base/12809/12784_vm (COPY)
base/12809/12768_vm (COPY)
base/12809/12549_vm (COPY)
base/12809/12673_fsm (COPY)
base/12809/12732_fsm (COPY)
base/12809/12794_fsm (COPY)
base/12809/12547_vm (COPY)
base/12809/12682_vm (COPY)
base/12809/12673_vm (COPY)
base/12809/12774_vm (COPY)
base/12809/12721_fsm (COPY)
base/12809/12587_fsm (COPY)
base/12809/12608_fsm (COPY)
base/12809/12717_vm (COPY)
base/12809/12636_fsm (COPY)
base/12809/12553_vm (COPY)
base/12809/12604_vm (COPY)
base/12809/12728_vm (COPY)
base/12809/12629_vm (COPY)
base/12809/12587_vm (COPY)
base/12809/12569_vm (COPY)
base/12809/12724_fsm (COPY)
base/12809/12768_fsm (COPY)
base/12809/12612_vm (COPY)
base/12809/12616_vm (COPY)
base/12809/12799_fsm (COPY)
base/12809/12629_fsm (COPY)
base/12809/12732_vm (COPY)
base/12809/12641_fsm (COPY)
base/12809/12764_fsm (COPY)
base/12809/12736_vm (COPY)
base/12809/12664_fsm (COPY)
base/12809/12658_fsm (COPY)
base/12809/12625_vm (COPY)
base/12809/12620_vm (COPY)
base/12809/12794_vm (COPY)
base/12809/pg_filenode.map (COPY)
base/12809/12604_fsm (COPY)
base/12809/12600_fsm (COPY)
base/12809/12774_fsm (COPY)
base/12809/12779_fsm (COPY)
base/12809/12789_fsm (COPY)
base/12809/12576_fsm (COPY)
base/12809/12789_vm (COPY)
base/12809/12736_fsm (COPY)
base/12809/12664_vm (COPY)
pg_xlog/archive_status/0000000A0000000100000005.done (COPY)
pg_xlog/archive_status/0000000B.history.done (COPY)
pg_stat/db_16384.stat (REMOVE)
pg_stat/global.stat (REMOVE)
pg_stat/db_0.stat (REMOVE)
global/pg_stat_statements.stat (REMOVE)
pg_xlog/0000000800000000000000F6 (REMOVE)
pg_xlog/000000080000000100000003 (REMOVE)
pg_xlog/0000000800000000000000F7 (REMOVE)
pg_xlog/0000000800000000000000F8 (REMOVE)
pg_xlog/0000000A0000000100000006 (REMOVE)
pg_xlog/0000000800000000000000F4 (REMOVE)
pg_xlog/0000000800000000000000FA (REMOVE)
pg_xlog/0000000800000000000000F2 (REMOVE)
pg_xlog/000000080000000100000002 (REMOVE)
pg_xlog/000000090000000100000003 (REMOVE)
pg_xlog/0000000800000000000000EF (REMOVE)
pg_xlog/0000000A0000000100000005.00000028.backup (REMOVE)
pg_xlog/0000000800000000000000F9 (REMOVE)
pg_xlog/0000000800000000000000FD (REMOVE)
pg_xlog/0000000A0000000100000004 (REMOVE)
pg_xlog/0000000800000000000000FC (REMOVE)
pg_xlog/0000000800000000000000EE (REMOVE)
pg_xlog/000000080000000100000000 (REMOVE)
pg_xlog/0000000800000000000000FB (REMOVE)
pg_xlog/0000000800000000000000F0 (REMOVE)
pg_xlog/000000090000000100000004 (REMOVE)
pg_xlog/000000080000000100000001 (REMOVE)
pg_xlog/0000000800000000000000F3 (REMOVE)
pg_xlog/archive_status/0000000800000000000000EE.done (REMOVE)
pg_xlog/archive_status/0000000800000000000000FF.done (REMOVE)
pg_xlog/archive_status/0000000800000000000000F4.done (REMOVE)
pg_xlog/archive_status/0000000800000000000000FA.done (REMOVE)
pg_xlog/archive_status/0000000800000000000000FD.done (REMOVE)
pg_xlog/archive_status/0000000800000000000000F1.done (REMOVE)
pg_xlog/archive_status/00000009.history.ready (REMOVE)
pg_xlog/archive_status/000000090000000100000004.ready (REMOVE)
pg_xlog/archive_status/0000000800000000000000F2.done (REMOVE)
pg_xlog/archive_status/0000000800000000000000EF.done (REMOVE)
pg_xlog/archive_status/0000000A0000000100000005.ready (REMOVE)
pg_xlog/archive_status/0000000A0000000100000004.ready (REMOVE)
pg_xlog/archive_status/0000000800000000000000FB.done (REMOVE)
pg_xlog/archive_status/0000000800000000000000F3.done (REMOVE)
pg_xlog/archive_status/0000000A.history.ready (REMOVE)
pg_xlog/archive_status/0000000800000000000000FE.done (REMOVE)
pg_xlog/archive_status/000000080000000100000002.done (REMOVE)
pg_xlog/archive_status/000000090000000100000003.ready (REMOVE)
pg_xlog/archive_status/0000000A0000000100000005.00000028.backup.ready (REMOVE)
pg_xlog/archive_status/0000000800000000000000F6.done (REMOVE)
pg_xlog/archive_status/0000000800000000000000F9.done (REMOVE)
pg_xlog/archive_status/0000000800000000000000F5.done (REMOVE)
pg_xlog/archive_status/000000080000000100000000.done (REMOVE)
pg_xlog/archive_status/0000000800000000000000FC.done (REMOVE)
pg_xlog/archive_status/000000080000000100000003.done (REMOVE)
pg_xlog/archive_status/0000000800000000000000F8.done (REMOVE)
pg_xlog/archive_status/0000000800000000000000F7.done (REMOVE)
pg_xlog/archive_status/000000080000000100000001.done (REMOVE)
pg_xlog/archive_status/0000000800000000000000F0.done (REMOVE)
pg_xlog/0000000800000000000000F1 (REMOVE)
pg_xlog/0000000800000000000000FE (REMOVE)
pg_xlog/0000000800000000000000FF (REMOVE)
pg_xlog/0000000800000000000000F5 (REMOVE)
getting chunks:
-- fetch all the blocks listed in the temp table.
select path, begin, pg_read_binary_file(path, begin, len) as chunk from fetchchunks
sent query
received chunk for file "backup_label.old", off 0, len 206
received chunk for file "recovery.done", off 0, len 4759
received chunk for file "pg_ident.conf", off 0, len 1636
received chunk for file "postmaster.opts", off 0, len 32
received chunk for file "postgresql.conf", off 0, len 20431
received chunk for file "pg_hba.conf", off 0, len 4547
received chunk for file ".s.PGSQL.1999.lock", off 0, len 64
received chunk for file "pg_log/postgresql-2013-08-05_112157.csv", off 0, len 48110
received chunk for file "pg_log/postgresql-2013-08-05_123414.csv", off 0, len 10613
received chunk for file "pg_log/postgresql-2013-08-05_112001.csv", off 0, len 8450
received chunk for file "pg_log/postgresql-2013-08-05_134452.csv", off 0, len 968
received chunk for file "pg_log/postgresql-2013-08-05_111642.csv", off 0, len 22888
received chunk for file "pg_log/postgresql-2013-08-05_110518.csv", off 0, len 34844
received chunk for file "pg_log/postgresql-2013-08-05_134655.csv", off 0, len 4932
received chunk for file "pg_log/postgresql-2013-08-05_131517.csv", off 0, len 70200
received chunk for file "pg_log/postgresql-2013-08-05_103139.csv", off 0, len 52611
received chunk for file "pg_log/postgresql-2013-08-05_112902.csv", off 0, len 2009
received chunk for file "pg_log/postgresql-2013-08-05_134523.csv", off 0, len 12060
received chunk for file "pg_log/postgresql-2013-08-05_104358.csv", off 0, len 61220
received chunk for file "pg_log/postgresql-2013-08-05_130021.csv", off 0, len 13541
received chunk for file "pg_log/postgresql-2013-08-05_104358.log", off 0, len 7125
received chunk for file "pg_log/postgresql-2013-08-05_101818.csv", off 0, len 2719
received chunk for file "pg_log/postgresql-2013-08-05_113036.csv", off 0, len 15990
received chunk for file "pg_log/postgresql-2013-08-05_123855.csv", off 0, len 36541
received chunk for file "pg_log/postgresql-2013-08-05_131316.csv", off 0, len 3686
received chunk for file "pg_log/postgresql-2013-08-05_134444.csv", off 0, len 968
received chunk for file "pg_subtrans/0047", off 0, len 114688
received chunk for file "pg_notify/0000", off 0, len 8192
received chunk for file "global/12696_vm", off 0, len 8192
received chunk for file "global/12700_vm", off 0, len 8192
received chunk for file "global/12707_fsm", off 0, len 24576
received chunk for file "global/12557_fsm", off 0, len 24576
received chunk for file "global/pg_internal.init", off 0, len 12784
received chunk for file "global/12700_fsm", off 0, len 24576
received chunk for file "global/pg_control", off 0, len 8192
received chunk for file "global/12711_fsm", off 0, len 24576
received chunk for file "global/12696_fsm", off 0, len 24576
received chunk for file "global/12557_vm", off 0, len 8192
received chunk for file "global/pg_filenode.map", off 0, len 512
received chunk for file "global/12707_vm", off 0, len 8192
received chunk for file "global/12711_vm", off 0, len 8192
received chunk for file "pg_xlog/00000008.history", off 0, len 299
received chunk for file "pg_xlog/0000000B0000000100000006", off 0, len 1000000
received chunk for file "pg_xlog/0000000B0000000100000006", off 1000000, len 1000000
received chunk for file "pg_xlog/0000000B0000000100000006", off 2000000, len 1000000
received chunk for file "pg_xlog/0000000B0000000100000006", off 3000000, len 1000000
received chunk for file "pg_xlog/0000000B0000000100000006", off 4000000, len 1000000
received chunk for file "pg_xlog/0000000B0000000100000006", off 5000000, len 1000000
received chunk for file "pg_xlog/0000000B0000000100000006", off 6000000, len 1000000
received chunk for file "pg_xlog/0000000B0000000100000006", off 7000000, len 1000000
received chunk for file "pg_xlog/0000000B0000000100000006", off 8000000, len 1000000
received chunk for file "pg_xlog/0000000B0000000100000006", off 9000000, len 1000000
received chunk for file "pg_xlog/0000000B0000000100000006", off 10000000, len 1000000
received chunk for file "pg_xlog/0000000B0000000100000006", off 11000000, len 1000000
received chunk for file "pg_xlog/0000000B0000000100000006", off 12000000, len 1000000
received chunk for file "pg_xlog/0000000B0000000100000006", off 13000000, len 1000000
received chunk for file "pg_xlog/0000000B0000000100000006", off 14000000, len 1000000
received chunk for file "pg_xlog/0000000B0000000100000006", off 15000000, len 1000000
received chunk for file "pg_xlog/0000000B0000000100000006", off 16000000, len 777216
received chunk for file "pg_xlog/00000009.history", off 0, len 334
received chunk for file "pg_xlog/0000000A.history", off 0, len 376
received chunk for file "pg_xlog/0000000B0000000100000005", off 0, len 1000000
received chunk for file "pg_xlog/0000000B0000000100000005", off 1000000, len 1000000
received chunk for file "pg_xlog/0000000B0000000100000005", off 2000000, len 1000000
received chunk for file "pg_xlog/0000000B0000000100000005", off 3000000, len 1000000
received chunk for file "pg_xlog/0000000B0000000100000005", off 4000000, len 1000000
received chunk for file "pg_xlog/0000000B0000000100000005", off 5000000, len 1000000
received chunk for file "pg_xlog/0000000B0000000100000005", off 6000000, len 1000000
received chunk for file "pg_xlog/0000000B0000000100000005", off 7000000, len 1000000
received chunk for file "pg_xlog/0000000B0000000100000005", off 8000000, len 1000000
received chunk for file "pg_xlog/0000000B0000000100000005", off 9000000, len 1000000
received chunk for file "pg_xlog/0000000B0000000100000005", off 10000000, len 1000000
received chunk for file "pg_xlog/0000000B0000000100000005", off 11000000, len 1000000
received chunk for file "pg_xlog/0000000B0000000100000005", off 12000000, len 1000000
received chunk for file "pg_xlog/0000000B0000000100000005", off 13000000, len 1000000
received chunk for file "pg_xlog/0000000B0000000100000005", off 14000000, len 1000000
received chunk for file "pg_xlog/0000000B0000000100000005", off 15000000, len 1000000
received chunk for file "pg_xlog/0000000B0000000100000005", off 16000000, len 777216
received chunk for file "pg_xlog/0000000A0000000100000005", off 0, len 1000000
received chunk for file "pg_xlog/0000000A0000000100000005", off 1000000, len 1000000
received chunk for file "pg_xlog/0000000A0000000100000005", off 2000000, len 1000000
received chunk for file "pg_xlog/0000000A0000000100000005", off 3000000, len 1000000
received chunk for file "pg_xlog/0000000A0000000100000005", off 4000000, len 1000000
received chunk for file "pg_xlog/0000000A0000000100000005", off 5000000, len 1000000
received chunk for file "pg_xlog/0000000A0000000100000005", off 6000000, len 1000000
received chunk for file "pg_xlog/0000000A0000000100000005", off 7000000, len 1000000
received chunk for file "pg_xlog/0000000A0000000100000005", off 8000000, len 1000000
received chunk for file "pg_xlog/0000000A0000000100000005", off 9000000, len 1000000
received chunk for file "pg_xlog/0000000A0000000100000005", off 10000000, len 1000000
received chunk for file "pg_xlog/0000000A0000000100000005", off 11000000, len 1000000
received chunk for file "pg_xlog/0000000A0000000100000005", off 12000000, len 1000000
received chunk for file "pg_xlog/0000000A0000000100000005", off 13000000, len 1000000
received chunk for file "pg_xlog/0000000A0000000100000005", off 14000000, len 1000000
received chunk for file "pg_xlog/0000000A0000000100000005", off 15000000, len 1000000
received chunk for file "pg_xlog/0000000A0000000100000005", off 16000000, len 777216
received chunk for file "pg_xlog/0000000B.history", off 0, len 419
received chunk for file "pg_clog/0001", off 0, len 262144
received chunk for file "pg_clog/0000", off 0, len 262144
received chunk for file "pg_clog/0002", off 0, len 262144
received chunk for file "pg_clog/0003", off 0, len 262144
received chunk for file "pg_clog/0004", off 0, len 122880
received chunk for file "pg_stat_tmp/db_16384.stat", off 0, len 3216
received chunk for file "pg_stat_tmp/global.stat", off 0, len 471
received chunk for file "pg_stat_tmp/db_0.stat", off 0, len 1188
received chunk for file "pg_multixact/members/0000", off 0, len 8192
received chunk for file "pg_multixact/offsets/0000", off 0, len 8192
received chunk for file "base/12814/12641_vm", off 0, len 8192
received chunk for file "base/12814/12639_fsm", off 0, len 24576
received chunk for file "base/12814/12547_fsm", off 0, len 24576
received chunk for file "base/12814/12620_fsm", off 0, len 24576
received chunk for file "base/12814/12549_fsm", off 0, len 24576
................ (omitted)
received chunk for file "base/12809/12576_fsm", off 0, len 24576
received chunk for file "base/12809/12789_vm", off 0, len 8192
received chunk for file "base/12809/12736_fsm", off 0, len 24576
received chunk for file "base/12809/12664_vm", off 0, len 8192
Done!
Start the database. It fails to start:
2013-08-05 13:52:04.424 CST,,,29312,,51ff3d84.7280,1,,2013-08-05 13:52:04 CST,,0,LOG,00000,"database system was interrupted; last known up at 2013-08-05 13:47:00 CST",,,,,,,,"StartupXLOG, xlog.c:4915",""
2013-08-05 13:52:04.424 CST,,,29312,,51ff3d84.7280,2,,2013-08-05 13:52:04 CST,,0,LOG,00000,"entering standby mode",,,,,,,,"StartupXLOG, xlog.c:4968",""
2013-08-05 13:52:04.424 CST,,,29312,,51ff3d84.7280,3,,2013-08-05 13:52:04 CST,,0,LOG,00000,"invalid checkpoint link in backup_label file",,,,,,,,"ReadCheckpointRecord, xlog.c:6364",""
2013-08-05 13:52:04.424 CST,,,29312,,51ff3d84.7280,4,,2013-08-05 13:52:04 CST,,0,FATAL,XX000,"could not locate required checkpoint record",,"If you are not restoring from a backup, try removing the file ""/pgdata/digoal/1921/data03/pg93/pg_root/backup_label"".",,,,,,"StartupXLOG, xlog.c:5047",""
2013-08-05 13:52:04.425 CST,,,29310,,51ff3d84.727e,1,,2013-08-05 13:52:04 CST,,0,LOG,00000,"startup process (PID 29312) exited with exit code 1",,,,,,,,"LogChildExit, postmaster.c:3211",""
2013-08-05 13:52:04.425 CST,,,29310,,51ff3d84.727e,2,,2013-08-05 13:52:04 CST,,0,LOG,00000,"aborting startup due to startup process failure",,,,,,,,"reaper, postmaster.c:2536",""
To fix this error, delete the backup_label file on the new standby (the old primary):
rm -f $PGDATA/backup_label*
Start the database again. It still fails to start:
pg93@db-172-16-3-39-> cat postgresql-2013-08-05_135236.csv
2013-08-05 13:52:36.846 CST,,,29333,,51ff3da4.7295,1,,2013-08-05 13:52:36 CST,,0,LOG,00000,"database system was interrupted; last known up at 2013-08-05 13:47:00 CST",,,,,,,,"StartupXLOG, xlog.c:4915",""
2013-08-05 13:52:36.846 CST,,,29333,,51ff3da4.7295,2,,2013-08-05 13:52:36 CST,,0,LOG,00000,"entering standby mode",,,,,,,,"StartupXLOG, xlog.c:4968",""
2013-08-05 13:52:36.846 CST,,,29333,,51ff3da4.7295,3,,2013-08-05 13:52:36 CST,,0,FATAL,XX000,"invalid data in history file: 1 0/92DCDD8 no recovery target specified",,"Timeline IDs must be in increasing sequence.",,,,,,"readTimeLineHistory, timeline.c:158",""
2013-08-05 13:52:36.846 CST,,,29331,,51ff3da4.7293,1,,2013-08-05 13:52:36 CST,,0,LOG,00000,"startup process (PID 29333) exited with exit code 1",,,,,,,,"LogChildExit, postmaster.c:3211",""
2013-08-05 13:52:36.846 CST,,,29331,,51ff3da4.7293,2,,2013-08-05 13:52:36 CST,,0,LOG,00000,"aborting startup due to startup process failure",,,,,,,,"reaper, postmaster.c:2536",""
This is because the history file was modified earlier; simply change it back. (Modify it on both hosts to avoid problems later.)
pg93@db-172-16-3-33-> vi 0000000B.history
pg93@db-172-16-3-39-> vi 0000000B.history
1 0/92DCDD8 no recovery target specified
2 0/1CB86338 no recovery target specified
3 0/36E68A20 no recovery target specified
4 0/569ADB88 no recovery target specified
5 0/762CF5D8 no recovery target specified
6 0/9F67C920 no recovery target specified
7 0/A0000090 no recovery target specified
8 1/3F535A0 no recovery target specified
9 1/4000090 no recovery target specified
10 1/6000000 no recovery target specified
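Each line of a timeline history file carries three fields: the parent timeline ID, the LSN at which the timeline switch occurred, and a free-text reason. A minimal parsing sketch (the `parse_history_line` helper is illustrative, not part of PostgreSQL):

```python
def parse_history_line(line):
    """Split one timeline-history line into (timeline ID, switch LSN, reason)."""
    tli, lsn, reason = line.split(None, 2)
    return int(tli), lsn, reason

# e.g. the first line of 0000000B.history shown above:
print(parse_history_line("1 0/92DCDD8 no recovery target specified"))
# → (1, '0/92DCDD8', 'no recovery target specified')
```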
Start the database; replication now works normally.
[Other issues]
1. You may also run into an integer overflow problem:
pg_xlog/0000000B000000010000001E (REMOVE)
unexpected result while sending file list: ERROR: value "2148254528" is out of range for type integer
CONTEXT: COPY fetchchunks, line 28557, column begin: "2148254528"
The error comes from the following function in:
pg_rewind_master/libpq_fetch.c
[Reference]
/*
 * Fetch all changed blocks from remote source data directory.
 */
void
libpq_executeFileMap(filemap_t *map)
{
	file_entry_t *entry;
	const char *sql;
	PGresult   *res;

	/*
	 * First create a temporary table, and load it with the blocks that
	 * we need to fetch.
	 */
	sql = "create temporary table fetchchunks(path text, begin int4, len int4);";
	res = PQexec(conn, sql);

	if (PQresultStatus(res) != PGRES_COMMAND_OK)
	{
		fprintf(stderr, "error creating temporary table: %s\n",
				PQresultErrorMessage(res));
		exit(1);
	}

	sql = "copy fetchchunks from stdin";
	res = PQexec(conn, sql);

	if (PQresultStatus(res) != PGRES_COPY_IN)
	{
		fprintf(stderr, "unexpected result while sending file list: %s\n",
				PQresultErrorMessage(res));
		exit(1);
	}
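The root cause of the overflow in issue 1 is visible in the SQL above: `fetchchunks.begin` is declared `int4`, whose maximum is 2147483647, so a chunk offset inside a file larger than 2 GB (such as the 2148254528 in the error) cannot be loaded. Declaring the column `int8` (bigint) would be a plausible workaround; treat that as an assumption, not an official fix. A quick arithmetic check:

```python
# The offset that failed, against PostgreSQL's int4/int8 ranges.
failed_offset = 2148254528
INT4_MAX = 2**31 - 1   # 2147483647, the int4 ceiling hit by fetchchunks.begin
INT8_MAX = 2**63 - 1   # int8 (bigint) would hold any realistic file offset

print(failed_offset > INT4_MAX, failed_offset <= INT8_MAX)  # → True True
```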
[Reference]