PostgreSQL Best Practices - Source Code Analysis of Point-in-Time Recovery


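Background

As we know, PostgreSQL supports point-in-time recovery (PITR). What is the mechanism behind it?

This article explains PostgreSQL's point-in-time recovery in detail to help users understand it.

The source code quoted in this article is from PostgreSQL 9.2.2.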

Parameters involved in point-in-time recovery

PostgreSQL supports PITR (point-in-time recovery). In recovery.conf you can specify one of three recovery targets:

recovery_target_name (string)  
This parameter specifies the named restore point, created with pg_create_restore_point() to which recovery will proceed.   
At most one of recovery_target_name, recovery_target_time or recovery_target_xid can be specified. The default is to recover to the end of the WAL log.  
  
recovery_target_time (timestamp)  
This parameter specifies the time stamp up to which recovery will proceed.   
At most one of recovery_target_time, recovery_target_name or recovery_target_xid can be specified. The default is to recover to the end of the WAL log.   
The precise stopping point is also influenced by recovery_target_inclusive.  
  
recovery_target_xid (string)  
This parameter specifies the transaction ID up to which recovery will proceed. Keep in mind that while transaction IDs are assigned sequentially at transaction start, transactions can complete in a different numeric order. The transactions that will be recovered are those that committed before (and optionally including) the specified one.   
At most one of recovery_target_xid, recovery_target_name or recovery_target_time can be specified. The default is to recover to the end of the WAL log.   
The precise stopping point is also influenced by recovery_target_inclusive.  

Of these, recovery_target_time and recovery_target_xid can be combined with the recovery_target_inclusive parameter:

recovery_target_inclusive (boolean)  
Specifies whether we stop just after the specified recovery target (true), or just before the recovery target (false).   
Applies to both recovery_target_time and recovery_target_xid, whichever one is specified for this recovery.   
This indicates whether transactions having exactly the target commit time or ID, respectively, will be included in the recovery.   
Default is true.  

The default value true comes from src/backend/access/transam/xlog.c:

static bool recoveryTargetInclusive = true;  

Why can recovery_target_name not be combined with recovery_target_inclusive, while recovery_target_time and recovery_target_xid can?

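Source analysis of the recovery stop point

First, we need to explain when recovery is allowed to stop.

Recovery can stop at only three kinds of WAL records: COMMIT, ABORT and XLOG_RESTORE_POINT.

Where does this information come from? It is taken from xl_rmid and xl_info in XLogRecord, the fixed-size header of every XLOG record: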

src/include/access/xlog.h

/*  
 * The overall layout of an XLOG record is:  
 *              Fixed-size header (XLogRecord struct)  
 *              rmgr-specific data  
 *              BkpBlock  
 *              backup block data  
 *              BkpBlock  
 *              backup block data  
 *              ...  
 *  
 * where there can be zero to four backup blocks (as signaled by xl_info flag  
 * bits).  XLogRecord structs always start on MAXALIGN boundaries in the WAL  
 * files, and we round up SizeOfXLogRecord so that the rmgr data is also  
 * guaranteed to begin on a MAXALIGN boundary.  However, no padding is added  
 * to align BkpBlock structs or backup block data.  
 *  
 * NOTE: xl_len counts only the rmgr data, not the XLogRecord header,  
 * and also not any backup blocks.      xl_tot_len counts everything.  Neither  
 * length field is rounded up to an alignment boundary.  
 */  
typedef struct XLogRecord  
{  
        pg_crc32        xl_crc;                 /* CRC for this record */  
        XLogRecPtr      xl_prev;                /* ptr to previous record in log */  
        TransactionId xl_xid;           /* xact id */  
        uint32          xl_tot_len;             /* total len of entire record */  
        uint32          xl_len;                 /* total len of rmgr data */  
        uint8           xl_info;                /* flag bits, see below */  
        RmgrId          xl_rmid;                /* resource manager for this record */  
  
        /* Depending on MAXALIGN, there are either 2 or 6 wasted bytes here */  
  
        /* ACTUAL LOG DATA FOLLOWS AT END OF STRUCT */  
  
} XLogRecord;  
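The comment "there are either 2 or 6 wasted bytes here" follows from simple arithmetic. The sketch below is a simplified mirror of the 9.2 header (the struct names and helpers are hypothetical, not PostgreSQL's): it counts the bytes the fields occupy and rounds up to a maximum alignment of 4 or 8 bytes, in the spirit of PostgreSQL's MAXALIGN.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified mirror of the 9.2 XLogRecord header. In 9.2, XLogRecPtr was a
 * pair of 32-bit integers (xlogid, xrecoff), as XLogRestorePoint shows. */
typedef struct
{
    uint32_t xlogid;
    uint32_t xrecoff;
} XLogRecPtrSketch;

typedef struct
{
    uint32_t         xl_crc;      /* pg_crc32 */
    XLogRecPtrSketch xl_prev;
    uint32_t         xl_xid;      /* TransactionId */
    uint32_t         xl_tot_len;
    uint32_t         xl_len;
    uint8_t          xl_info;
    uint8_t          xl_rmid;     /* RmgrId */
} XLogRecordSketch;

/* Round len up to a multiple of alignment (a power of two). */
static size_t type_align(size_t alignment, size_t len)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (len + alignment - 1) & ~(alignment - 1);
}

/* Bytes occupied by the header fields: up to and including xl_rmid. */
static size_t header_used_bytes(void)
{
    return offsetof(XLogRecordSketch, xl_rmid) + sizeof(uint8_t);
}

/* Header size once rounded up to the platform's maximum alignment. */
static size_t size_of_xlog_record(size_t maxalign)
{
    return type_align(maxalign, header_used_bytes());
}
```

On common ABIs the fields occupy 26 bytes, which rounds to 28 (2 wasted bytes) with a 4-byte MAXALIGN and to 32 (6 wasted bytes) with an 8-byte MAXALIGN.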

Only at these three kinds of records (COMMIT/ABORT/XLOG_RESTORE_POINT) does recovery enter the stop check. The check is made at the start of recoveryStopsHere, the function that decides whether recovery stops:

src/backend/access/transam/xlog.c

        /* We only consider stopping at COMMIT, ABORT or RESTORE POINT records */  
        if (record->xl_rmid != RM_XACT_ID && record->xl_rmid != RM_XLOG_ID)  
                return false;  
        record_info = record->xl_info & ~XLR_INFO_MASK;  
        if (record->xl_rmid == RM_XACT_ID && record_info == XLOG_XACT_COMMIT_COMPACT)  
        {  
                xl_xact_commit_compact *recordXactCommitData;  
  
                recordXactCommitData = (xl_xact_commit_compact *) XLogRecGetData(record);  
                recordXtime = recordXactCommitData->xact_time;  
        }  
        else if (record->xl_rmid == RM_XACT_ID && record_info == XLOG_XACT_COMMIT)  
        {  
                xl_xact_commit *recordXactCommitData;  
  
                recordXactCommitData = (xl_xact_commit *) XLogRecGetData(record);  
                recordXtime = recordXactCommitData->xact_time;  
        }  
        else if (record->xl_rmid == RM_XACT_ID && record_info == XLOG_XACT_ABORT)  
        {  
                xl_xact_abort *recordXactAbortData;  
  
                recordXactAbortData = (xl_xact_abort *) XLogRecGetData(record);  
                recordXtime = recordXactAbortData->xact_time;  
        }  
        else if (record->xl_rmid == RM_XLOG_ID && record_info == XLOG_RESTORE_POINT)  
        {  
                xl_restore_point *recordRestorePointData;  
  
                recordRestorePointData = (xl_restore_point *) XLogRecGetData(record);  
                recordXtime = recordRestorePointData->rp_time;  
                strncpy(recordRPName, recordRestorePointData->rp_name, MAXFNAMELEN);  
        }  
        else  
                return false;  
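The filter above reduces to a small classification function. This is a sketch, not PostgreSQL code: the enum and function are hypothetical; RM_XLOG_ID, RM_XACT_ID, XLR_INFO_MASK and XLOG_RESTORE_POINT mirror the values listed under References, and the XLOG_XACT_* values are assumed from 9.2's src/include/access/xact.h, which this article does not quote.

```c
#include <assert.h>
#include <stdint.h>

/* From rmgr.h, xlog.h and pg_control.h as quoted under References. */
#define RM_XLOG_ID               0
#define RM_XACT_ID               1
#define XLR_INFO_MASK            0x0F
#define XLOG_RESTORE_POINT       0x70
/* Assumed from 9.2's xact.h (not quoted in this article). */
#define XLOG_XACT_COMMIT         0x00
#define XLOG_XACT_ABORT          0x20
#define XLOG_XACT_COMMIT_COMPACT 0x60

/* Hypothetical classification of a WAL record for the stop check. */
typedef enum
{
    STOP_CANDIDATE_NONE,
    STOP_CANDIDATE_COMMIT,
    STOP_CANDIDATE_ABORT,
    STOP_CANDIDATE_RESTORE_POINT
} StopCandidate;

/* Only xact commit/abort and XLOG restore point records may end recovery. */
static StopCandidate classify_record(uint8_t xl_rmid, uint8_t xl_info)
{
    /* Drop the low 4 bits that XLOG reserves for itself (backup block flags). */
    uint8_t record_info = (uint8_t) (xl_info & ~XLR_INFO_MASK);

    if (xl_rmid == RM_XACT_ID &&
        (record_info == XLOG_XACT_COMMIT || record_info == XLOG_XACT_COMMIT_COMPACT))
        return STOP_CANDIDATE_COMMIT;
    if (xl_rmid == RM_XACT_ID && record_info == XLOG_XACT_ABORT)
        return STOP_CANDIDATE_ABORT;
    if (xl_rmid == RM_XLOG_ID && record_info == XLOG_RESTORE_POINT)
        return STOP_CANDIDATE_RESTORE_POINT;
    return STOP_CANDIDATE_NONE;
}
```

Masking first matters: a restore point record with a low flag bit set still classifies as a restore point, while a checkpoint (RM_XLOG_ID, 0x10) or any heap record is never a stop candidate.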

COMMIT and ABORT are easy to understand: they are the records written when a transaction ends. The RESTORE POINT record is written by the XLogRestorePoint function:

src/backend/access/transam/xlog.c

/*  
 * Write a RESTORE POINT record  
 */  
XLogRecPtr  
XLogRestorePoint(const char *rpName)  
{  
        XLogRecPtr      RecPtr;  
        XLogRecData rdata;  
        xl_restore_point xlrec;  
  
        xlrec.rp_time = GetCurrentTimestamp();  
        strncpy(xlrec.rp_name, rpName, MAXFNAMELEN);  
  
        rdata.buffer = InvalidBuffer;  
        rdata.data = (char *) &xlrec;  
        rdata.len = sizeof(xl_restore_point);  
        rdata.next = NULL;  
  
        RecPtr = XLogInsert(RM_XLOG_ID, XLOG_RESTORE_POINT, &rdata);  
  
        ereport(LOG,  
                        (errmsg("restore point \"%s\" created at %X/%X",  
                                        rpName, RecPtr.xlogid, RecPtr.xrecoff)));  
  
        return RecPtr;  
}  

What is a user-defined restore point

XLogRestorePoint is called when a restore point is created with PostgreSQL's built-in function pg_create_restore_point:

src/backend/access/transam/xlogfuncs.c

/*  
 * pg_create_restore_point: a named point for restore  
 */  
Datum  
pg_create_restore_point(PG_FUNCTION_ARGS)  
{  
        text       *restore_name = PG_GETARG_TEXT_P(0);  
        char       *restore_name_str;  
        XLogRecPtr      restorepoint;  
        char            location[MAXFNAMELEN];  
  
        if (!superuser())  
                ereport(ERROR,  
                                (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),  
                                 (errmsg("must be superuser to create a restore point"))));  
  
        if (RecoveryInProgress())  
                ereport(ERROR,  
                                (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),  
                                 (errmsg("recovery is in progress"),  
                                  errhint("WAL control functions cannot be executed during recovery."))));  
  
        if (!XLogIsNeeded())  
                ereport(ERROR,  
                                (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),  
                         errmsg("WAL level not sufficient for creating a restore point"),  
                                 errhint("wal_level must be set to \"archive\" or \"hot_standby\" at server start.")));  
  
        restore_name_str = text_to_cstring(restore_name);  
  
        if (strlen(restore_name_str) >= MAXFNAMELEN)  
                ereport(ERROR,  
                                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),  
                                 errmsg("value too long for restore point (maximum %d characters)", MAXFNAMELEN - 1)));  
  
        restorepoint = XLogRestorePoint(restore_name_str);  
  
        /*  
         * As a convenience, return the WAL location of the restore point record  
         */  
        snprintf(location, sizeof(location), "%X/%X",  
                         restorepoint.xlogid, restorepoint.xrecoff);  
        PG_RETURN_TEXT_P(cstring_to_text(location));  
}  
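A minimal sketch of what pg_create_restore_point and XLogRestorePoint do with the name, assuming MAXFNAMELEN is 64 (its value in 9.2's xlog_internal.h) and using a hypothetical RestorePointSketch in place of xl_restore_point. It rejects names of MAXFNAMELEN characters or more, copies the name with the same strncpy call, and formats a 9.2-style xlogid/xrecoff pair as the "%X/%X" location string; no WAL is written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumption: 64, as in 9.2's src/include/access/xlog_internal.h. */
#define MAXFNAMELEN 64

/* Hypothetical, simplified stand-in for xl_restore_point. */
typedef struct
{
    int64_t rp_time;               /* creation time (TimestampTz in PG) */
    char    rp_name[MAXFNAMELEN];  /* restore point name */
} RestorePointSketch;

/* Returns false where pg_create_restore_point would raise
 * "value too long for restore point". */
static bool build_restore_point(const char *name, int64_t now,
                                RestorePointSketch *out)
{
    assert(name != NULL && out != NULL);
    if (strlen(name) >= MAXFNAMELEN)
        return false;
    out->rp_time = now;
    /* Safe: the check above guarantees room for the terminating NUL. */
    strncpy(out->rp_name, name, MAXFNAMELEN);
    return true;
}

/* Format a 9.2-style WAL location the way pg_create_restore_point does. */
static void format_location(uint32_t xlogid, uint32_t xrecoff,
                            char *buf, size_t buflen)
{
    snprintf(buf, buflen, "%X/%X", (unsigned) xlogid, (unsigned) xrecoff);
}
```

The length check has to run before the copy: strncpy does not NUL-terminate when the source fills the whole buffer, and the check rules that case out.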

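From the above, the opening part of recoveryStopsHere means that PITR can stop only at:

1. the end of a transaction (COMMIT/ABORT);

2. a restore point created by the user with pg_create_restore_point.

The rest of recoveryStopsHere then decides, based on the settings in recovery.conf, whether recovery stops at the current record.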

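How the recovery targets are evaluated

At the beginning of this article we listed three recovery targets:

(recovery_target_xid, recovery_target_time, recovery_target_name)

1. No recovery target set: for transaction records the function only records the latest commit/abort time, then returns false, so recovery never stops early.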

        /* Do we have a PITR target at all? */  
        if (recoveryTarget == RECOVERY_TARGET_UNSET)  
        {  
                /*  
                 * Save timestamp of latest transaction commit/abort if this is a  
                 * transaction record  
                 */  
                if (record->xl_rmid == RM_XACT_ID)  
                        SetLatestXTime(recordXtime);  
                return false;  
        }  

RECOVERY_TARGET_UNSET is defined in

src/include/access/xlog.h

/*  
 * Recovery target type.  
 * Only set during a Point in Time recovery, not when standby_mode = on  
 */  
typedef enum  
{  
        RECOVERY_TARGET_UNSET,  
        RECOVERY_TARGET_XID,  
        RECOVERY_TARGET_TIME,  
        RECOVERY_TARGET_NAME  
} RecoveryTargetType;  

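2. recovery_target_xid is compared with XLogRecord->xl_xid.

Because xids are assigned in the order transactions start, not the order they end, only an equality test is valid, and with this target recovery can stop only at that transaction's COMMIT/ABORT record. recoveryStopsHere therefore returns true as soon as it reaches the commit/abort record carrying this xid.

*includeThis = recoveryTargetInclusive; hands the inclusive setting back to the caller. It sets recoveryStopAfter, which picks the "after"/"before" log message shown below, and in 9.2 the caller (StartupXLOG) also uses it to decide whether the stop record itself is replayed: with true the target transaction's commit/abort is applied and recovery stops after it, with false recovery stops just before it.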

        if (recoveryTarget == RECOVERY_TARGET_XID)  
        {  
                /*  
                 * There can be only one transaction end record with this exact  
                 * transactionid  
                 *  
                 * when testing for an xid, we MUST test for equality only, since  
                 * transactions are numbered in the order they start, not the order  
                 * they complete. A higher numbered xid will complete before you about  
                 * 50% of the time...  
                 */  
                stopsHere = (record->xl_xid == recoveryTargetXid);  
                if (stopsHere)  
                        *includeThis = recoveryTargetInclusive;  
        }  

When the stop is logged, recoveryStopAfter selects the wording:

        if (stopsHere)  
        {  
                recoveryStopXid = record->xl_xid;  
                recoveryStopTime = recordXtime;  
                recoveryStopAfter = *includeThis;  
  
                if (record_info == XLOG_XACT_COMMIT_COMPACT || record_info == XLOG_XACT_COMMIT)  
                {  
                        if (recoveryStopAfter)  
                                ereport(LOG,  
                                                (errmsg("recovery stopping after commit of transaction %u, time %s",  
                                                                recoveryStopXid,  
                                                                timestamptz_to_str(recoveryStopTime))));  
                        else  
                                ereport(LOG,  
                                                (errmsg("recovery stopping before commit of transaction %u, time %s",  
                                                                recoveryStopXid,  
                                                                timestamptz_to_str(recoveryStopTime))));  
                }  
                else if (record_info == XLOG_XACT_ABORT)  
                {  
                        if (recoveryStopAfter)  
                                ereport(LOG,  
                                                (errmsg("recovery stopping after abort of transaction %u, time %s",  
                                                                recoveryStopXid,  
                                                                timestamptz_to_str(recoveryStopTime))));  
                        else  
                                ereport(LOG,  
                                                (errmsg("recovery stopping before abort of transaction %u, time %s",  
                                                                recoveryStopXid,  
                                                                timestamptz_to_str(recoveryStopTime))));  
                }  
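Why the xid test must be an equality test can be seen on a short list of transaction-end records. The sketch below is hypothetical (not PostgreSQL code): it walks end records in WAL order and compares the correct equality test with a tempting but wrong ">=" test.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionIdSketch;   /* stands in for TransactionId */

/* RECOVERY_TARGET_XID: stop at the end record whose xid equals the target.
 * Returns the index of the stop record, or -1 if the target never appears. */
static long find_xid_stop(const TransactionIdSketch *end_xids, size_t n,
                          TransactionIdSketch target)
{
    for (size_t i = 0; i < n; i++)
    {
        /* Equality only: xids are assigned at start, not at commit. */
        if (end_xids[i] == target)
            return (long) i;
    }
    return -1;
}

/* Wrong variant: treats the first xid >= target as the stop point. */
static long find_xid_stop_wrong(const TransactionIdSketch *end_xids, size_t n,
                                TransactionIdSketch target)
{
    for (size_t i = 0; i < n; i++)
    {
        if (end_xids[i] >= target)
            return (long) i;
    }
    return -1;
}
```

With end records in WAL order {101, 99, 100, 102} and target 100, the equality test stops at the third record, while the ">=" variant would stop at the first one, before transaction 100 has committed.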

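3. recovery_target_name is compared with the restore point name, recordRPName, which was copied from xl_restore_point->rp_name in the record data.

If the WAL contains several restore points with the same name, recovery stops at the first one.

A restore point is written as its own XLOG record rather than as a transaction record, so there is nothing to include or exclude: recovery simply stops there, and recovery_target_inclusive does not need to be checked.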

        else if (recoveryTarget == RECOVERY_TARGET_NAME)  
        {  
                /*  
                 * There can be many restore points that share the same name, so we  
                 * stop at the first one  
                 */  
                stopsHere = (strcmp(recordRPName, recoveryTargetName) == 0);  
  
                /*  
                 * Ignore recoveryTargetInclusive because this is not a transaction  
                 * record  
                 */  
                *includeThis = false;  
        }  
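The "stop at the first one" rule for duplicate names amounts to a first-match search. A hypothetical sketch, where names holds the rp_name of each restore point record in WAL order:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* RECOVERY_TARGET_NAME: stop at the first restore point whose name matches.
 * recovery_target_inclusive plays no part here (*includeThis is always false).
 * Returns the index of the stop record, or -1 if no name matches. */
static long find_restore_point_stop(const char *const *names, size_t n,
                                    const char *target)
{
    assert(target != NULL);
    for (size_t i = 0; i < n; i++)
    {
        if (strcmp(names[i], target) == 0)
            return (long) i;   /* first of possibly several duplicates */
    }
    return -1;
}
```

With restore points named daily, before_ddl, daily, a target of daily stops at the first one; later restore points with the same name are never reached.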

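4. recovery_target_time is compared with recordXtime, the end time extracted above (for a compact commit record, xl_xact_commit_compact->xact_time).

Several transactions may COMMIT/ABORT with the same timestamp, so recovery_target_inclusive decides what happens to them:

with true, recovery stops at the first record whose time is later than the target, so every transaction that ended exactly at the target time is included;

with false, recovery stops at the first record whose time is equal to or later than the target, so none of them is included.

In both cases *includeThis is set to false, so the record that triggers the stop is itself not replayed.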

        else  
        {  
                /*  
                 * There can be many transactions that share the same commit time, so  
                 * we stop after the last one, if we are inclusive, or stop at the  
                 * first one if we are exclusive  
                 */  
                if (recoveryTargetInclusive)  
                        stopsHere = (recordXtime > recoveryTargetTime);  
                else  
                        stopsHere = (recordXtime >= recoveryTargetTime);  
                if (stopsHere)  
                        *includeThis = false;  
        }  
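The two comparisons can be tried on a list of end-of-transaction timestamps. This sketch assumes, as in 9.2's StartupXLOG, that a record for which *includeThis is false is not replayed; the replay loop and all names in it are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef int64_t TimestampSketch;   /* stands in for TimestampTz */

/* The RECOVERY_TARGET_TIME test from recoveryStopsHere. */
static bool time_stops_here(TimestampSketch record_time, TimestampSketch target,
                            bool inclusive)
{
    return inclusive ? record_time > target : record_time >= target;
}

/* Hypothetical replay loop over end records in WAL order: returns how many
 * are replayed. The stop record itself is not replayed (*includeThis false). */
static size_t replayed_before_time_stop(const TimestampSketch *end_times, size_t n,
                                        TimestampSketch target, bool inclusive)
{
    for (size_t i = 0; i < n; i++)
    {
        if (time_stops_here(end_times[i], target, inclusive))
            return i;
    }
    return n;   /* no stop: replay to the end of WAL */
}
```

With end times {10, 20, 20, 20, 30} and a target of 20, the inclusive test replays four records (all three at 20), while the exclusive test replays only the first one.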

The transaction end time, recordXtime, is read from the xact_time field of structures such as this one:

src/include/access/xact.h

typedef struct xl_xact_commit_compact  
{  
        TimestampTz xact_time;          /* time of commit */  
        int                     nsubxacts;              /* number of subtransaction XIDs */  
        /* ARRAY OF COMMITTED SUBTRANSACTION XIDs FOLLOWS */  
        TransactionId subxacts[1];      /* VARIABLE LENGTH ARRAY */  
} xl_xact_commit_compact;  
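xl_xact_commit_compact ends in a variable-length array, so the record's size depends on nsubxacts. The sketch below rebuilds the layout with a C99 flexible array member instead of the older subxacts[1] idiom; the type and function names are hypothetical. It sizes the record as the fixed part up to the array (via offsetof) plus one TransactionId per subtransaction, then reads xact_time back, the value recoveryStopsHere uses as recordXtime.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef int64_t  TimestampSketch;      /* stands in for TimestampTz */
typedef uint32_t TransactionIdSketch;  /* stands in for TransactionId */

/* Hypothetical mirror of xl_xact_commit_compact. */
typedef struct
{
    TimestampSketch     xact_time;   /* time of commit */
    int                 nsubxacts;   /* number of subtransaction XIDs */
    TransactionIdSketch subxacts[];  /* variable-length array */
} XactCommitCompactSketch;

/* Fixed part up to the array, plus the array itself. */
static size_t commit_compact_size(int nsubxacts)
{
    assert(nsubxacts >= 0);
    return offsetof(XactCommitCompactSketch, subxacts)
        + (size_t) nsubxacts * sizeof(TransactionIdSketch);
}

/* Build a record in a malloc'd buffer; the caller frees it. */
static XactCommitCompactSketch *make_commit_compact(TimestampSketch t,
                                                    const TransactionIdSketch *subs,
                                                    int nsubxacts)
{
    XactCommitCompactSketch *rec = malloc(commit_compact_size(nsubxacts));

    if (rec == NULL)
        return NULL;
    rec->xact_time = t;
    rec->nsubxacts = nsubxacts;
    if (nsubxacts > 0)
        memcpy(rec->subxacts, subs, (size_t) nsubxacts * sizeof(TransactionIdSketch));
    return rec;
}
```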

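The logic above shows that recoveryTargetInclusive is meaningful only when the recovery target is an xid or a time; when the target is a restore point name it is ignored, which answers the question raised earlier.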

References

1. src/include/catalog/pg_control.h

/* XLOG info values for XLOG rmgr */  
#define XLOG_CHECKPOINT_SHUTDOWN                0x00  
#define XLOG_CHECKPOINT_ONLINE                  0x10  
#define XLOG_NOOP                                               0x20  
#define XLOG_NEXTOID                                    0x30  
#define XLOG_SWITCH                                             0x40  
#define XLOG_BACKUP_END                                 0x50  
#define XLOG_PARAMETER_CHANGE                   0x60  
#define XLOG_RESTORE_POINT                              0x70  
#define XLOG_FPW_CHANGE                         0x80  

2. src/include/access/xlog.h

/*  
 * XLOG uses only low 4 bits of xl_info.  High 4 bits may be used by rmgr.  
 */  
#define XLR_INFO_MASK                   0x0F  

3. src/include/access/rmgr.h

/*  
 * Built-in resource managers  
 *  
 * Note: RM_MAX_ID could be as much as 255 without breaking the XLOG file  
 * format, but we keep it small to minimize the size of RmgrTable[].  
 */  
#define RM_XLOG_ID                              0  
#define RM_XACT_ID                              1  
#define RM_SMGR_ID                              2  
#define RM_CLOG_ID                              3  
#define RM_DBASE_ID                             4  
#define RM_TBLSPC_ID                    5  
#define RM_MULTIXACT_ID                 6  
#define RM_RELMAP_ID                    7  
#define RM_STANDBY_ID                   8  
#define RM_HEAP2_ID                             9  
#define RM_HEAP_ID                              10  
#define RM_BTREE_ID                             11  
#define RM_HASH_ID                              12  
#define RM_GIN_ID                               13  
#define RM_GIST_ID                              14  
#define RM_SEQ_ID                               15  
#define RM_SPGIST_ID                    16  