PostgreSQL 9.6 vacuum freeze大幅性能提升 代码浅析

本文涉及的产品
云原生数据库 PolarDB MySQL 版,Serverless 5000PCU 100GB
云原生数据库 PolarDB 分布式版,标准版 2核8GB
云数据库 RDS MySQL Serverless,0.5-2RCU 50GB
简介:

PostgreSQL 9.6 vacuum freeze大幅性能提升 代码浅析

作者

digoal

日期

2016-10-02

标签

PostgreSQL , 9.6 , vacuum freeze , visibility map , skip frozen page


背景

PostgreSQL的tuple(即记录)头信息中有两个字段分别为XMIN,XMAX用于标记行产生与变更的事务号,以标示记录的版本号,事务的可见性等。

这个事务号是32BIT的长度,因此PG设计了一个事务存活的最长时间是约20亿,如果超过20亿必须将这个事务置为frozen。

被置为frozen状态的记录,对所有的事务可见,从而解决了32BIT的XID可以用于实现MVCC的目的。

因此PostgreSQL 需要周期性的对表进行扫描,检查是否需要将记录置为frozen。

PostgreSQL 9.4以前的版本,FROZEN是通过一个等于2的XID来表示的。

从9.4开始改成了通过tuple 头部的t_infomask中的两个互斥的比特位来表示HEAP_XMIN_COMMITTED|HEAP_XMIN_INVALID。

参见
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=37484ad2aacef5ec794f4dd3d5cf814475180a78

有些插入操作,也可以直接将记录置为freeze,例如大批量的COPY数据。 insert into等。

src/include/access/heapam.h

/* "options" flag bits for heap_insert */
#define HEAP_INSERT_SKIP_WAL    0x0001
#define HEAP_INSERT_SKIP_FSM    0x0002
#define HEAP_INSERT_FROZEN              0x0004
#define HEAP_INSERT_SPECULATIVE 0x0008

表的全局年龄则记录在pg_class中,即使表没有任何变化,在年龄到达一定的值(参数配置)后,也需要发起frozen的动作,对表的记录进行扫描。

9.6对这块做了改进,当数据页中的所有记录已经是FROZEN状态时,在发起vacuum freeze时会跳过这个页的扫描,从而大幅提升静态数据的freeze操作,减少IO扫描。

目前这个page frozen标记放在表对应的VM文件中。

vacuum freeze改进代码分析

1. 如何将记录标记为freeze状态

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=37484ad2aacef5ec794f4dd3d5cf814475180a78

Change the way we mark tuples as frozen.

Instead of changing the tuple xmin to FrozenTransactionId, the combination
of HEAP_XMIN_COMMITTED and HEAP_XMIN_INVALID, which were previously never
set together, is now defined as HEAP_XMIN_FROZEN.  A variety of previous
proposals to freeze tuples opportunistically before vacuum_freeze_min_age
is reached have foundered on the objection that replacing xmin by
FrozenTransactionId might hinder debugging efforts when things in this
area go awry; this patch is intended to solve that problem by keeping
the XID around (but largely ignoring the value to which it is set).

Third-party code that checks for HEAP_XMIN_INVALID on tuples where
HEAP_XMIN_COMMITTED might be set will be broken by this change.  To fix,
use the new accessor macros in htup_details.h rather than consulting the
bits directly.  HeapTupleHeaderGetXmin has been modified to return
FrozenTransactionId when the infomask bits indicate that the tuple is
frozen; use HeapTupleHeaderGetRawXmin when you already know that the
tuple isn't marked commited or frozen, or want the raw value anyway.
We currently do this in routines that display the xmin for user consumption,
in tqual.c where it's known to be safe and important for the avoidance of
extra cycles, and in the function-caching code for various procedural
languages, which shouldn't invalidate the cache just because the tuple
gets frozen.

Robert Haas and Andres Freund

src/include/access/htup_details.h

/*
 * information stored in t_infomask:
 */
#define HEAP_XMIN_FROZEN                (HEAP_XMIN_COMMITTED|HEAP_XMIN_INVALID)

2. 9.6在VM文件中,新增了对FROZEN页的标记,当整页数据都是静态的时,并且都已经是freeze状态,则该页标记为FROZEN,在执行vacuum freeze或自动触发freeze时,就会跳过这个页的扫描。

src/backend/access/heap/visibilitymap.c

 * The visibility map is a bitmap with two bits (all-visible and all-frozen)
 * per heap page. A set all-visible bit means that all tuples on the page are
 * known visible to all transactions, and therefore the page doesn't need to
 * be vacuumed. A set all-frozen bit means that all tuples on the page are
 * completely frozen, and therefore the page doesn't need to be vacuumed even
 * if whole table scanning vacuum is required (e.g. anti-wraparound vacuum).
 * The all-frozen bit must be set only when the page is already all-visible.

src/include/access/visibilitymap.h

/* Number of bits for one heap page */
#define BITS_PER_HEAPBLOCK 2

/* Flags for bit map */
#define VISIBILITYMAP_ALL_VISIBLE       0x01
#define VISIBILITYMAP_ALL_FROZEN        0x02
#define VISIBILITYMAP_VALID_BITS        0x03            /* OR of all valid
                                                                                                 * visiblitymap flags bits */

/* Macros for visibilitymap test */
#define VM_ALL_VISIBLE(r, b, v) \
        ((visibilitymap_get_status((r), (b), (v)) & VISIBILITYMAP_ALL_VISIBLE) != 0)
#define VM_ALL_FROZEN(r, b, v) \
        ((visibilitymap_get_status((r), (b), (v)) & VISIBILITYMAP_ALL_FROZEN) != 0)

3. 如果vm页损坏了,我们可以通过vacuum DISABLE_PAGE_SKIPPING强制扫描所有的页。
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=ede62e56fbe809baa1a7bc3873d82f12ffe7540b

If you really want to vacuum every single page in the relation,
regardless of apparent visibility status or anything else, you can use
this option.  In previous releases, this behavior could be achieved
using VACUUM (FREEZE), but because we can now recognize all-frozen
pages as not needing to be frozen again, that no longer works.  There
should be no need for routine use of this option, but maybe bugs or
disaster recovery will necessitate its use.

postgres=# \h vacuum
Command:     VACUUM
Description: garbage-collect and optionally analyze a database
Syntax:
VACUUM [ ( { FULL | FREEZE | VERBOSE | ANALYZE | DISABLE_PAGE_SKIPPING } [, ...] ) ] [ table_name [ (column_name [, ...] ) ] ]

9.6 vs 9.5 FREEZE操作

测试用例

postgres=# create table test(id int, info text);
CREATE TABLE

插入6400万记录

$ vi test.sql
insert into test values (1,'test');

$ pgbench -M prepared -n -r -f ./test.sql -c 64 -j 64 -t 1000000


transaction type: ./test.sql
scaling factor: 1
query mode: prepared
number of clients: 64
number of threads: 64
number of transactions per client: 1000000
number of transactions actually processed: 64000000/64000000
latency average = 0.164 ms
tps = 389383.803477 (including connections establishing)
tps = 389393.063237 (excluding connections establishing)
script statistics:
 - statement latencies in milliseconds:
         0.162  insert into test values (1,'test');

1. 9.6测试结果
生成vm文件

postgres=# vacuum analyze test;
VACUUM
Time: 8629.180 ms

确认VM文件已生成

postgres=# select * from pg_stat_file(pg_relation_filepath('test')||'_vm');
 size  |         access         |      modification      |         change         | creation | isdir 
-------+------------------------+------------------------+------------------------+----------+-------
 90112 | 2016-10-02 13:34:24+08 | 2016-10-02 13:34:32+08 | 2016-10-02 13:34:32+08 |          | f
(1 row)

记录当前XLOG位置

postgres=# select pg_current_xlog_location();
 pg_current_xlog_location 
--------------------------
 3/87E3C58
(1 row)

执行第一次vacuum freeze

postgres=# vacuum freeze test;
VACUUM
Time: 3487.945 ms

记录XLOG位置

postgres=# select pg_current_xlog_location();
 pg_current_xlog_location 
--------------------------
 3/2E3C3F30
(1 row)

计算第一次VACUUM FREEZE产生了多少XLOG

postgres=# select pg_size_pretty(pg_xlog_location_diff('3/2E3C3F30','3/87E3C58'));
 pg_size_pretty 
----------------
 604 MB
(1 row)

不产生数据变更,开始接下来的vacuum freeze。
9.6接下来的VACUUM FREEZE非常快,已经自动跳过了frozen page,并且不会产生XLOG。

postgres=# vacuum freeze test;
VACUUM
Time: 16.581 ms
postgres=# select pg_current_xlog_location();
 pg_current_xlog_location 
--------------------------
 3/2E3C3F30
(1 row)
Time: 0.079 ms

postgres=# vacuum freeze test;
VACUUM
Time: 16.555 ms
postgres=# select pg_current_xlog_location();
 pg_current_xlog_location 
--------------------------
 3/2E3C3F30
(1 row)
Time: 0.115 ms

postgres=# vacuum freeze verbose test;
INFO:  vacuuming "public.test"
INFO:  "test": found 0 removable, 0 nonremovable row versions in 0 out of 345957 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  vacuuming "pg_toast.pg_toast_16415"
INFO:  index "pg_toast_16415_index" now contains 0 row versions in 1 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "pg_toast_16415": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
VACUUM
Time: 16.566 ms

消耗几个事务后,重新vacuum freeze依旧很快,产生非常少量的xlog(这些xlog实际上是变更test表的pg_class.relfrozenxid字段产生的)。

postgres=# select txid_current();
 txid_current 
--------------
     64001925
(1 row)

postgres=# select txid_current();
 txid_current 
--------------
     64001926
(1 row)

postgres=# select txid_current();
 txid_current 
--------------
     64001927
(1 row)

postgres=# vacuum freeze verbose test;
INFO:  vacuuming "public.test"
INFO:  "test": found 0 removable, 0 nonremovable row versions in 0 out of 345957 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  vacuuming "pg_toast.pg_toast_16415"
INFO:  index "pg_toast_16415_index" now contains 0 row versions in 1 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "pg_toast_16415": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
VACUUM
Time: 18.020 ms
postgres=# select pg_current_xlog_location();
 pg_current_xlog_location 
--------------------------
 3/2E3C4130
(1 row)

postgres=# select pg_size_pretty(pg_xlog_location_diff('3/2E3C4130', '3/2E3C3F30'));
 pg_size_pretty 
----------------
 512 bytes
(1 row)

使用强制vacuum freeze,不跳过froze pages.
当VM文件损坏时,可以这样使用。

postgres=# vacuum (freeze, verbose, DISABLE_PAGE_SKIPPING) test;
INFO:  vacuuming "public.test"
INFO:  "test": found 0 removable, 64000000 nonremovable row versions in 345957 out of 345957 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins.
2 pages are entirely empty.
CPU 0.68s/3.70u sec elapsed 4.38 sec.
INFO:  vacuuming "pg_toast.pg_toast_16415"
INFO:  index "pg_toast_16415_index" now contains 0 row versions in 1 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "pg_toast_16415": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
VACUUM
Time: 4397.821 ms

postgres=# select pg_current_xlog_location();
 pg_current_xlog_location 
--------------------------
 3/2E3C42F0
(1 row)

postgres=# select pg_size_pretty(pg_xlog_location_diff('3/2E3C42F0','3/2E3C4130'));
 pg_size_pretty 
----------------
 448 bytes
(1 row)

通过测试,我们可以看到PostgreSQL 9.6在vacuum freeze这块改进非常实用,特别是当数据库很庞大,并且存储了大量的历史静态数据时,在发生vacuum freeze时,IO不会像以前那样飙升了。

2. 9.5测试结果
生成vm文件

postgres=# vacuum analyze test;
VACUUM
Time: 17441.652 ms

确认VM文件已生成

postgres=# select * from pg_stat_file(pg_relation_filepath('test')||'_vm');
 size  |         access         |      modification      |         change         | creation | isdir 
-------+------------------------+------------------------+------------------------+----------+-------
 49152 | 2016-10-02 14:11:17+08 | 2016-10-02 14:11:34+08 | 2016-10-02 14:11:34+08 |          | f
(1 row)

记录当前XLOG位置

postgres=# select pg_current_xlog_location();
 pg_current_xlog_location 
--------------------------
 5/A7FD5AC8
(1 row)

执行第一次vacuum freeze

postgres=# vacuum freeze test;
VACUUM
Time: 9889.702 ms

记录XLOG位置

postgres=# select pg_current_xlog_location();
 pg_current_xlog_location 
--------------------------
 5/CCB2DAB8
(1 row)

计算第一次VACUUM FREEZE产生了多少XLOG

postgres=# select pg_size_pretty(pg_xlog_location_diff('5/CCB2DAB8', '5/A7FD5AC8'));
 pg_size_pretty 
----------------
 587 MB
(1 row)

不产生数据变更,开始接下来的vacuum freeze。
9.5的版本,接下来的VACUUM FREEZE同样需要扫描全表,因为VM中未记录frozen pages。

postgres=# vacuum freeze test;
VACUUM
Time: 7191.695 ms

postgres=# select pg_current_xlog_location();
 pg_current_xlog_location 
--------------------------
 5/CCB2DAB8
(1 row)

postgres=# vacuum freeze test;
VACUUM
Time: 7159.769 ms

postgres=# select pg_current_xlog_location();
 pg_current_xlog_location 
--------------------------
 5/CCB2DAB8
(1 row)

postgres=# vacuum freeze verbose test;
INFO:  vacuuming "public.test"
INFO:  "test": found 0 removable, 64000000 nonremovable row versions in 346248 out of 346248 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins.
0 pages are entirely empty.
CPU 0.00s/7.18u sec elapsed 7.19 sec.
INFO:  vacuuming "pg_toast.pg_toast_16682"
INFO:  index "pg_toast_16682_index" now contains 0 row versions in 1 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "pg_toast_16682": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
VACUUM
Time: 7206.649 ms

消耗几个事务后,重新vacuum freeze。

postgres=# select txid_current();
 txid_current 
--------------
    128001924
(1 row)

postgres=# select txid_current();
 txid_current 
--------------
    128001925
(1 row)

postgres=# select txid_current();
 txid_current 
--------------
    128001926
(1 row)

postgres=# select txid_current();
 txid_current 
--------------
    128001927
(1 row)

postgres=# vacuum freeze verbose test;
INFO:  vacuuming "public.test"
INFO:  "test": found 0 removable, 64000000 nonremovable row versions in 346248 out of 346248 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins.
0 pages are entirely empty.
CPU 0.00s/7.18u sec elapsed 7.19 sec.
INFO:  vacuuming "pg_toast.pg_toast_16682"
INFO:  index "pg_toast_16682_index" now contains 0 row versions in 1 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "pg_toast_16682": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
VACUUM
Time: 7208.527 ms

postgres=# select pg_current_xlog_location();
 pg_current_xlog_location 
--------------------------
 5/CCB2DEA8
(1 row)

postgres=# select pg_size_pretty(pg_xlog_location_diff('5/CCB2DEA8', '5/CCB2DAB8'));
 pg_size_pretty 
----------------
 1008 bytes
(1 row)

小结

PostgreSQL 9.6对VM文件的功能进行了扩展,实用2个比特位标记该页的记录数是否是clean的,对所有事务可见;以及标记该页的所有记录数是否都是FREEZE的。

在进行freeze操作时,或者触发autovacuum freeze prevent wrapped object时,会自动跳过标记为frozen的page,从而大幅减少vacuum freeze的IO。

对于数据库中有大量静态数据,并且又有高并发的写事务并行存在时,特别实用。

同时PostgreSQL 9.6还提供了强制vacuum freeze的接口,不跳过任何页。 保留这样的接口,目的是在vm文件损坏时可用,VM文件也可以通过这种方法自动修复。

Count

相关实践学习
使用PolarDB和ECS搭建门户网站
本场景主要介绍基于PolarDB和ECS实现搭建门户网站。
阿里云数据库产品家族及特性
阿里云智能数据库产品团队一直致力于不断健全产品体系,提升产品性能,打磨产品功能,从而帮助客户实现更加极致的弹性能力、具备更强的扩展能力、并利用云设施进一步降低企业成本。以云原生+分布式为核心技术抓手,打造以自研的在线事务型(OLTP)数据库Polar DB和在线分析型(OLAP)数据库Analytic DB为代表的新一代企业级云原生数据库产品体系, 结合NoSQL数据库、数据库生态工具、云原生智能化数据库管控平台,为阿里巴巴经济体以及各个行业的企业客户和开发者提供从公共云到混合云再到私有云的完整解决方案,提供基于云基础设施进行数据从处理、到存储、再到计算与分析的一体化解决方案。本节课带你了解阿里云数据库产品家族及特性。
相关文章
|
25天前
|
Java 关系型数据库 数据库连接
Mybatis+MySQL动态分页查询数据经典案例(含代码以及测试)
Mybatis+MySQL动态分页查询数据经典案例(含代码以及测试)
24 1
|
6月前
|
SQL 前端开发 Java
Java+Mysql图书管理系统(完整实训代码)
​ ✨博主:命运之光 🌸专栏:Python星辰秘典 🐳专栏:web开发(html css js) ❤️专栏:Java经典程序设计 ☀️博主的其他文章:点击进入博主的主页
256 0
|
6月前
|
SQL Java 关系型数据库
Java的JDBC编程<-----用Java代码操作数据库(MySQL数据库)
Java的JDBC编程<-----用Java代码操作数据库(MySQL数据库)
24 0
|
6月前
|
数据挖掘 Java 测试技术
无代码动态表单系统 毕业设计 JAVA+Vue+SpringBoot+MySQL(一)
无代码动态表单系统 毕业设计 JAVA+Vue+SpringBoot+MySQL
|
2月前
|
SQL 关系型数据库 MySQL
MySQL日期函数的SQL代码示例和使用场景
MySQL日期函数的SQL代码示例和使用场景
24 0
|
3月前
|
SQL 存储 关系型数据库
MySQL索引的使用,大大提升你代码的效率
MySQL索引的使用,大大提升你代码的效率
|
4月前
|
SQL Java 关系型数据库
【Spring Boot+Thymeleaf+MyBatis+mysql】实现电子商务平台实战(附源码)持续更新~~ 包括sql语句、java、html代码
【Spring Boot+Thymeleaf+MyBatis+mysql】实现电子商务平台实战(附源码)持续更新~~ 包括sql语句、java、html代码
48 0
|
9月前
|
JSON 前端开发 关系型数据库
MySQL实战基础知识入门(9):MYSQL跨4个表的高效查询代码的解决方案
MySQL实战基础知识入门(9):MYSQL跨4个表的高效查询代码的解决方案
88 0
|
9月前
|
JSON 前端开发 JavaScript
Echarts实战案例代码(15):月收入年龄分段等MYSQL分类统计PHP后台数据管理接口API数据的解决方案
Echarts实战案例代码(15):月收入年龄分段等MYSQL分类统计PHP后台数据管理接口API数据的解决方案
129 0
|
9月前
|
关系型数据库 PostgreSQL
PostgreSQL vacuum可见性
PostgreSQL vacuum可见性
61 0

相关产品

  • 云原生数据库 PolarDB