Open-Source Database PostgreSQL Cracks the Parallel Computing Problem

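Summary: Now that parallel query has landed in PostgreSQL 9.6, many of you have probably started testing it. Last night I tested bitwise operations for a tag-system style application and saw a 7x speedup over non-parallel execution. I had not studied the code closely at that point, and no matter how I tested I could only get 8 parallel processes. After reading the code today I finally found the reason: the degree of parallelism is determined by several factors.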

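After years in the making (from support for worker processes, to dynamically forked workers and shared memory, to parallel computation in the core), PostgreSQL's parallel computation feature has finally arrived. It raises PG's scale-up capability by another notch and marks the point at which an open-source database has cracked the parallel computing problem.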


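I expect many of you have already started testing. I also tested bitwise operations for a tag-system style application, and yesterday's test showed a 7x speedup over non-parallel execution.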

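After tuning the degree of parallelism on a 32-core virtual machine, performance improved by a bit more than 10x.
It did not actually reach 32x, though; even leaving memory and I/O bottlenecks aside, there is still room for optimization.
Note that different degrees of parallelism give different results. So far, the maximum degree does not necessarily deliver the best performance; lock contention also has to be considered.
The test table was then loaded with 1.6 billion rows, 90 GB in total.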

postgres=# \dt+
                    List of relations
 Schema |  Name  | Type  |  Owner   | Size  | Description 
--------+--------+-------+----------+-------+-------------
 public | t_bit2 | table | postgres | 90 GB | 
(1 row)

Performance without parallelism is shown below: the query takes 141377.100 ms.

postgres=# alter table t_bit2 set (parallel_degree=0);
ALTER TABLE
Time: 0.335 ms
postgres=# select count(*) from t_bit2 ;
   count    
------------
 1600000000
(1 row)
Time: 141377.100 ms

A degree of parallelism of 17 gave the best performance: the query takes 9423.257 ms.

postgres=# alter table t_bit2 set (parallel_degree=17);
ALTER TABLE
Time: 0.287 ms
postgres=# select count(*) from t_bit2 ;
   count    
------------
 1600000000
(1 row)

Time: 9423.257 ms

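At a degree of parallelism of 17, the scan already processes 9.55 GB per second.
Compared with the non-parallel run this is a 15x speedup, which is essentially linear.
However, possibly because of NUMA (as the degree of parallelism increases, reads may incur more __mutex_lock_slowpath and _spin_lock), raising the degree further no longer improves performance linearly; performance actually drops.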
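As a quick check of these figures, the sketch below recomputes speedup and throughput from the timings in the two psql sessions above. The variable names are illustrative, and the efficiency figure rests on the assumption that the leader process also scans, so that 17 workers mean 18 participating processes.

```python
# Timings and table size copied from the psql sessions above.
serial_ms = 141377.100     # parallel_degree = 0
parallel_ms = 9423.257     # parallel_degree = 17
table_gb = 90
workers = 17

speedup = serial_ms / parallel_ms
throughput_gb_s = table_gb / (parallel_ms / 1000.0)

# Assumption: the leader also scans, so workers + 1 processes take part.
efficiency = speedup / (workers + 1)

print(f"speedup: {speedup:.2f}x")
print(f"throughput: {throughput_gb_s:.2f} GB/s")
print(f"efficiency per process: {efficiency:.2f}")
```

Under that assumption each process delivers a little over 80% of what a single serial scan would, which is why the scaling can be called close to linear.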
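Another set of test data adds the BIT computation.
Here the best speedup was obtained at a degree of parallelism of 32, which is again related to NUMA. Why can the degree go higher? Because there is more computation per row, so contention on the scan is spread out.
The speedup likewise reached 30.9x, again essentially linear. (The division below uses 15826.358 ms rather than the 15836.064 ms shown for this run; either figure gives a ratio of about 30.8 to 30.9.)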

postgres=# alter table t_bit2 set (parallel_degree=32);
ALTER TABLE
Time: 0.341 ms
postgres=# select count(*) from t_bit2 where bitand(id, '10101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010')=B'10101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010';
   count    
------------
 1600000000
(1 row)

Time: 15836.064 ms
postgres=# alter table t_bit2 set (parallel_degree=0);
ALTER TABLE
Time: 0.368 ms
postgres=# select count(*) from t_bit2 where bitand(id, '10101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010')=B'10101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010';
   count    
------------
 1600000000
(1 row)

Time: 488459.158 ms
postgres=# select 488459.158 /15826.358;
      ?column?       
---------------------
 30.8636489835501004
(1 row)

Time: 2.919 ms

TPC-H test data will be provided later.


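So how is the degree of parallelism set? It is determined by the following parameters.
1. The maximum allowed degree of parallelism
max_parallel_degree


2. The degree of parallelism set on the table (via CREATE TABLE or ALTER TABLE)
parallel_degree
If the table's degree of parallelism is set, the effective degree is min(max_parallel_degree, parallel_degree).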

                /*
                 * Use the table parallel_degree, but don't go further than
                 * max_parallel_degree.
                 */
                parallel_degree = Min(rel->rel_parallel_degree, max_parallel_degree);


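3. If the table has no parallel_degree set, the degree is computed from the table size and the hard-coded parallel_threshold value (see the function create_plain_partial_paths).
The result is still capped by max_parallel_degree and cannot exceed it.
The code is as follows.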

src/backend/optimizer/util/plancat.c
void
get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
                                  RelOptInfo *rel)
{
...
        /* Retrive the parallel_degree reloption, if set. */
        rel->rel_parallel_degree = RelationGetParallelDegree(relation, -1);
...


src/include/utils/rel.h
/*
 * RelationGetParallelDegree
 *              Returns the relation's parallel_degree.  Note multiple eval of argument!
 */
#define RelationGetParallelDegree(relation, defaultpd) \
        ((relation)->rd_options ? \
         ((StdRdOptions *) (relation)->rd_options)->parallel_degree : (defaultpd))


src/backend/optimizer/path/allpaths.c
/*
 * create_plain_partial_paths
 *        Build partial access paths for parallel scan of a plain relation
 */
static void
create_plain_partial_paths(PlannerInfo *root, RelOptInfo *rel)
{
        int                     parallel_degree = 1;

        /*
         * If the user has set the parallel_degree reloption, we decide what to do
         * based on the value of that option.  Otherwise, we estimate a value.
         */
        if (rel->rel_parallel_degree != -1)
        {
                /*
                 * If parallel_degree = 0 is set for this relation, bail out.  The
                 * user does not want a parallel path for this relation.
                 */
                if (rel->rel_parallel_degree == 0)
                        return;

                /*
                 * Use the table parallel_degree, but don't go further than
                 * max_parallel_degree.
                 */
                parallel_degree = Min(rel->rel_parallel_degree, max_parallel_degree);
        }
        else
        {
                int                     parallel_threshold = 1000;

                /*
                 * If this relation is too small to be worth a parallel scan, just
                 * return without doing anything ... unless it's an inheritance child.
                 * In that case, we want to generate a parallel path here anyway.  It
                 * might not be worthwhile just for this relation, but when combined
                 * with all of its inheritance siblings it may well pay off.
                 */
                if (rel->pages < parallel_threshold &&
                        rel->reloptkind == RELOPT_BASEREL)
                        return;
                // When no table-level parallel_degree is set, derive the degree from the table size and parallel_threshold
                /*
                 * Limit the degree of parallelism logarithmically based on the size
                 * of the relation.  This probably needs to be a good deal more
                 * sophisticated, but we need something here for now.
                 */
                while (rel->pages > parallel_threshold * 3 &&
                           parallel_degree < max_parallel_degree)
                {
                        parallel_degree++;
                        parallel_threshold *= 3;
                        if (parallel_threshold >= PG_INT32_MAX / 3)
                                break;
                }
        }

        /* Add an unordered partial path based on a parallel sequential scan. */
        add_partial_path(rel, create_seqscan_path(root, rel, NULL, parallel_degree));
}
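To experiment with this logic outside the server, here is a minimal Python sketch of the function above. It is a port for illustration only: the name estimate_parallel_degree, the None return value for "no parallel path", and the is_base_rel flag are conventions made up for this sketch, and it ignores every other check the planner performs.

```python
PG_INT32_MAX = 2**31 - 1


def estimate_parallel_degree(pages, rel_parallel_degree=-1,
                             max_parallel_degree=2, is_base_rel=True):
    """Mirror the branch structure of create_plain_partial_paths.

    Returns the planned degree, or None when no parallel path is built.
    rel_parallel_degree == -1 means the reloption is not set.
    """
    if rel_parallel_degree != -1:
        # Reloption set: 0 disables parallelism, otherwise cap at the GUC.
        if rel_parallel_degree == 0:
            return None
        return min(rel_parallel_degree, max_parallel_degree)

    parallel_degree = 1
    parallel_threshold = 1000
    # Too small to be worth a parallel scan (unless an inheritance child).
    if pages < parallel_threshold and is_base_rel:
        return None

    # One extra worker each time the table is three times larger again.
    while (pages > parallel_threshold * 3
           and parallel_degree < max_parallel_degree):
        parallel_degree += 1
        parallel_threshold *= 3
        if parallel_threshold >= PG_INT32_MAX // 3:
            break
    return parallel_degree


pages_90gb = 90 * 2**30 // 8192   # assumption: default 8 kB block size
print(estimate_parallel_degree(pages_90gb, max_parallel_degree=64))
print(estimate_parallel_degree(pages_90gb, rel_parallel_degree=17,
                               max_parallel_degree=32))
```

Under these assumptions a 90 GB table with no reloption would get a degree of 9 if max_parallel_degree allowed it, and the loop can never go beyond 14 because of the PG_INT32_MAX guard. Setting the reloption, as the tests in this article do, bypasses the size-based estimate entirely.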


Other test data:

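Increasing to 32 parallel workers. The result depends on the hardware: the highest degree of parallelism does not necessarily give the best performance. As analyzed earlier, you need to find the inflection point for each query.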
postgres=# alter table t_bit2 set (parallel_degree =32);

postgres=# explain (analyze,verbose,timing,costs,buffers) select count(*) from t_bit2 where bitand(id, '10101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010')=B'10101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010';
                                                                                                                                                                                                                                        QUERY
 PLAN                                                                                                                                                                                                                                        
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=1551053.25..1551053.26 rows=1 width=8) (actual time=31092.551..31092.552 rows=1 loops=1)
   Output: count(*)
   Buffers: shared hit=1473213
   ->  Gather  (cost=1551049.96..1551053.17 rows=32 width=8) (actual time=31060.939..31092.469 rows=33 loops=1)
         Output: (PARTIAL count(*))
         Workers Planned: 32
         Workers Launched: 32
         Buffers: shared hit=1473213
         ->  Partial Aggregate  (cost=1550049.96..1550049.97 rows=1 width=8) (actual time=31047.074..31047.075 rows=1 loops=33)
               Output: PARTIAL count(*)
               Buffers: shared hit=1470589
               Worker 0: actual time=31037.287..31037.288 rows=1 loops=1
                 Buffers: shared hit=43483
               Worker 1: actual time=31035.803..31035.804 rows=1 loops=1
                 Buffers: shared hit=45112
......
               Worker 31: actual time=31055.871..31055.876 rows=1 loops=1
                 Buffers: shared hit=46439
               ->  Parallel Seq Scan on public.t_bit2  (cost=0.00..1549983.80 rows=26465 width=0) (actual time=0.040..17244.827 rows=6060606 loops=33)
                     Output: id
                     Filter: (bitand(t_bit2.id, B'1010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101
0101010101010'::"bit") = B'10101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010'::"bit")
                     Buffers: shared hit=1470589
                     Worker 0: actual time=0.035..17314.296 rows=5913688 loops=1
                       Buffers: shared hit=43483
                     Worker 1: actual time=0.030..16965.158 rows=6135232 loops=1
                       Buffers: shared hit=45112
......
                     Worker 31: actual time=0.031..17580.908 rows=6315704 loops=1
                       Buffers: shared hit=46439
 Planning time: 0.354 ms
 Execution time: 31157.006 ms
(145 rows)
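One way to sanity-check a plan like this is to multiply the per-loop averages back out. The sketch below does so for the Parallel Seq Scan node; it assumes the default 8 kB block size, and the variable names are illustrative.

```python
# Figures copied from the Parallel Seq Scan node in the plan above.
avg_rows_per_loop = 6060606   # "rows" is an average per loop
loops = 33                    # 32 workers plus the leader
shared_hit_pages = 1470589
block_size = 8192             # assumption: default 8 kB block size

total_rows = avg_rows_per_loop * loops
scanned_gib = shared_hit_pages * block_size / 2**30

print(total_rows)
print(round(scanned_gib, 2))
```

The product comes to just under 200 million rows and the scanned volume to roughly 11 GiB, which is consistent with the row count and table size reported for this smaller test table.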

Bitwise operation
postgres=# select count(*) from t_bit2 where bitand(id, '10101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010')=B'10101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010';
   count   
-----------
 200000000
(1 row)
Time: 4320.931 ms

COUNT  
postgres=# select count(*) from t_bit2;
   count   
-----------
 200000000
(1 row)
Time: 1896.647 ms

Query performance with parallelism disabled
postgres=# set force_parallel_mode =off;
SET
postgres=# alter table t_bit2 set (parallel_degree =0);
ALTER TABLE
postgres=# \timing
Timing is on.
postgres=# select count(*) from t_bit2 where bitand(id, '10101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010')=B'10101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010';
   count   
-----------
 200000000
(1 row)
Time: 53098.480 ms
postgres=# select count(*) from t_bit2;
   count   
-----------
 200000000
(1 row)
Time: 18504.679 ms

Table size
postgres=# \dt+ t_bit2
                    List of relations
 Schema |  Name  | Type  |  Owner   | Size  | Description 
--------+--------+-------+----------+-------+-------------
 public | t_bit2 | table | postgres | 11 GB | 
(1 row)


References
http://www.postgresql.org/docs/9.6/static/sql-createtable.html

parallel_degree (integer)
The parallel degree for a table is the number of workers that should be used to assist a parallel scan of that table. If not set, the system will determine a value based on the relation size. The actual number of workers chosen by the planner may be less, for example due to the setting of max_parallel_degree.

http://www.postgresql.org/docs/9.6/static/runtime-config-query.html#RUNTIME-CONFIG-QUERY-OTHER

force_parallel_mode (enum)
Allows the use of parallel queries for testing purposes even in cases where no performance benefit is expected. The allowed values of force_parallel_mode are off (use parallel mode only when it is expected to improve performance), on (force parallel query for all queries for which it is thought to be safe), and regress (like on, but with additional behavior changes as explained below).

More specifically, setting this value to on will add a Gather node to the top of any query plan for which this appears to be safe, so that the query runs inside of a parallel worker. Even when a parallel worker is not available or cannot be used, operations such as starting a subtransaction that would be prohibited in a parallel query context will be prohibited unless the planner believes that this will cause the query to fail. If failures or unexpected results occur when this option is set, some functions used by the query may need to be marked PARALLEL UNSAFE (or, possibly, PARALLEL RESTRICTED).

Setting this value to regress has all of the same effects as setting it to on plus some additional effects that are intended to facilitate automated regression testing. Normally, messages from a parallel worker include a context line indicating that, but a setting of regress suppresses this line so that the output is the same as in non-parallel execution. Also, the Gather nodes added to plans by this setting are hidden in EXPLAIN output so that the output matches what would be obtained if this setting were turned off.

http://www.postgresql.org/docs/9.6/static/runtime-config-resource.html#RUNTIME-CONFIG-RESOURCE-ASYNC-BEHAVIOR

max_parallel_degree (integer)
Sets the maximum number of workers that can be started for an individual parallel operation. Parallel workers are taken from the pool of processes established by max_worker_processes. Note that the requested number of workers may not actually be available at runtime. If this occurs, the plan will run with fewer workers than expected, which may be inefficient. The default value is 2. Setting this value to 0 disables parallel query execution.

http://www.postgresql.org/docs/9.6/static/runtime-config-query.html#RUNTIME-CONFIG-QUERY-CONSTANTS

parallel_setup_cost (floating point)
Sets the planner's estimate of the cost of launching parallel worker processes. The default is 1000.
parallel_tuple_cost (floating point)
Sets the planner's estimate of the cost of transferring one tuple from a parallel worker process to another process. The default is 0.1.
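
These two cost parameters can be seen at work in the Gather node of the EXPLAIN output shown earlier. The sketch below assumes the planner costs a Gather as the subplan's cost, plus parallel_setup_cost at startup, plus parallel_tuple_cost per row passed up. That formula is an assumption, not something stated in the quoted documentation, although it does reproduce the numbers in the plan.

```python
# Costs copied from the EXPLAIN output earlier in this article.
sub_startup, sub_total = 1550049.96, 1550049.97   # Partial Aggregate
gather_rows = 32                                  # rows estimated at Gather
parallel_setup_cost = 1000.0                      # documented default
parallel_tuple_cost = 0.1                         # documented default

# Assumed formula: setup cost is paid once at startup, tuple cost per row.
gather_startup = sub_startup + parallel_setup_cost
gather_total = (gather_startup
                + (sub_total - sub_startup)
                + parallel_tuple_cost * gather_rows)

print(f"{gather_startup:.2f}..{gather_total:.2f}")
```

The plan reports the Gather node as cost=1551049.96..1551053.17, so with the default settings the parallel overhead the planner charges for this query is tiny compared with the scan itself.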