PostgreSQL 9.6 聚合运算180倍性能提升如何做到? 聚合代码优化OP复用浅析

本文涉及的产品
云原生数据库 PolarDB MySQL 版,Serverless 5000PCU 100GB
云原生数据库 PolarDB 分布式版,标准版 2核8GB
云数据库 RDS MySQL Serverless,0.5-2RCU 50GB
简介: PostgreSQL 9.6 内核优化之 聚合代码优化OP复用浅析 作者 digoal 日期 2016-10-08 标签 PostgreSQL , 9.6 , 内核优化 , 聚合代码优化 , OP复用 背景 聚合操作指将分组的数据聚合为一个结果输出。 聚合通常用在统

PostgreSQL 9.6 内核优化之 聚合代码优化OP复用浅析

作者

digoal

日期

2016-10-08

标签

PostgreSQL , 9.6 , 内核优化 , 聚合代码优化 , OP复用


背景

聚合操作指将分组的数据聚合为一个结果输出。

聚合通常用在统计应用中,例如统计分组的最大值,最小值,记录数,平均值,方差,截距,相关性。

聚合也可能被用于文本分析或者图像分析等,例如最佳相似度,行列变换,聚合为数组或JSON,图像堆叠等。

因此聚合通常需要启动值,行的处理,以及结果的格式转换3个过程。

PostgreSQL的聚合也包含了以上三个过程,创建一个聚合函数的语法如下:

CREATE AGGREGATE name ( [ argmode ] [ argname ] arg_data_type [ , ... ] ) (
    SFUNC = sfunc,
    STYPE = state_data_type
    [ , SSPACE = state_data_size ]
    [ , FINALFUNC = ffunc ]
    [ , FINALFUNC_EXTRA ]
    [ , COMBINEFUNC = combinefunc ]
    [ , SERIALFUNC = serialfunc ]
    [ , DESERIALFUNC = deserialfunc ]
    [ , INITCOND = initial_condition ]
    [ , MSFUNC = msfunc ]
    [ , MINVFUNC = minvfunc ]
    [ , MSTYPE = mstate_data_type ]
    [ , MSSPACE = mstate_data_size ]
    [ , MFINALFUNC = mffunc ]
    [ , MFINALFUNC_EXTRA ]
    [ , MINITCOND = minitial_condition ]
    [ , SORTOP = sort_operator ]
    [ , PARALLEL = { SAFE | RESTRICTED | UNSAFE } ]
)

例子

CREATE AGGREGATE avg (float8)
(
    sfunc = float8_accum,
    stype = float8[],
    finalfunc = float8_avg,
    initcond = '{0,0,0}'
);

参考

https://www.postgresql.org/docs/9.6/static/xaggr.html

https://www.postgresql.org/docs/9.6/static/sql-createaggregate.html

PostgreSQL 聚合处理流程如图

pic1

1. 使用initcond指定internal-state的初始值,没有则为空。    

2. 每条记录(作为next-data-values输入),调用 sfunc( internal-state, next-data-values ) ---> next-internal-state  
   输出的结果作为中间结果继续调用sfunc

3. ffunc( internal-state ) ---> aggregate-value
   可选,作为结果转换使用

9.6 聚合OP优化

pic2

如果initcond与sfunc一致,在同一个聚合分组内,sfunc只需要计算一遍所有记录,而不需要计算多遍。

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=804163bc25e979fcd91b02e58fa2d1c6b587cc65

Share transition state between different aggregates when possible.

If there are two different aggregates in the query with same inputs, 
and the aggregates have the same initial condition and transition function,
only calculate the state value once, and only call the final functionsseparately. 
For example, AVG(x) and SUM(x) aggregates have the same transition function, which accumulates the sum and number of input tuples.
For a query like "SELECT AVG(x), SUM(x) FROM x", we can therefore accumulate the state function only once, which gives a nice speedup.

David Rowley, reviewed and edited by me.

我们可以通过以下SQL查看可以共享OP的聚合函数,rank一致的都可以共享。

postgres=# select rank() over (partition by 1 order by aggtransfn,agginitval),
           row_number() over (partition by aggtransfn,agginitval order by aggfnoid) rn,
           aggfnoid,aggtransfn,agginitval from pg_aggregate ;
rank rn aggfnoid aggtransfn agginitval
1 1 pg_catalog.sum float4pl none
2 1 pg_catalog.avg float4_accum {0,0,0}
2 2 pg_catalog.variance float4_accum {0,0,0}
2 3 pg_catalog.stddev float4_accum {0,0,0}
2 4 pg_catalog.var_samp float4_accum {0,0,0}
2 5 pg_catalog.stddev_samp float4_accum {0,0,0}
2 6 pg_catalog.var_pop float4_accum {0,0,0}
2 7 pg_catalog.stddev_pop float4_accum {0,0,0}
9 1 pg_catalog.max float4larger none
10 1 pg_catalog.min float4smaller none
11 1 pg_catalog.sum float8pl none
12 1 pg_catalog.avg float8_accum {0,0,0}
12 2 pg_catalog.variance float8_accum {0,0,0}
12 3 pg_catalog.stddev float8_accum {0,0,0}
12 4 pg_catalog.var_samp float8_accum {0,0,0}
12 5 pg_catalog.stddev_samp float8_accum {0,0,0}
12 6 pg_catalog.var_pop float8_accum {0,0,0}
12 7 pg_catalog.stddev_pop float8_accum {0,0,0}
19 1 pg_catalog.max float8larger none
20 1 pg_catalog.min float8smaller none
21 1 pg_catalog.max text_larger none
22 1 pg_catalog.min text_smaller none
23 1 pg_catalog.max array_larger none
24 1 pg_catalog.min array_smaller none
25 1 pg_catalog.max int4larger none
25 2 pg_catalog.max int4larger none
27 1 pg_catalog.min int4smaller none
27 2 pg_catalog.min int4smaller none
29 1 pg_catalog.max int2larger none
30 1 pg_catalog.min int2smaller none
31 1 pg_catalog.sum cash_pl none
32 1 pg_catalog.max cashlarger none
33 1 pg_catalog.min cashsmaller none
34 1 pg_catalog.max bpchar_larger none
35 1 pg_catalog.min bpchar_smaller none
36 1 pg_catalog.max date_larger none
37 1 pg_catalog.min date_smaller none
38 1 pg_catalog.sum interval_pl none
39 1 pg_catalog.min timestamptz_smaller none
40 1 pg_catalog.max timestamptz_larger none
41 1 pg_catalog.min interval_smaller none
42 1 pg_catalog.max interval_larger none
43 1 pg_catalog.count int8inc 0
44 1 pg_catalog.max int8larger none
45 1 pg_catalog.min int8smaller none
46 1 pg_catalog.max time_larger none
47 1 pg_catalog.min time_smaller none
48 1 pg_catalog.max timetz_larger none
49 1 pg_catalog.min timetz_smaller none
50 1 pg_catalog.bit_and bitand none
51 1 pg_catalog.bit_or bitor none
52 1 pg_catalog.min numeric_smaller none
53 1 pg_catalog.max numeric_larger none
54 1 pg_catalog.variance numeric_accum none
54 2 pg_catalog.stddev numeric_accum none
54 3 pg_catalog.var_samp numeric_accum none
54 4 pg_catalog.stddev_samp numeric_accum none
54 5 pg_catalog.var_pop numeric_accum none
54 6 pg_catalog.stddev_pop numeric_accum none
60 1 pg_catalog.variance int2_accum none
60 2 pg_catalog.stddev int2_accum none
60 3 pg_catalog.var_samp int2_accum none
60 4 pg_catalog.stddev_samp int2_accum none
60 5 pg_catalog.var_pop int2_accum none
60 6 pg_catalog.stddev_pop int2_accum none
66 1 pg_catalog.variance int4_accum none
66 2 pg_catalog.stddev int4_accum none
66 3 pg_catalog.var_samp int4_accum none
66 4 pg_catalog.stddev_samp int4_accum none
66 5 pg_catalog.var_pop int4_accum none
66 6 pg_catalog.stddev_pop int4_accum none
72 1 pg_catalog.variance int8_accum none
72 2 pg_catalog.stddev int8_accum none
72 3 pg_catalog.var_samp int8_accum none
72 4 pg_catalog.stddev_samp int8_accum none
72 5 pg_catalog.var_pop int8_accum none
72 6 pg_catalog.stddev_pop int8_accum none
78 1 pg_catalog.sum int2_sum none
79 1 pg_catalog.sum int4_sum none
80 1 pg_catalog.avg interval_accum {0 second,0 second}
81 1 pg_catalog.bit_and int2and none
82 1 pg_catalog.bit_or int2or none
83 1 pg_catalog.bit_and int4and none
84 1 pg_catalog.bit_or int4or none
85 1 pg_catalog.bit_and int8and none
86 1 pg_catalog.bit_or int8or none
87 1 pg_catalog.avg int2_avg_accum {0,0}
88 1 pg_catalog.avg int4_avg_accum {0,0}
89 1 pg_catalog.max oidlarger none
90 1 pg_catalog.min oidsmaller none
91 1 pg_catalog.min timestamp_smaller none
92 1 pg_catalog.max timestamp_larger none
93 1 pg_catalog.array_agg array_agg_transfn none
94 1 bool_and booland_statefunc none
94 2 every booland_statefunc none
96 1 bool_or boolor_statefunc none
97 1 pg_catalog.avg int8_avg_accum none
97 2 pg_catalog.sum int8_avg_accum none
99 1 pg_catalog.max tidlarger none
100 1 pg_catalog.min tidsmaller none
101 1 pg_catalog.count int8inc_any 0
102 1 regr_count int8inc_float8_float8 0
103 1 regr_sxx float8_regr_accum {0,0,0,0,0,0}
103 2 regr_syy float8_regr_accum {0,0,0,0,0,0}
103 3 regr_sxy float8_regr_accum {0,0,0,0,0,0}
103 4 regr_avgx float8_regr_accum {0,0,0,0,0,0}
103 5 regr_avgy float8_regr_accum {0,0,0,0,0,0}
103 6 regr_r2 float8_regr_accum {0,0,0,0,0,0}
103 7 regr_slope float8_regr_accum {0,0,0,0,0,0}
103 8 regr_intercept float8_regr_accum {0,0,0,0,0,0}
103 9 covar_pop float8_regr_accum {0,0,0,0,0,0}
103 10 covar_samp float8_regr_accum {0,0,0,0,0,0}
103 11 corr float8_regr_accum {0,0,0,0,0,0}
114 1 pg_catalog.avg numeric_avg_accum none
114 2 pg_catalog.sum numeric_avg_accum none
116 1 xmlagg xmlconcat2 none
117 1 json_agg json_agg_transfn none
118 1 json_object_agg json_object_agg_transfn none
119 1 jsonb_agg jsonb_agg_transfn none
120 1 jsonb_object_agg jsonb_object_agg_transfn none
121 1 pg_catalog.min enum_smaller none
122 1 pg_catalog.max enum_larger none
123 1 pg_catalog.string_agg string_agg_transfn none
124 1 pg_catalog.string_agg bytea_string_agg_transfn none
125 1 pg_catalog.max network_larger none
126 1 pg_catalog.min network_smaller none
127 1 pg_catalog.percentile_disc ordered_set_transition none
127 2 pg_catalog.percentile_cont ordered_set_transition none
127 3 pg_catalog.percentile_cont ordered_set_transition none
127 4 pg_catalog.percentile_disc ordered_set_transition none
127 5 pg_catalog.percentile_cont ordered_set_transition none
127 6 pg_catalog.percentile_cont ordered_set_transition none
127 7 mode ordered_set_transition none
134 1 pg_catalog.rank ordered_set_transition_multi none
134 2 pg_catalog.percent_rank ordered_set_transition_multi none
134 3 pg_catalog.cume_dist ordered_set_transition_multi none
134 4 pg_catalog.dense_rank ordered_set_transition_multi none
138 1 pg_catalog.array_agg array_agg_array_transfn none

我接下来抽取几个数据统计相关的,验证9.6的优化效果

  103 |  1 | regr_sxx                   | float8_regr_accum            | {0,0,0,0,0,0}
  103 |  2 | regr_syy                   | float8_regr_accum            | {0,0,0,0,0,0}
  103 |  3 | regr_sxy                   | float8_regr_accum            | {0,0,0,0,0,0}
  103 |  4 | regr_avgx                  | float8_regr_accum            | {0,0,0,0,0,0}
  103 |  5 | regr_avgy                  | float8_regr_accum            | {0,0,0,0,0,0}
  103 |  6 | regr_r2                    | float8_regr_accum            | {0,0,0,0,0,0}
  103 |  7 | regr_slope                 | float8_regr_accum            | {0,0,0,0,0,0}
  103 |  8 | regr_intercept             | float8_regr_accum            | {0,0,0,0,0,0}
  103 |  9 | covar_pop                  | float8_regr_accum            | {0,0,0,0,0,0}
  103 | 10 | covar_samp                 | float8_regr_accum            | {0,0,0,0,0,0}
  103 | 11 | corr                       | float8_regr_accum            | {0,0,0,0,0,0}

这几个聚合函数的用法如下

https://www.postgresql.org/docs/9.6/static/functions-aggregate.html

Function Argument Type Return Type Partial Mode Description
corr(Y, X) double precision double precision Yes correlation coefficient
covar_pop(Y, X) double precision double precision Yes population covariance
covar_samp(Y, X) double precision double precision Yes sample covariance
regr_avgx(Y, X) double precision double precision Yes average of the independent variable (sum(X)/N)
regr_avgy(Y, X) double precision double precision Yes average of the dependent variable (sum(Y)/N)
regr_intercept(Y, X) double precision double precision Yes y-intercept of the least-squares-fit linear equation determined by the (X, Y) pairs
regr_r2(Y, X) double precision double precision Yes square of the correlation coefficient
regr_slope(Y, X) double precision double precision Yes slope of the least-squares-fit linear equation determined by the (X, Y) pairs
regr_sxx(Y, X) double precision double precision Yes sum(X^2) - sum(X)^2/N ("sum of squares" of the independent variable)
regr_sxy(Y, X) double precision double precision Yes sum(X*Y) - sum(X) * sum(Y)/N ("sum of products" of independent times dependent variable)
regr_syy(Y, X) double precision double precision Yes sum(Y^2) - sum(Y)^2/N ("sum of squares" of the dependent variable)

对比测试

测试5000万条记录

postgres=# create table agg_test(x float8, y float8);
postgres=# insert into agg_test select 10000*random(), 10000*random() from generate_series(1,50000000);

1. 9.6 非并行
聚合计算耗费了7.1秒

postgres=# show max_parallel_workers_per_gather ;
 max_parallel_workers_per_gather 
---------------------------------
 0
(1 row)

postgres=# explain (analyze,verbose,timing,costs,buffers) select corr(y,x), covar_pop(y,x), covar_samp(y,x), regr_avgx(y,x), regr_avgy(y,x), regr_intercept(y,x), regr_r2(y,x), regr_slope(y,x), regr_sxx(y,x), regr_sxy(y,x), regr_syy(y,x) from agg_test ;
                                                                                            QUERY PLAN                                                                                            
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=2145276.13..2145276.14 rows=1 width=88) (actual time=11703.472..11703.472 rows=1 loops=1)
   Output: corr(y, x), covar_pop(y, x), covar_samp(y, x), regr_avgx(y, x), regr_avgy(y, x), regr_intercept(y, x), regr_r2(y, x), regr_slope(y, x), regr_sxx(y, x), regr_sxy(y, x), regr_syy(y, x)
   Buffers: shared hit=270271
   ->  Seq Scan on public.agg_test  (cost=0.00..770272.36 rows=50000136 width=16) (actual time=0.010..4594.588 rows=50000000 loops=1)
         Output: x, y
         Buffers: shared hit=270271
 Planning time: 0.082 ms
 Execution time: 11703.541 ms
(8 rows)

2. 9.5
聚合计算耗费了36.1秒

postgres=# explain (analyze,verbose,timing,costs,buffers) select corr(y,x), covar_pop(y,x), covar_samp(y,x), regr_avgx(y,x), regr_avgy(y,x), regr_intercept(y,x), regr_r2(y,x), regr_slope(y,x), regr_sxx(y,x), regr_sxy(y,x), regr_syy(y,x) from agg_test ;
                                                                                            QUERY PLAN                                                                                            
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=2145276.13..2145276.14 rows=1 width=16) (actual time=40563.668..40563.669 rows=1 loops=1)
   Output: corr(y, x), covar_pop(y, x), covar_samp(y, x), regr_avgx(y, x), regr_avgy(y, x), regr_intercept(y, x), regr_r2(y, x), regr_slope(y, x), regr_sxx(y, x), regr_sxy(y, x), regr_syy(y, x)
   Buffers: shared hit=270271
   ->  Seq Scan on public.agg_test  (cost=0.00..770272.36 rows=50000136 width=16) (actual time=0.009..4481.032 rows=50000000 loops=1)
         Output: x, y
         Buffers: shared hit=270271
 Planning time: 0.063 ms
 Execution time: 40563.742 ms
(8 rows)

3. 9.6 并行
聚合计算约耗费0.2秒

postgres=# set max_parallel_workers_per_gather =128;
postgres=# set force_parallel_mode =on;  
postgres=# alter table agg_test set (parallel_workers =32);
postgres=# explain (analyze,verbose,timing,costs,buffers) select corr(y,x), covar_pop(y,x), covar_samp(y,x), regr_avgx(y,x), regr_avgy(y,x), regr_intercept(y,x), regr_r2(y,x), regr_slope(y,x), regr_sxx(y,x), regr_sxy(y,x), regr_syy(y,x) from agg_test ;
                                                                                                                                                      QUERY PLAN                                                 

 Finalize Aggregate  (cost=329869.02..329869.03 rows=1 width=88) (actual time=456.718..456.718 rows=1 loops=1)
   Output: corr(y, x), covar_pop(y, x), covar_samp(y, x), regr_avgx(y, x), regr_avgy(y, x), regr_intercept(y, x), regr_r2(y, x), regr_slope(y, x), regr_sxx(y, x), regr_sxy(y, x), regr_syy(y, x)
   Buffers: shared hit=275071
   ->  Gather  (cost=329864.90..329868.11 rows=32 width=352) (actual time=456.567..456.612 rows=33 loops=1)
         Output: (PARTIAL corr(y, x)), (PARTIAL covar_pop(y, x)), (PARTIAL covar_samp(y, x)), (PARTIAL regr_avgx(y, x)), (PARTIAL regr_avgy(y, x)), (PARTIAL regr_intercept(y, x)), (PARTIAL regr_r2(y, x)), (PARTIAL regr_slope(y, x)), (PARTIAL regr_sxx(y, x)), (PARTIAL regr_sxy(y, x)), (PARTIAL regr_syy(y, x))
         Workers Planned: 32
         Workers Launched: 32
         Buffers: shared hit=275071
         ->  Partial Aggregate  (cost=328864.90..328864.91 rows=1 width=352) (actual time=451.769..451.769 rows=1 loops=33)
               Output: PARTIAL corr(y, x), PARTIAL covar_pop(y, x), PARTIAL covar_samp(y, x), PARTIAL regr_avgx(y, x), PARTIAL regr_avgy(y, x), PARTIAL regr_intercept(y, x), PARTIAL regr_r2(y, x), PARTIAL regr_slope(y, x), PARTIAL regr_sxx(y, x), PARTIAL regr_sxy(y, x), PARTIAL regr_syy(y, x)
               Buffers: shared hit=270655
               Worker 0: actual time=448.888..448.888 rows=1 loops=1
                 Buffers: shared hit=8265
               Worker 1: actual time=449.881..449.881 rows=1 loops=1
                 Buffers: shared hit=8357
               Worker 2: actual time=450.175..450.176 rows=1 loops=1
                 Buffers: shared hit=8295
               Worker 3: actual time=450.306..450.306 rows=1 loops=1
                 Buffers: shared hit=8357
               Worker 4: actual time=449.567..449.567 rows=1 loops=1
                 Buffers: shared hit=6844
               Worker 5: actual time=450.467..450.467 rows=1 loops=1
                 Buffers: shared hit=8360
               Worker 6: actual time=450.574..450.574 rows=1 loops=1
                 Buffers: shared hit=7898
               Worker 7: actual time=450.665..450.665 rows=1 loops=1
                 Buffers: shared hit=8397
               Worker 8: actual time=450.719..450.719 rows=1 loops=1
                 Buffers: shared hit=8084
               Worker 9: actual time=450.922..450.922 rows=1 loops=1
                 Buffers: shared hit=8405
               Worker 10: actual time=451.004..451.004 rows=1 loops=1
                 Buffers: shared hit=5421
               Worker 11: actual time=451.175..451.175 rows=1 loops=1
                 Buffers: shared hit=8431
               Worker 12: actual time=451.316..451.316 rows=1 loops=1
                 Buffers: shared hit=8276
               Worker 13: actual time=451.457..451.457 rows=1 loops=1
                 Buffers: shared hit=8431
               Worker 14: actual time=451.506..451.506 rows=1 loops=1
                 Buffers: shared hit=8163
               Worker 15: actual time=451.670..451.670 rows=1 loops=1
                 Buffers: shared hit=7959
               Worker 16: actual time=451.797..451.797 rows=1 loops=1
                 Buffers: shared hit=8428
               Worker 17: actual time=451.875..451.875 rows=1 loops=1
                 Buffers: shared hit=8265
               Worker 18: actual time=451.982..451.982 rows=1 loops=1
                 Buffers: shared hit=8444
               Worker 19: actual time=452.127..452.127 rows=1 loops=1
                 Buffers: shared hit=7717
               Worker 20: actual time=452.232..452.232 rows=1 loops=1
                 Buffers: shared hit=8450
               Worker 21: actual time=452.331..452.331 rows=1 loops=1
                 Buffers: shared hit=8304
               Worker 22: actual time=452.450..452.450 rows=1 loops=1
                 Buffers: shared hit=8455
               Worker 23: actual time=452.592..452.592 rows=1 loops=1
                 Buffers: shared hit=8367
               Worker 24: actual time=452.679..452.679 rows=1 loops=1
                 Buffers: shared hit=8460
               Worker 25: actual time=452.814..452.815 rows=1 loops=1
                 Buffers: shared hit=8445
               Worker 26: actual time=452.969..452.969 rows=1 loops=1
                 Buffers: shared hit=8465
               Worker 27: actual time=452.999..452.999 rows=1 loops=1
                 Buffers: shared hit=8454
               Worker 28: actual time=453.193..453.193 rows=1 loops=1
                 Buffers: shared hit=8462
               Worker 29: actual time=452.985..452.985 rows=1 loops=1
                 Buffers: shared hit=8437
               Worker 30: actual time=453.482..453.483 rows=1 loops=1
                 Buffers: shared hit=8348
               Worker 31: actual time=453.505..453.505 rows=1 loops=1
                 Buffers: shared hit=8182
               ->  Parallel Seq Scan on public.agg_test  (cost=0.00..285896.04 rows=1562504 width=16) (actual time=0.046..248.331 rows=1515152 loops=33)
                     Output: y, x
                     Buffers: shared hit=270655
                     Worker 0: actual time=0.058..247.983 rows=1526805 loops=1
                       Buffers: shared hit=8265
                     Worker 1: actual time=0.047..249.121 rows=1543825 loops=1
                       Buffers: shared hit=8357
                     Worker 2: actual time=0.047..249.206 rows=1532355 loops=1
                       Buffers: shared hit=8295
                     Worker 3: actual time=0.047..249.914 rows=1543825 loops=1
                       Buffers: shared hit=8357
                     Worker 4: actual time=0.069..244.072 rows=1263920 loops=1
                       Buffers: shared hit=6844
                     Worker 5: actual time=0.046..250.046 rows=1544380 loops=1
                       Buffers: shared hit=8360
                     Worker 6: actual time=0.047..247.860 rows=1458910 loops=1
                       Buffers: shared hit=7898
                     Worker 7: actual time=0.045..249.471 rows=1551225 loops=1
                       Buffers: shared hit=8397
                     Worker 8: actual time=0.047..247.850 rows=1493320 loops=1
                       Buffers: shared hit=8084
                     Worker 9: actual time=0.049..249.905 rows=1552705 loops=1
                       Buffers: shared hit=8405
                     Worker 10: actual time=0.048..240.578 rows=1000665 loops=1
                       Buffers: shared hit=5421
                     Worker 11: actual time=0.043..249.234 rows=1557515 loops=1
                       Buffers: shared hit=8431
                     Worker 12: actual time=0.044..248.830 rows=1528840 loops=1
                       Buffers: shared hit=8276
                     Worker 13: actual time=0.046..249.576 rows=1557515 loops=1
                       Buffers: shared hit=8431
                     Worker 14: actual time=0.043..248.819 rows=1507935 loops=1
                       Buffers: shared hit=8163
                     Worker 15: actual time=0.046..248.303 rows=1470195 loops=1
                       Buffers: shared hit=7959
                     Worker 16: actual time=0.045..249.997 rows=1556960 loops=1
                       Buffers: shared hit=8428
                     Worker 17: actual time=0.046..249.282 rows=1526805 loops=1
                       Buffers: shared hit=8265
                     Worker 18: actual time=0.043..249.849 rows=1559785 loops=1
                       Buffers: shared hit=8444
                     Worker 19: actual time=0.047..247.241 rows=1425425 loops=1
                       Buffers: shared hit=7717
                     Worker 20: actual time=0.043..250.134 rows=1561030 loops=1
                       Buffers: shared hit=8450
                     Worker 21: actual time=0.044..249.316 rows=1534020 loops=1
                       Buffers: shared hit=8304
                     Worker 22: actual time=0.043..250.169 rows=1561955 loops=1
                       Buffers: shared hit=8455
                     Worker 23: actual time=0.045..249.550 rows=1545675 loops=1
                       Buffers: shared hit=8367
                     Worker 24: actual time=0.044..250.062 rows=1562880 loops=1
                       Buffers: shared hit=8460
                     Worker 25: actual time=0.043..250.298 rows=1560105 loops=1
                       Buffers: shared hit=8445
                     Worker 26: actual time=0.043..249.939 rows=1563805 loops=1
                       Buffers: shared hit=8465
                     Worker 27: actual time=0.049..250.511 rows=1561770 loops=1
                       Buffers: shared hit=8454
                     Worker 28: actual time=0.045..250.523 rows=1563250 loops=1
                       Buffers: shared hit=8462
                     Worker 29: actual time=0.049..250.492 rows=1558625 loops=1
                       Buffers: shared hit=8437
                     Worker 30: actual time=0.053..247.131 rows=1542160 loops=1
                       Buffers: shared hit=8348
                     Worker 31: actual time=0.053..249.789 rows=1511450 loops=1
                       Buffers: shared hit=8182
 Planning time: 0.101 ms
 Execution time: 483.888 ms
(144 rows)

9.6的优化效果很明显,在没有使用并行的情况下,聚合操作已经有约5倍的性能提升。

结果对比

版本 9.6 9.5 9.6并行(32)
5000万记录(11个聚合函数)耗时(秒) 7.1 36.1 0.2

pic3

代码

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=804163bc25e979fcd91b02e58fa2d1c6b587cc65

涉及如下

src/backend/executor/execQual.c     diff | blob | blame | history
src/backend/executor/nodeAgg.c      diff | blob | blame | history
src/backend/executor/nodeWindowAgg.c        diff | blob | blame | history
src/backend/parser/parse_agg.c      diff | blob | blame | history
src/include/nodes/execnodes.h       diff | blob | blame | history
src/include/parser/parse_agg.h      diff | blob | blame | history
src/test/regress/expected/aggregates.out        diff | blob | blame | history
src/test/regress/sql/aggregates.sql     diff | blob | blame | history

小结

在统计学中,大多数的统计算法的中间结果都是可以共用的,例如sum,avg; 方差,相关性,count,sum等运算;

PostgreSQL 9.6很好的抓住了这样的特征,对初始条件一致,中间算法一致的聚合函数,在同一个分组中数据只需要计算一遍,大大降低了CPU的开销,提高了统计效率。

这个思路与LLVM有一些神似的地方,不过LLVM的适用场景更广。

Count

相关实践学习
使用PolarDB和ECS搭建门户网站
本场景主要介绍基于PolarDB和ECS实现搭建门户网站。
阿里云数据库产品家族及特性
阿里云智能数据库产品团队一直致力于不断健全产品体系,提升产品性能,打磨产品功能,从而帮助客户实现更加极致的弹性能力、具备更强的扩展能力、并利用云设施进一步降低企业成本。以云原生+分布式为核心技术抓手,打造以自研的在线事务型(OLTP)数据库Polar DB和在线分析型(OLAP)数据库Analytic DB为代表的新一代企业级云原生数据库产品体系, 结合NoSQL数据库、数据库生态工具、云原生智能化数据库管控平台,为阿里巴巴经济体以及各个行业的企业客户和开发者提供从公共云到混合云再到私有云的完整解决方案,提供基于云基础设施进行数据从处理、到存储、再到计算与分析的一体化解决方案。本节课带你了解阿里云数据库产品家族及特性。
相关文章
|
9月前
|
关系型数据库 大数据 PostgreSQL
PostgreSQL16-新特性-并行聚合
PostgreSQL16-新特性-并行聚合
102 0
|
关系型数据库 PostgreSQL
PostgreSQL listagg within group (order by) 聚合兼容用法 string_agg ( order by) - 行列变换,CSV构造...
标签 PostgreSQL , order-set agg , listagg , string_agg , order 背景 listagg — Rows to Delimited Strings The listagg function transforms values from a g...
5798 0
|
5月前
|
存储 人工智能 关系型数据库
5倍性能提升,阿里云AnalyticDB PostgreSQL版新一代实时智能引擎重磅发布
2023 云栖大会上,AnalyticDB for PostgreSQL新一代实时智能引擎重磅发布,全自研计算和行列混存引擎较比开源Greenplum有5倍以上性能提升。AnalyticDB for PostgreSQL与通义大模型家族深度集成,推出一站式AIGC解决方案。阿里云新发布的行业模型及“百炼”平台,采用AnalyticDB for PostgreSQL作为内置向量检索引擎,性能较开源增强了2~5倍。大会上来自厦门国际银行、三七互娱等知名企业代表和瑶池数据库团队产品及技术资深专家们结合真实场景实践,深入分享了最新的技术进展和解析。
5倍性能提升,阿里云AnalyticDB PostgreSQL版新一代实时智能引擎重磅发布
|
存储 SQL JSON
大分区表高并发性能提升100倍?阿里云 RDS PostgreSQL 12 特性解读
世界上几乎最强大的开源数据库系统 PostgreSQL,于 2019 年 10 月 3 日发布了 12 版本,该版本已经在阿里云正式发布。PostgreSQL 12 在功能和性能上都有很大提升,如大分区表高并发性能提升百倍,B-tree 索引空间和性能优化,实现 SQL 2016 标准的 JSON 特性,支持多列 MCV(Most-Common-Value)统计,内联 CTE(Common table expressions)以及可插拔的表存储访问接口等。本文对部分特性进行解读。
1954 0
大分区表高并发性能提升100倍?阿里云 RDS PostgreSQL 12 特性解读
|
关系型数据库 PostgreSQL
【重新发现PostgreSQL之美】- 48 聚合、窗口过滤器
大家好,这里是重新发现PostgreSQL之美 - 48 聚合、窗口过滤器
|
弹性计算 关系型数据库 PostgreSQL
PostgreSQL PostGIS 性能提升 - by new GEOS代码
标签 PostgreSQL , PostGIS , geos 背景 http://lin-ear-th-inking.blogspot.com/2019/02/betterfaster-stpointonsurface-for.html 使用GEOS新的代码,提升PostGIS重计算的函数性能。 The improved ST_PointOnSurface runs 13 times
718 0
|
SQL 分布式计算 并行计算
PostgreSQL 并行计算解说 之9 - parallel 自定义并行聚合
标签 PostgreSQL , cpu 并行 , smp 并行 , 并行计算 , gpu 并行 , 并行过程支持 背景 PostgreSQL 11 优化器已经支持了非常多场合的并行。简单估计,已支持27余种场景的并行计算。 parallel seq scan parallel index scan
552 0
|
SQL 关系型数据库 PostgreSQL
PostgreSQL 空间聚合性能 - 行政区、电子围栏 空间聚合 - 时间、空间热力图
标签 PostgreSQL , 空间聚合 , 空间热力图 , 行政区 , 电子围栏 背景 某个时间区间(或其他条件),出现在某些围栏、行政区(多边形信息)中的对象(空间点信息)有多少,按颜色深浅渲染这些多边形。
2394 0
|
物联网 关系型数据库 流计算
PostgreSQL pipelinedb 流计算插件 - IoT应用 - 实时轨迹聚合
标签 PostgreSQL , IoT , 轨迹聚合 , pipelinedb , 流计算 , 实时聚合 背景 IoT场景,车联网场景,共享单车场景,人的行为位点等,终端实时上报的是孤立的位点,我们需要将其补齐成轨迹。
1776 0
|
21天前
|
关系型数据库 分布式数据库 数据库
成都晨云信息技术完成阿里云PolarDB数据库产品生态集成认证
近日,成都晨云信息技术有限责任公司(以下简称晨云信息)与阿里云PolarDB PostgreSQL版数据库产品展开产品集成认证。测试结果表明,晨云信息旗下晨云-站群管理系统(V1.0)与阿里云以下产品:开源云原生数据库PolarDB PostgreSQL版(V11),完全满足产品兼容认证要求,兼容性良好,系统运行稳定。

相关产品

  • 云原生数据库 PolarDB