(TPC-H测试 SF=10,SF=200) PostgreSQL 11 vs 10 vs Deepgreen

本文涉及的产品
云原生数据库 PolarDB MySQL 版,Serverless 5000PCU 100GB
云原生数据库 PolarDB 分布式版,标准版 2核8GB
云数据库 RDS MySQL Serverless,0.5-2RCU 50GB
简介: 标签PostgreSQL , TPC-H , HTAP背景去年测的PostgreSQL 10 vs Deepgreen.《TPC-H测试 - PostgreSQL 10 vs Deepgreen(Greenplum)》PG每年发一个大版本,此次PostgreSQL 11即将release,对PostgreSQL 11 beta3 vs Deepgreen v18 测试的TPC-H结果如下。

标签

PostgreSQL , TPC-H , HTAP


背景

去年测的PostgreSQL 10 vs Deepgreen.

《TPC-H测试 - PostgreSQL 10 vs Deepgreen(Greenplum)》

PG每年发一个大版本,此次PostgreSQL 11即将release,对PostgreSQL 11 beta3 vs Deepgreen v18 测试的TPC-H结果如下。

事实证明PG的AP能力又有了质的飞越,OLAP已无限接近MPP数据库deepgreen的性能。同时拥有单实例百万级QPS的OLTP能力,应该是目前覆盖业务场景可以做到最广泛的数据库(PG的TAG:全栈数据库 无疑)。

测试 环境

ECS , 32核 , 本地SSD 8TB.

PostgreSQL 11 beta3

Deepgreen v18 (16 segments)

SF=10

deepgreen部署quick guide

http://vitessedata.com/products/deepgreen-db/download/

以os用户digoal为例, 部署deepgreen(6个segment)

# 部署目录
mkdir -p /data01/dp/dbfast0
mkdir -p /data01/dp/dbfast1
mkdir -p /data01/dp/dbfast2
mkdir -p /data02/dp/dbfast1
mkdir -p /data02/dp/dbfast2
mkdir -p /data03/dp/dbfast1
mkdir -p /data03/dp/dbfast2
chown -R digoal:digoal /data01/dp
chown -R digoal:digoal /data02/dp
chown -R digoal:digoal /data03/dp
su - digoal

wget  https://s3.amazonaws.com/vitessedata/download/deepgreendb.18.08.rh7.x86_64.180823.bin   

chmod  +x deepgreendb.18.08.rh7.x86_64.180823.bin

./deepgreendb.18.08.rh7.x86_64.180823.bin

vi ./deepgreendb/greenplum_path.sh

#追加
# 使用实际目录
export MASTER_DATA_DIRECTORY=/data01/dp/dbfast0/dg-1
export PGDATA=$MASTER_DATA_DIRECTORY
export PGHOST=127.0.0.1
export PGPORT=15432
# 使用os用户名
export PGUSER=digoal
export PGPASSWORD=123
export PGDATABASE=postgres
  
. ./deepgreendb/greenplum_path.sh

echo "`hostname`" > hostfile

gpssh-exkeys -h localhost

gpssh-exkeys -h `hostname`
  

cat > cluster.conf <<HEREHERE

ARRAY_NAME="mpp1 cluster"
CLUSTER_NAME="mpp1 cluster"
MACHINE_LIST_FILE=hostfile
SEG_PREFIX=dg
DATABASE_PREFIX=dg
PORT_BASE=25432
declare -a DATA_DIRECTORY=(/data01/dp/dbfast1 /data01/dp/dbfast2 /data02/dp/dbfast1 /data02/dp/dbfast2 /data03/dp/dbfast1 /data03/dp/dbfast2)
MASTER_HOSTNAME=`hostname`
MASTER_DIRECTORY=/data01/dp/dbfast0
MASTER_PORT=15432
IP_ALLOW=0.0.0.0/0
TRUSTED_SHELL=/usr/bin/ssh
CHECK_POINT_SEGMENTS=8
ENCODING=UNICODE
export MASTER_DATA_DIRECTORY
export TRUSTED_SHELL
DEFAULT_QD_MAX_CONNECT=25
QE_CONNECT_FACTOR=5

HEREHERE



gpinitsystem -c cluster.conf -h hostfile

测试过程

PG 11(未使用分区表), DP v18(使用分区表,1个月1分区).

1、对所有列加索引

《PostgreSQL 快速给指定表每个字段创建索引》

do language plpgsql $$    
declare    
  tables name[] := array['customer','lineitem','nation','orders','part','partsupp','region','supplier'];     
  n name;     
  x name;     
  i int;    
  sql text;    
  ts1 timestamp;    
  ts2 timestamp;    
begin    
  set max_parallel_maintenance_workers=24;   -- 创建索引时的并行度    
      
  set max_parallel_workers=32;    
  set max_parallel_workers_per_gather=32;    
  set maintenance_work_mem='4GB';    
    
  set min_parallel_table_scan_size=0;    
  set min_parallel_index_scan_size=0;    
  set parallel_setup_cost=0;    
  set parallel_tuple_cost=0;    
    
  foreach n in array tables loop    
    i := 1;      
    for x in select attname from pg_attribute where attrelid=n::regclass and attnum>=1 and not attisdropped loop    
      sql := format('create index IF NOT EXISTS idx_%s__%s on %s (%s) tablespace tbs1', n, i, n, x);   -- 封装创建索引的SQL    
      ts1 := clock_timestamp();    
      raise notice '% start execute: %', ts1, sql;    
      execute sql;  -- 创建索引      
      commit;  -- pg 11已支持procedure, do里面开子事务  
      ts2 := clock_timestamp();    
      raise notice '% end execute: % , duration: %', ts2, sql, ts2-ts1;    
      i:=i+1;    
    end loop;    
    execute format('analyze %s', n);     
  end loop;    
end;    
$$;   

多轮测试后得到实际使用的索引(包括pk, uk在内), 使用下面的CODE替换之前的所有索引。

select string_agg(pg_get_indexdef(indexrelname::regclass),';') from pg_stat_all_indexes  where relname = any(array['customer','lineitem','nation','orders','part','partsupp','region','supplier']) and idx_scan<>0;
do language plpgsql $$
declare
  tables name[] := array['customer','lineitem','nation','orders','part','partsupp','region','supplier'];
  n name;
  idx_sql text;
  ts1 timestamp;
  ts2 timestamp;
  sql text;
begin
  set max_parallel_maintenance_workers=24;   -- 创建索引时的并行度

  set max_parallel_workers=32;
  set max_parallel_workers_per_gather=32;
  set maintenance_work_mem='4GB';

  set min_parallel_table_scan_size=0;
  set min_parallel_index_scan_size=0;
  set parallel_setup_cost=0;
  set parallel_tuple_cost=0;

  -- set default_tablespace='tbs1';  -- 是否使用其他表空间

  idx_sql := 'CREATE UNIQUE INDEX IF NOT EXISTS customer_pkey ON public.customer USING btree (c_custkey);CREATE INDEX IF NOT EXISTS idx1 ON public.lineitem USING btree (l_orderkey) WHERE (l_commitdate < l_receiptdate);CREATE INDEX IF NOT EXISTS idx_lineitem__11 ON public.lineitem USING btree (l_shipdate);CREATE INDEX IF NOT EXISTS idx_lineitem__2 ON public.lineitem USING btree (l_partkey);CREATE INDEX IF NOT EXISTS idx_lineitem__3 ON public.lineitem USING btree (l_suppkey);CREATE INDEX IF NOT EXISTS idx_lineitem_orderkey ON public.lineitem USING btree (l_orderkey);CREATE INDEX IF NOT EXISTS idx_lineitem_shipdate ON public.lineitem USING btree (l_shipdate, l_discount, l_quantity);CREATE UNIQUE INDEX IF NOT EXISTS lineitem_pkey ON public.lineitem USING btree (l_orderkey, l_linenumber);CREATE INDEX IF NOT EXISTS idx6 ON public.nation USING btree (n_nationkey, n_regionkey);CREATE INDEX IF NOT EXISTS idx_nation_regionkey ON public.nation USING btree (n_regionkey);CREATE UNIQUE INDEX IF NOT EXISTS nation_pkey ON public.nation USING btree (n_nationkey);CREATE INDEX IF NOT EXISTS idx_orders__6 ON public.orders USING btree (o_orderpriority);CREATE INDEX IF NOT EXISTS idx_orders_custkey ON public.orders USING btree (o_custkey);CREATE INDEX IF NOT EXISTS idx_orders_orderdate ON public.orders USING btree (o_orderdate);CREATE UNIQUE INDEX IF NOT EXISTS orders_pkey ON public.orders USING btree (o_orderkey);CREATE INDEX IF NOT EXISTS idx_part_1 ON public.part USING btree (p_container, p_brand);CREATE INDEX IF NOT EXISTS idx_part__5 ON public.part USING btree (p_type);CREATE INDEX IF NOT EXISTS idx_part__6 ON public.part USING btree (p_size);CREATE UNIQUE INDEX IF NOT EXISTS part_pkey ON public.part USING btree (p_partkey);CREATE INDEX IF NOT EXISTS idx ON public.partsupp USING btree (ps_partkey, ps_suppkey, ps_supplycost);CREATE INDEX IF NOT EXISTS idx_partsupp__4 ON public.partsupp USING btree (ps_supplycost);CREATE INDEX IF NOT EXISTS idx_partsupp_partkey ON public.partsupp USING btree (ps_partkey);CREATE INDEX IF NOT EXISTS idx_partsupp_suppkey ON public.partsupp USING btree (ps_suppkey);CREATE UNIQUE INDEX IF NOT EXISTS partsupp_pkey ON public.partsupp USING btree (ps_partkey, ps_suppkey);CREATE INDEX IF NOT EXISTS idx5 ON public.region USING btree (r_name, r_regionkey);CREATE UNIQUE INDEX IF NOT EXISTS region_pkey ON public.region USING btree (r_regionkey);CREATE INDEX IF NOT EXISTS idx4 ON public.supplier USING btree (s_suppkey, s_nationkey);CREATE INDEX IF NOT EXISTS idx_supplier_nation_key ON public.supplier USING btree (s_nationkey);CREATE UNIQUE INDEX IF NOT EXISTS supplier_pkey ON public.supplier USING btree (s_suppkey);';

  for sql in select regexp_split_to_table(idx_sql, ';') 
  loop
    ts1 := clock_timestamp();
    raise notice '% start execute: %', ts1, sql;
    execute sql; 
    commit;  -- pg 11已支持procedure, do里面开子事务
    ts2 := clock_timestamp();
    raise notice '% end execute: % , duration: %', ts2, sql, ts2-ts1;
  end loop;
  
  foreach n in array tables loop
    execute format('analyze %s', n);
  end loop;
end;
$$;

2、强制测试过程中,使用并行查询

alter role postgres set enable_nestloop=off;  
alter role postgres set enable_mergejoin=off;  
alter role postgres set work_mem ='8GB';  
  
alter role postgres set max_parallel_workers_per_gather=24;  
alter role postgres set parallel_setup_cost=0;  
alter role postgres set parallel_tuple_cost=0;  
alter role postgres set min_parallel_table_scan_size=0;  
alter role postgres set min_parallel_index_scan_size=0;  
alter role postgres set geqo_threshold=100;
  
alter table customer set (parallel_workers=24);  
alter table lineitem set (parallel_workers=24);  
alter table nation set (parallel_workers=24);  
alter table orders set (parallel_workers=24);  
alter table part set (parallel_workers=24);  
alter table partsupp set (parallel_workers=24);  
alter table region set (parallel_workers=24);  
alter table supplier set (parallel_workers=24);  

3、Q17,加了一个索引

create index idx_part_1 on part (p_container,p_brand);  

4、PG 11校对了成本因子

《优化器成本因子校对 - PostgreSQL explain cost constants alignment to timestamp》

postgresql.auto.conf

cpu_tuple_cost=0.00018884145574257426  
cpu_index_tuple_cost = 0.00433497085216479990  
cpu_operator_cost = 0.00216748542608239995  
seq_page_cost=0.014329  
random_page_cost =0.016
postgresql.conf

listen_addresses = '0.0.0.0'
port = 1921
max_connections = 2000
superuser_reserved_connections = 3
unix_socket_directories = '., /var/run/postgresql, /tmp'
tcp_keepalives_idle = 60
tcp_keepalives_interval = 10
tcp_keepalives_count = 10
shared_buffers = 48GB
max_prepared_transactions = 2000
work_mem = 4GB
maintenance_work_mem = 2GB
dynamic_shared_memory_type = posix
vacuum_cost_delay = 0
bgwriter_delay = 10ms
bgwriter_lru_maxpages = 1000
bgwriter_lru_multiplier = 10.0
effective_io_concurrency = 4
max_worker_processes = 128
max_parallel_maintenance_workers = 16
max_parallel_workers_per_gather = 24
parallel_leader_participation = on
max_parallel_workers = 32
wal_level = minimal  
synchronous_commit = off
wal_writer_delay = 10ms
checkpoint_timeout = 35min
max_wal_size = 96GB
min_wal_size = 24GB
checkpoint_completion_target = 0.1
max_wal_senders = 0
random_page_cost = 1.1
effective_cache_size = 400GB
default_statistics_target = 1000
log_destination = 'csvlog'
logging_collector = on
log_directory = 'log'
log_filename = 'postgresql-%a.log'
log_truncate_on_rotation = on
log_rotation_age = 1d
log_rotation_size = 0
log_checkpoints = on 
log_connections = on
log_disconnections = on
log_error_verbosity = verbose 
log_line_prefix = '%m [%p] '
log_timezone = 'PRC'
log_autovacuum_min_duration = 0
autovacuum_freeze_max_age = 1200000000
autovacuum_multixact_freeze_max_age = 1400000000
autovacuum_vacuum_cost_delay = 0ms
vacuum_freeze_table_age = 1150000000
vacuum_multixact_freeze_table_age = 1150000000
datestyle = 'iso, mdy'
timezone = 'PRC'
lc_messages = 'C'
lc_monetary = 'C'
lc_numeric = 'C'
lc_time = 'C'
default_text_search_config = 'pg_catalog.english'

5、deepgreen v18

http://vitessedata.com/products/deepgreen-db/download/install/

gp_vmem_protect_limit=16384;  

TPC-H (SF=10) 性能对比

pic

数据库 q1 q2 q3 q4 q5 q6 q7 q8 q9 q10 q11 q12 q13 q14 q15 q16 q17 q18 q19 q20 q21 q22
PostgreSQL 10 17 2 41 6 14 4 27 31 54 16 7 100 213 4 25 60 552 1317 11 82 95 143
PostgreSQL 11beta3 3 2 2 5 3 4 2 2 9 3 1 2 4 2 6 10 27 46 2 10 13 2
DeepGreen v16 6 4 10 4 7 1 27 10 283 5 2 5 14 1 2 5 31 53 6 10 246 7
DeepGreen v18 2 1 1 1 1 0 3 1 3 2 1 2 2 0 0 1 5 3 2 1 3 1

SF=200

SF=200 Deepgreen on ECS(64c, 512G, 8*2TB(ssd) lvm strip)(33 segments)

2018-08-25 15:03:59 [1535180639] : running TPC-H benchmark
2018-08-25 15:03:59 [1535180639] : running queries defined in TPC-H benchmark
2018-08-25 15:03:59 [1535180639] :   running query 1
2018-08-25 15:03:59 [1535180639] : run explain
2018-08-25 15:03:59 [1535180639] : run the query on background
2018-08-25 15:04:12 [1535180652] :     query 1 finished OK (12 seconds)
2018-08-25 15:04:12 [1535180652] :   running query 2
2018-08-25 15:04:12 [1535180652] : run explain
2018-08-25 15:04:12 [1535180652] : run the query on background
2018-08-25 15:04:15 [1535180655] :     query 2 finished OK (3 seconds)
2018-08-25 15:04:15 [1535180655] :   running query 3
2018-08-25 15:04:15 [1535180655] : run explain
2018-08-25 15:04:15 [1535180655] : run the query on background
2018-08-25 15:04:27 [1535180667] :     query 3 finished OK (11 seconds)
2018-08-25 15:04:27 [1535180667] :   running query 4
2018-08-25 15:04:27 [1535180667] : run explain
2018-08-25 15:04:27 [1535180667] : run the query on background
2018-08-25 15:04:34 [1535180674] :     query 4 finished OK (7 seconds)
2018-08-25 15:04:34 [1535180674] :   running query 5
2018-08-25 15:04:34 [1535180674] : run explain
2018-08-25 15:04:34 [1535180674] : run the query on background
2018-08-25 15:04:47 [1535180687] :     query 5 finished OK (12 seconds)
2018-08-25 15:04:47 [1535180687] :   running query 6
2018-08-25 15:04:47 [1535180687] : run explain
2018-08-25 15:04:47 [1535180687] : run the query on background
2018-08-25 15:04:48 [1535180688] :     query 6 finished OK (1 seconds)
2018-08-25 15:04:48 [1535180688] :   running query 7
2018-08-25 15:04:48 [1535180688] : run explain
2018-08-25 15:04:48 [1535180688] : run the query on background
2018-08-25 15:05:21 [1535180721] :     query 7 finished OK (32 seconds)
2018-08-25 15:05:21 [1535180721] :   running query 8
2018-08-25 15:05:21 [1535180721] : run explain
2018-08-25 15:05:21 [1535180721] : run the query on background
2018-08-25 15:05:32 [1535180732] :     query 8 finished OK (11 seconds)
2018-08-25 15:05:32 [1535180732] :   running query 9
2018-08-25 15:05:32 [1535180732] : run explain
2018-08-25 15:05:32 [1535180732] : run the query on background
2018-08-25 15:06:02 [1535180762] :     query 9 finished OK (29 seconds)
2018-08-25 15:06:02 [1535180762] :   running query 10
2018-08-25 15:06:02 [1535180762] : run explain
2018-08-25 15:06:02 [1535180762] : run the query on background
2018-08-25 15:06:11 [1535180771] :     query 10 finished OK (9 seconds)
2018-08-25 15:06:11 [1535180771] :   running query 11
2018-08-25 15:06:11 [1535180771] : run explain
2018-08-25 15:06:11 [1535180771] : run the query on background
2018-08-25 15:06:14 [1535180774] :     query 11 finished OK (2 seconds)
2018-08-25 15:06:14 [1535180774] :   running query 12
2018-08-25 15:06:14 [1535180774] : run explain
2018-08-25 15:06:14 [1535180774] : run the query on background
2018-08-25 15:06:23 [1535180783] :     query 12 finished OK (9 seconds)
2018-08-25 15:06:23 [1535180783] :   running query 13
2018-08-25 15:06:23 [1535180783] : run explain
2018-08-25 15:06:23 [1535180783] : run the query on background
2018-08-25 15:06:41 [1535180801] :     query 13 finished OK (17 seconds)
2018-08-25 15:06:41 [1535180801] :   running query 14
2018-08-25 15:06:41 [1535180801] : run explain
2018-08-25 15:06:41 [1535180801] : run the query on background
2018-08-25 15:06:42 [1535180802] :     query 14 finished OK (1 seconds)
2018-08-25 15:06:42 [1535180802] :   running query 15
2018-08-25 15:06:42 [1535180802] : run explain
2018-08-25 15:06:42 [1535180802] : run the query on background
2018-08-25 15:06:46 [1535180806] :     query 15 finished OK (4 seconds)
2018-08-25 15:06:46 [1535180806] :   running query 16
2018-08-25 15:06:46 [1535180806] : run explain
2018-08-25 15:06:46 [1535180806] : run the query on background
2018-08-25 15:06:56 [1535180816] :     query 16 finished OK (9 seconds)
2018-08-25 15:06:56 [1535180816] :   running query 17
2018-08-25 15:06:56 [1535180816] : run explain
2018-08-25 15:06:56 [1535180816] : run the query on background
2018-08-25 15:08:06 [1535180886] :     query 17 finished OK (69 seconds)
2018-08-25 15:08:06 [1535180886] :   running query 18
2018-08-25 15:08:06 [1535180886] : run explain
2018-08-25 15:08:06 [1535180886] : run the query on background
2018-08-25 15:08:30 [1535180910] :     query 18 finished OK (24 seconds)
2018-08-25 15:08:30 [1535180910] :   running query 19
2018-08-25 15:08:30 [1535180910] : run explain
2018-08-25 15:08:31 [1535180911] : run the query on background
2018-08-25 15:08:42 [1535180922] :     query 19 finished OK (11 seconds)
2018-08-25 15:08:42 [1535180922] :   running query 20
2018-08-25 15:08:42 [1535180922] : run explain
2018-08-25 15:08:42 [1535180922] : run the query on background
2018-08-25 15:08:55 [1535180936] :     query 20 finished OK (13 seconds)
2018-08-25 15:08:56 [1535180936] :   running query 21
2018-08-25 15:08:56 [1535180936] : run explain
2018-08-25 15:08:56 [1535180936] : run the query on background
2018-08-25 15:09:16 [1535180956] :     query 21 finished OK (20 seconds)
2018-08-25 15:09:16 [1535180956] :   running query 22
2018-08-25 15:09:16 [1535180956] : run explain
2018-08-25 15:09:16 [1535180956] : run the query on background
2018-08-25 15:09:26 [1535180966] :     query 22 finished OK (9 seconds)
2018-08-25 15:09:26 [1535180966] : finished TPC-H benchmark

SF=200 PG 11

使用HINT改几条慢SQL。

gp_tpch/dss/queries/2.sql

17.sql

18.sql

《PostgreSQL 开启with-llvm(JIT)后,新增插件异常(clang: Command not found)处理》

《关键时刻HINT出彩 - PG优化器的参数优化、执行计划固化CASE》

Q2

/*+ Leading((partsupp ((nation region) supplier))) Leading((part (((nation region) supplier) partsupp))) */         
select
        s_acctbal,
        s_name,
        n_name,
        p_partkey,
        p_mfgr,
        s_address,
        s_phone,
        s_comment
from
        part,
        supplier,
        partsupp,
        nation,
        region
where
        p_partkey = ps_partkey
        and s_suppkey = ps_suppkey
        and p_size = 47
        and p_type like '%NICKEL'
        and s_nationkey = n_nationkey
        and n_regionkey = r_regionkey
        and r_name = 'ASIA'
        and ps_supplycost = (
                select
                        min(ps_supplycost)
                from
                        partsupp,
                        supplier,
                        nation,
                        region
                where
                        p_partkey = ps_partkey
                        and s_suppkey = ps_suppkey
                        and s_nationkey = n_nationkey
                        and n_regionkey = r_regionkey
                        and r_name = 'ASIA'
        )
order by
        s_acctbal desc,
        n_name,
        s_name,
        p_partkey
LIMIT 100;
                                                                QUERY PLAN                                                                 
-------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=4589445377.60..4589445377.60 rows=1 width=198)
   ->  Sort  (cost=4589445377.60..4589445377.60 rows=1 width=198)
         Sort Key: supplier.s_acctbal DESC, nation.n_name, supplier.s_name, part.p_partkey
         ->  Hash Join  (cost=225752.90..4589445377.59 rows=1 width=198)
               Hash Cond: ((part.p_partkey = partsupp.ps_partkey) AND ((SubPlan 1) = partsupp.ps_supplycost))
               ->  Gather  (cost=0.00..18775.09 rows=158816 width=34)
                     Workers Planned: 28
                     ->  Parallel Seq Scan on part  (cost=0.00..18775.09 rows=5672 width=34)
                           Filter: (((p_type)::text ~~ '%NICKEL'::text) AND (p_size = 47))
               ->  Hash  (cost=69355.84..69355.84 rows=31998515 width=180)
                     ->  Gather  (cost=69220.57..69355.84 rows=31998515 width=180)
                           Workers Planned: 28
                           ->  Parallel Hash Join  (cost=69220.57..69355.84 rows=1142804 width=180)
                                 Hash Cond: (supplier.s_suppkey = partsupp.ps_suppkey)
                                 ->  Parallel Hash Join  (cost=858.16..861.87 rows=14286 width=172)
                                       Hash Cond: (nation.n_nationkey = supplier.s_nationkey)
                                       ->  Parallel Hash Join  (cost=0.02..0.03 rows=1 width=34)
                                             Hash Cond: (nation.n_regionkey = region.r_regionkey)
                                             ->  Parallel Seq Scan on nation  (cost=0.00..0.01 rows=1 width=42)
                                             ->  Parallel Hash  (cost=0.01..0.01 rows=1 width=8)
                                                   ->  Parallel Seq Scan on region  (cost=0.00..0.01 rows=1 width=8)
                                                         Filter: (r_name = 'ASIA'::bpchar)
                                       ->  Parallel Hash  (cost=689.83..689.83 rows=71429 width=154)
                                             ->  Parallel Seq Scan on supplier  (cost=0.00..689.83 rows=71429 width=154)
                                 ->  Parallel Hash  (cost=54898.31..54898.31 rows=5714021 width=24)
                                       ->  Parallel Seq Scan on partsupp  (cost=0.00..54898.31 rows=5714021 width=24)
               SubPlan 1
                 ->  Aggregate  (cost=1806.01..1806.01 rows=1 width=8)
                       ->  Hash Join  (cost=1804.89..1806.00 rows=7 width=8)
                             Hash Cond: (partsupp_1.ps_suppkey = supplier_1.s_suppkey)
                             ->  Index Scan using idx_partsupp_partkey on partsupp partsupp_1  (cost=0.49..1.48 rows=37 width=16)
                                   Index Cond: (part.p_partkey = ps_partkey)
                             ->  Hash  (cost=861.87..861.87 rows=400000 width=8)
                                   ->  Gather  (cost=858.16..861.87 rows=400000 width=8)
                                         Workers Planned: 28
                                         ->  Parallel Hash Join  (cost=858.16..861.87 rows=14286 width=8)
                                               Hash Cond: (nation_1.n_nationkey = supplier_1.s_nationkey)
                                               ->  Parallel Hash Join  (cost=0.02..0.03 rows=1 width=8)
                                                     Hash Cond: (nation_1.n_regionkey = region_1.r_regionkey)
                                                     ->  Parallel Seq Scan on nation nation_1  (cost=0.00..0.01 rows=1 width=16)
                                                     ->  Parallel Hash  (cost=0.01..0.01 rows=1 width=8)
                                                           ->  Parallel Seq Scan on region region_1  (cost=0.00..0.01 rows=1 width=8)
                                                                 Filter: (r_name = 'ASIA'::bpchar)
                                               ->  Parallel Hash  (cost=689.83..689.83 rows=71429 width=16)
                                                     ->  Parallel Seq Scan on supplier supplier_1  (cost=0.00..689.83 rows=71429 width=16)
 JIT:
   Functions: 98
   Inlining: true
   Optimization: true
(49 rows)

更新

Q2 建议加这个索引, 使用nestloop join

create index idx on partsupp (ps_partkey,ps_suppkey,ps_supplycost);
set enable_nestloop=on;

or

/*+ Set(enable_nestloop on) */ select                                                                                                           
        s_acctbal,
        s_name,   
        n_name,
        p_partkey,
        p_mfgr,   
        s_address,
        s_phone,  
        s_comment
from             
        part,
        supplier,
        partsupp,
        nation,  
        region 
where         
        p_partkey = ps_partkey
        and s_suppkey = ps_suppkey
        and p_size = 47           
        and p_type like '%NICKEL'
        and s_nationkey = n_nationkey
        and n_regionkey = r_regionkey
        and r_name = 'ASIA'          
        and ps_supplycost = (
                select       
                        min(ps_supplycost)
                from                      
                        partsupp,
                        supplier,
                        nation,  
                        region 
                where         
                        p_partkey = ps_partkey
                        and s_suppkey = ps_suppkey
                        and s_nationkey = n_nationkey
                        and n_regionkey = r_regionkey
                        and r_name = 'ASIA'          
        )                                  
order by 
        s_acctbal desc,
        n_name,        
        s_name,
        p_partkey
LIMIT 100;
   
Time: 5345.093 ms (00:05.345)

                                                                        QUERY PLAN                                                                        
----------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=2807136.16..2807136.16 rows=1 width=198)
   ->  Sort  (cost=2807136.16..2807136.16 rows=1 width=198)
         Sort Key: supplier.s_acctbal DESC, nation.n_name, supplier.s_name, part.p_partkey
         ->  Nested Loop  (cost=17.90..2807136.15 rows=1 width=198)
               Join Filter: (nation.n_nationkey = supplier.s_nationkey)
               ->  Nested Loop  (cost=17.88..2807135.97 rows=6 width=180)
                     ->  Nested Loop  (cost=17.51..2807133.70 rows=6 width=42)
                           ->  Index Scan using idx_part__6 on part  (cost=0.49..15906.04 rows=160623 width=34)
                                 Index Cond: (p_size = 47)
                                 Filter: ((p_type)::text ~~ '%NICKEL'::text)
                           ->  Index Only Scan using idx on partsupp  (cost=17.02..17.38 rows=1 width=24)
                                 Index Cond: ((ps_partkey = part.p_partkey) AND (ps_supplycost = (SubPlan 1)))
                                 SubPlan 1
                                   ->  Aggregate  (cost=16.52..16.52 rows=1 width=8)
                                         ->  Hash Join  (cost=0.91..16.51 rows=8 width=8)
                                               Hash Cond: (supplier_1.s_nationkey = nation_1.n_nationkey)
                                               ->  Nested Loop  (cost=0.86..16.34 rows=38 width=16)
                                                     ->  Index Scan using idx_partsupp_partkey on partsupp partsupp_1  (cost=0.49..0.78 rows=38 width=16)
                                                           Index Cond: (part.p_partkey = ps_partkey)
                                                     ->  Index Scan using supplier_pkey on supplier supplier_1  (cost=0.37..0.41 rows=1 width=16)
                                                           Index Cond: (s_suppkey = partsupp_1.ps_suppkey)
                                               ->  Hash  (cost=0.03..0.03 rows=5 width=8)
                                                     ->  Gather  (cost=0.02..0.03 rows=5 width=8)
                                                           Workers Planned: 28
                                                           ->  Parallel Hash Join  (cost=0.02..0.03 rows=1 width=8)
                                                                 Hash Cond: (nation_1.n_regionkey = region_1.r_regionkey)
                                                                 ->  Parallel Seq Scan on nation nation_1  (cost=0.00..0.01 rows=1 width=16)
                                                                 ->  Parallel Hash  (cost=0.01..0.01 rows=1 width=8)
                                                                       ->  Parallel Seq Scan on region region_1  (cost=0.00..0.01 rows=1 width=8)
                                                                             Filter: (r_name = 'ASIA'::bpchar)
                     ->  Index Scan using supplier_pkey on supplier  (cost=0.37..0.38 rows=1 width=154)
                           Index Cond: (s_suppkey = partsupp.ps_suppkey)
               ->  Materialize  (cost=0.02..0.06 rows=5 width=34)
                     ->  Gather  (cost=0.02..0.03 rows=5 width=34)
                           Workers Planned: 28
                           ->  Parallel Hash Join  (cost=0.02..0.03 rows=1 width=34)
                                 Hash Cond: (nation.n_regionkey = region.r_regionkey)
                                 ->  Parallel Seq Scan on nation  (cost=0.00..0.01 rows=1 width=42)
                                 ->  Parallel Hash  (cost=0.01..0.01 rows=1 width=8)
                                       ->  Parallel Seq Scan on region  (cost=0.00..0.01 rows=1 width=8)
                                             Filter: (r_name = 'ASIA'::bpchar)
 JIT:
   Functions: 67
   Inlining: true
   Optimization: true
(45 rows)

Q4

create index idx1 on lineitem (l_orderkey) where l_commitdate < l_receiptdate;

Q17

/*+ Set(work_mem 12GB) Set(max_parallel_workers_per_gather 28) */ explain select
        sum(l_extendedprice) / 7.0 as avg_yearly
from
        lineitem,
        part,
        (SELECT l_partkey AS agg_partkey, 0.2 * avg(l_quantity) AS avg_quantity FROM lineitem GROUP BY l_partkey) part_agg
where
        p_partkey = l_partkey
        and agg_partkey = l_partkey
        and p_brand = 'Brand#11'
        and p_container = 'SM PKG'
        and l_quantity < avg_quantity
LIMIT 1;
                                                            QUERY PLAN                                                             
-----------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=6130644.31..6130644.31 rows=1 width=8)
   ->  Aggregate  (cost=6130644.31..6130644.31 rows=1 width=8)
         ->  Hash Join  (cost=5647079.65..6129794.97 rows=391854 width=8)
               Hash Cond: (part.p_partkey = lineitem_1.l_partkey)
               Join Filter: (lineitem.l_quantity < (('0.2'::double precision * avg(lineitem_1.l_quantity))))
               ->  Gather  (cost=11834.33..491832.73 rows=1193799 width=32)
                     Workers Planned: 28
                     ->  Parallel Hash Join  (cost=11834.33..491832.73 rows=42636 width=32)
                           Hash Cond: (lineitem.l_partkey = part.p_partkey)
                           ->  Parallel Seq Scan on lineitem  (cost=0.00..382461.00 rows=42857385 width=24)
                           ->  Parallel Hash  (cost=11830.99..11830.99 rows=1421 width=8)
                                 ->  Parallel Bitmap Heap Scan on part  (cost=427.06..11830.99 rows=1421 width=8)
                                       Recheck Cond: ((p_container = 'SM PKG'::bpchar) AND (p_brand = 'Brand#11'::bpchar))
                                       ->  Bitmap Index Scan on idx_part_1  (cost=0.00..418.44 rows=39794 width=0)
                                             Index Cond: ((p_container = 'SM PKG'::bpchar) AND (p_brand = 'Brand#11'::bpchar))
               ->  Hash  (cost=5542429.84..5542429.84 rows=39389900 width=16)
                     ->  Finalize HashAggregate  (cost=5356798.88..5534991.39 rows=39389900 width=16)
                           Group Key: lineitem_1.l_partkey
                           ->  Gather  (cost=568246.52..575684.96 rows=1102917200 width=40)
                                 Workers Planned: 28
                                 ->  Partial HashAggregate  (cost=568246.52..575684.96 rows=39389900 width=40)
                                       Group Key: lineitem_1.l_partkey
                                       ->  Parallel Seq Scan on lineitem lineitem_1  (cost=0.00..382461.00 rows=42857385 width=16)
 JIT:
   Functions: 44
   Inlining: true
   Optimization: true
(27 rows)

Q18

/*+ Leading( ((lineitem (customer orders)) ANY_subquery) ) */  explain select
        c_name,
        c_custkey,
        o_orderkey,
        o_orderdate,
        o_totalprice,
        sum(l_quantity)
from
        customer,
        orders,
        lineitem
where
        o_orderkey in (
                select
                        l_orderkey
                from
                        lineitem
                group by
                        l_orderkey having
                                sum(l_quantity) > 312
        )
        and c_custkey = o_custkey
        and o_orderkey = l_orderkey
group by
        c_name,
        c_custkey,
        o_orderkey,
        o_orderdate,
        o_totalprice
order by
        o_totalprice desc,
        o_orderdate
LIMIT 100;
                                                               QUERY PLAN                                                                
-----------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=5094012.87..5094013.09 rows=100 width=55)
   ->  Sort  (cost=5094012.87..5104543.55 rows=4858479 width=55)
         Sort Key: orders.o_totalprice DESC, orders.o_orderdate
         ->  HashAggregate  (cost=4932105.34..4933022.83 rows=4858479 width=55)
               Group Key: customer.c_custkey, orders.o_orderkey
               ->  Hash Join  (cost=1182913.25..4900513.30 rows=4858479 width=55)
                     Hash Cond: (orders.o_orderkey = lineitem_1.l_orderkey)
                     ->  Gather  (cost=160021.94..1146574.91 rows=1200006784 width=63)
                           Workers Planned: 28
                           ->  Parallel Hash Join  (cost=160021.94..1146574.91 rows=42857385 width=63)
                                 Hash Cond: (lineitem.l_orderkey = orders.o_orderkey)
                                 ->  Parallel Seq Scan on lineitem  (cost=0.00..382461.00 rows=42857385 width=16)
                                 ->  Parallel Hash  (cost=134775.97..134775.97 rows=10714120 width=47)
                                       ->  Parallel Hash Join  (cost=108506.79..134775.97 rows=10714120 width=47)
                                             Hash Cond: (customer.c_custkey = orders.o_custkey)
                                             ->  Parallel Seq Scan on customer  (cost=0.00..11101.38 rows=1071468 width=27)
                                             ->  Parallel Hash  (cost=83260.82..83260.82 rows=10714120 width=28)
                                                   ->  Parallel Seq Scan on orders  (cost=0.00..83260.82 rows=10714120 width=28)
                     ->  Hash  (cost=1020029.33..1020029.33 rows=1214594 width=8)
                           ->  Finalize HashAggregate  (cost=1011214.02..1019799.96 rows=1214594 width=8)
                                 Group Key: lineitem_1.l_orderkey
                                 Filter: (sum(lineitem_1.l_quantity) > '312'::double precision)
                                 ->  Gather  (cost=568246.52..568934.61 rows=102025924 width=16)
                                       Workers Planned: 28
                                       ->  Partial HashAggregate  (cost=568246.52..568934.61 rows=3643783 width=16)
                                             Group Key: lineitem_1.l_orderkey
                                             ->  Parallel Seq Scan on lineitem lineitem_1  (cost=0.00..382461.00 rows=42857385 width=16)
(27 rows)

SF=200 pg 11 结果

2018-08-25 19:46:22 [1535197582] : running TPC-H benchmark
2018-08-25 19:46:22 [1535197582] : running queries defined in TPC-H benchmark
2018-08-25 19:46:22 [1535197582] :   running query 1
2018-08-25 19:46:22 [1535197582] : run explain
2018-08-25 19:46:22 [1535197582] : run the query on background
2018-08-25 19:46:40 [1535197600] :     query 1 finished OK (18 seconds)
2018-08-25 19:46:40 [1535197600] :   running query 2
2018-08-25 19:46:40 [1535197600] : run explain
2018-08-25 19:46:40 [1535197600] : run the query on background
2018-08-25 19:47:19 [1535197639] :     query 2 finished OK (38 seconds)
2018-08-25 19:47:19 [1535197639] :   running query 3
2018-08-25 19:47:19 [1535197639] : run explain
2018-08-25 19:47:21 [1535197641] : run the query on background
2018-08-25 19:47:46 [1535197666] :     query 3 finished OK (25 seconds)
2018-08-25 19:47:46 [1535197666] :   running query 4
2018-08-25 19:47:46 [1535197666] : run explain
2018-08-25 19:47:47 [1535197667] : run the query on background
2018-08-25 19:48:19 [1535197699] :     query 4 finished OK (32 seconds)
2018-08-25 19:48:19 [1535197699] :   running query 5
2018-08-25 19:48:19 [1535197699] : run explain
2018-08-25 19:48:19 [1535197699] : run the query on background
2018-08-25 19:49:16 [1535197756] :     query 5 finished OK (57 seconds)
2018-08-25 19:49:17 [1535197757] :   running query 6
2018-08-25 19:49:17 [1535197757] : run explain
2018-08-25 19:49:17 [1535197757] : run the query on background
2018-08-25 19:49:25 [1535197765] :     query 6 finished OK (8 seconds)
2018-08-25 19:49:25 [1535197765] :   running query 7
2018-08-25 19:49:25 [1535197765] : run explain
2018-08-25 19:49:25 [1535197765] : run the query on background
2018-08-25 19:50:17 [1535197817] :     query 7 finished OK (52 seconds)
2018-08-25 19:50:17 [1535197817] :   running query 8
2018-08-25 19:50:18 [1535197818] : run explain
2018-08-25 19:50:18 [1535197818] : run the query on background
2018-08-25 19:50:42 [1535197842] :     query 8 finished OK (24 seconds)
2018-08-25 19:50:42 [1535197842] :   running query 9
2018-08-25 19:50:42 [1535197842] : run explain
2018-08-25 19:50:42 [1535197842] : run the query on background
2018-08-25 19:51:49 [1535197909] :     query 9 finished OK (66 seconds)
2018-08-25 19:51:49 [1535197909] :   running query 10
2018-08-25 19:51:49 [1535197909] : run explain
2018-08-25 19:51:49 [1535197909] : run the query on background
2018-08-25 19:52:27 [1535197947] :     query 10 finished OK (38 seconds)
2018-08-25 19:52:27 [1535197947] :   running query 11
2018-08-25 19:52:27 [1535197947] : run explain
2018-08-25 19:52:28 [1535197948] : run the query on background
2018-08-25 19:52:52 [1535197972] :     query 11 finished OK (24 seconds)
2018-08-25 19:52:52 [1535197972] :   running query 12
2018-08-25 19:52:52 [1535197972] : run explain
2018-08-25 19:52:52 [1535197972] : run the query on background
2018-08-25 19:53:18 [1535197998] :     query 12 finished OK (26 seconds)
2018-08-25 19:53:18 [1535197998] :   running query 13
2018-08-25 19:53:18 [1535197998] : run explain
2018-08-25 19:53:19 [1535197999] : run the query on background
2018-08-25 19:54:58 [1535198098] :     query 13 finished OK (98 seconds)
2018-08-25 19:54:58 [1535198098] :   running query 14
2018-08-25 19:54:58 [1535198098] : run explain
2018-08-25 19:54:58 [1535198098] : run the query on background
2018-08-25 19:55:11 [1535198111] :     query 14 finished OK (13 seconds)
2018-08-25 19:55:11 [1535198111] :   running query 15
2018-08-25 19:55:11 [1535198111] : run explain
2018-08-25 19:55:11 [1535198111] : run the query on background
2018-08-25 19:56:10 [1535198170] :     query 15 finished OK (58 seconds)
2018-08-25 19:56:10 [1535198170] :   running query 16
2018-08-25 19:56:10 [1535198170] : run explain
2018-08-25 19:56:10 [1535198170] : run the query on background
2018-08-25 19:58:05 [1535198285] :     query 16 finished OK (114 seconds)
2018-08-25 19:58:05 [1535198285] :   running query 17
2018-08-25 19:58:05 [1535198285] : run explain
2018-08-25 19:58:07 [1535198287] : run the query on background
2018-08-25 20:10:21 [1535199021] :     query 17 finished OK (732 seconds)
2018-08-25 20:10:21 [1535199021] :   running query 18
2018-08-25 20:10:21 [1535199021] : run explain
2018-08-25 20:10:22 [1535199022] : run the query on background
2018-08-25 20:20:19 [1535199619] :     query 18 finished OK (595 seconds)
2018-08-25 20:20:19 [1535199619] :   running query 19
2018-08-25 20:20:19 [1535199619] : run explain
2018-08-25 20:20:19 [1535199619] : run the query on background
2018-08-25 20:20:31 [1535199631] :     query 19 finished OK (12 seconds)
2018-08-25 20:20:31 [1535199631] :   running query 20
2018-08-25 20:20:31 [1535199631] : run explain
2018-08-25 20:20:34 [1535199634] : run the query on background
2018-08-25 20:24:08 [1535199848] :     query 20 finished OK (213 seconds)
2018-08-25 20:24:08 [1535199848] :   running query 21
2018-08-25 20:24:08 [1535199848] : run explain
2018-08-25 20:24:08 [1535199848] : run the query on background
2018-08-25 20:26:13 [1535199973] :     query 21 finished OK (124 seconds)
2018-08-25 20:26:13 [1535199973] :   running query 22
2018-08-25 20:26:13 [1535199973] : run explain
2018-08-25 20:26:14 [1535199974] : run the query on background
2018-08-25 20:26:28 [1535199988] :     query 22 finished OK (14 seconds)
2018-08-25 20:26:28 [1535199988] : finished TPC-H benchmark

SF=200

数据库 q1 q2 q3 q4 q5 q6 q7 q8 q9 q10 q11 q12 q13 q14 q15 q16 q17 q18 q19 q20 q21 q22
PostgreSQL 11beta3 18 38 25 32 57 8 52 24 66 38 24 26 98 13 58 114 732 595 12 213 124 14
DeepGreen v18 12 3 11 7 12 1 32 11 29 9 2 9 17 1 4 9 69 24 11 13 20 9

小结

PostgreSQL 11 parallel hash agg, parallel hash join的性能有了质的飞跃。

同时PostgreSQL 11在创建索引时,也支持了PARALLEL,创建索引的速度有了飞跃。

(SF=10)相比PostgreSQL 10,PostgreSQL 11版本AP能力得到了极大增强。在“没有使用列存储、没有使用分区表的”情况下,已接近Deepgreen(Greenplum原班人马)。ck, 田丰老师 .

PostgreSQL 11优化器还有可以加强的地方,在TPC-H的一些SQL中,执行计划并不是最好的,比如通过aqo提高优化器的能力。

https://github.com/postgrespro/aqo

PostgreSQL实现HTAP是必然的趋势。将来还可以结合citus插件实现sharding, (同时引入列存功能)。

参考

《TPC-H测试 - PostgreSQL 10 vs Deepgreen(Greenplum)》

https://github.com/digoal/gp_tpch

《PostgreSQL 10 + PostGIS + Sharding(pg_pathman) + MySQL(fdw外部表) on ECS 部署指南(适合新用户) - 珍藏级》

《PostgreSQL on ECS多云盘的部署、快照备份和恢复》

相关实践学习
使用PolarDB和ECS搭建门户网站
本场景主要介绍基于PolarDB和ECS实现搭建门户网站。
阿里云数据库产品家族及特性
阿里云智能数据库产品团队一直致力于不断健全产品体系,提升产品性能,打磨产品功能,从而帮助客户实现更加极致的弹性能力、具备更强的扩展能力、并利用云设施进一步降低企业成本。以云原生+分布式为核心技术抓手,打造以自研的在线事务型(OLTP)数据库Polar DB和在线分析型(OLAP)数据库Analytic DB为代表的新一代企业级云原生数据库产品体系, 结合NoSQL数据库、数据库生态工具、云原生智能化数据库管控平台,为阿里巴巴经济体以及各个行业的企业客户和开发者提供从公共云到混合云再到私有云的完整解决方案,提供基于云基础设施进行数据从处理、到存储、再到计算与分析的一体化解决方案。本节课带你了解阿里云数据库产品家族及特性。
相关文章
|
6月前
|
存储 NoSQL 关系型数据库
PostgreSQL列存扩展hydra简单测试
Hydra是一款PostgreSQL的扩展,为PostgreSQL增加了列存引擎,使得PostgreSQL的olap性能大幅提升,本文介绍Hydra基本的使用方法。
674 0
|
SQL 弹性计算 关系型数据库
HTAP数据库 PostgreSQL 场景与性能测试之 3.1 - (OLAP) 大表JOIN统计查询-10亿 join 1亿 agg
标签 PostgreSQL , HTAP , OLTP , OLAP , 场景与性能测试 背景 PostgreSQL是一个历史悠久的数据库,历史可以追溯到1973年,最早由2014计算机图灵奖得主,关系数据库的鼻祖Michael_Stonebraker 操刀设计,PostgreSQL具备与Oracle类似的功能、性能、架构以及稳定性。 PostgreSQL社区的贡献者众多
1809 0
|
1月前
|
SQL 关系型数据库 OLAP
PostgreSQL从小白到高手教程 - 第46讲:poc-tpch测试
PostgreSQL从小白到高手教程 - 第46讲:poc-tpch测试
83 3
|
2月前
|
关系型数据库 OLAP OLTP
PostgreSQL从小白到高手教程 - 第45讲:poc-tpcc测试
CUUG PostgreSQL技术大讲堂系列公开课第45讲-POC-TPCC测试的内容,往期视频及文档,请联系CUUG。
43 1
|
4月前
|
SQL 关系型数据库 PostgreSQL
PostgreSQL【部署 01】离线安装PostgreSQL+PostGIS踩坑及问题解决经验分享(含安装文件PostgreSQL+PostGIS及多个依赖+测试SQL)
PostgreSQL【部署 01】离线安装PostgreSQL+PostGIS踩坑及问题解决经验分享(含安装文件PostgreSQL+PostGIS及多个依赖+测试SQL)
109 0
|
4月前
|
SQL 关系型数据库 测试技术
postgresql|数据库|数据库测试工具pgbench之使用
postgresql|数据库|数据库测试工具pgbench之使用
79 0
|
4月前
|
关系型数据库 数据库 PostgreSQL
postgresql | 数据库| 生成2000W条的简单测试表
postgresql | 数据库| 生成2000W条的简单测试表
24 0
|
7月前
|
弹性计算 关系型数据库 OLAP
快速上手并跑通AnalyticDB PostgreSQL版TPC-H测试
本案例介绍如何创建AnalyticDB PostgreSQL实例、生成测试数据、建表、导入数据,并跑通TPC-H查询。
334 0
|
7月前
|
关系型数据库 测试技术 OLTP
沉浸式学习PostgreSQL|PolarDB 12: 如何快速构建 海量 逼真 测试数据
本文主要教大家怎么用好数据库, 而不是怎么运维管理数据库、怎么开发数据库内核.
258 0
|
11月前
|
关系型数据库 PostgreSQL

相关产品

  • 云原生数据库 PolarDB