如何使用AnalyticDB for PostgreSQL 6.0 进行TPC-H 1TB数据的测试

简介: TPC-H 主要目的是评价特定查询的决策支持能力,该基准模拟了决策支持系统中的数据库操作,测试数据库系统复杂查询的响应时间。本文详细介绍了在阿里云AnalyticDB for PostgreSQL 6.0版,测试TPC-H benchmark 1TB数据的流程和测试结果

TPC-H 简介

TPC-H是事务处理性能委员会( Transaction ProcessingPerformance Council )制定的基准程序之一,TPC-H 主要目的是评价特定查询的决策支持能力,该基准模拟了决策支持系统中的数据库操作,测试数据库系统复杂查询的响应时间。
TPC-H 里的表是模拟一个配件销售管理系统进行建模。详情参考 TPCH Specification

8张表逻辑关系

tpch_

测试数据量的说明

数据量的大小对查询速度有直接的影响,TPC-H 中使用SF描述数据量,1SF 对应1GB 单位。1000SF,即1TB。1SF对应的数据量只是8个表的总数据量不包括索引等空间占用,准备数据时需预留更多空间。

AnalyticDB for PostgreSQL 6.0 规格选择

选择性价比适中的两种规格:
• ADB PG 6.0 SSD存储+单节点1核+实例节点数64
• ADB PG 6.0 SSD存储+单节点4核+实例节点数32
• 后续的测试过程均使用默认配置,未针对TPC-H的查询SQL做特别的修改

测试步骤

开通一个ECS实例

准备一台ECS(建议规格:ecs.g6.4xlarge规格、CentOS系统、ESSD 2T数据盘,建议与AnalyticDB for PostgreSQL 6.0 实例用相同region和VPC网络),用于1T数据生成、数据上传/入库、客户端测试。

开通一个AnalyticDB for PostgreSQL 6.0 实例

参考配置如下图。建议与ECS实例用相同区域和VPC网络。adbpg_

开通外网,修改白名单,创建数据库账号

进入阿里云分析型数据库PostgreSQL产品页,进入分析型数据库PostgreSQL版控制台,找到已开通的AnalyticDB for PostgreSQL 6.0 实例,点击“实例名链接”进入详情页,参考下图位置,修改配置项。_

生成TPC-H 1T数据

  • SSH进入到ECS实例,下载TPC-H dbgen代码,编译后dbgen目录生成 dbgen/qgen 执行程序。
git clone https://github.com/gregrahn/tpch-kit.git
cd tpch-kit/dbgen
make
  • 生成1T数据,运行 ./dbgen --help 查看如何生成,命令参考:
    ./dbgen -vf -s 1000
  • 也可以并行生成分片文件,如下shell脚本参考(10个分片文件):
for((i=1;i<=10;i++));
do
./dbgen -s 1000 -S $i -C 10 -f &
done
  • 处理生成的 tbl 文件,tbl文件每行最后会多1个'|',可以用seed命令将每行后面的'|'去掉,shell脚本参考:
sed -i 's/.$//' ./region.tbl &
sed -i 's/.$//' ./nation.tbl &
for((i=1;i<=10;i++));
do
sed -i 's/.$//' ./lineitem.tbl.$i &
sed -i 's/.$//' ./orders.tbl.$i &
sed -i 's/.$//' ./customer.tbl.$i &
sed -i 's/.$//' ./partsupp.tbl.$i &
sed -i 's/.$//' ./part.tbl.$i &
sed -i 's/.$//' ./supplier.tbl.$i &
done

向数据库中建表

在ECS机器上检查是否存在PSQL命令,如果没有,安装PSQL客户端:
sudo yum install postgresql

准备TPC-H涉及到的8张表创建SQL,建表语句参考如下。
列存表适合向量计算、JIT架构。对大批量数据的访问和统计,效率更高。因此建表语句中使用了
• AO列存表
• 不开压缩
• 设置复制表

create table nation (
    n_nationkey  integer not null,
    n_name       char(25) not null,
    n_regionkey  integer not null,
    n_comment    varchar(152)
)
with (appendonly=true, orientation=column)
distributed REPLICATED;
create table region (
    r_regionkey  integer not null,
    r_name       char(25) not null,
    r_comment    varchar(152)
)
with (appendonly=true, orientation=column)
distributed REPLICATED;
create table part (
    p_partkey     integer not null,
    p_name        varchar(55) not null,
    p_mfgr        char(25) not null,
    p_brand       char(10) not null,
    p_type        varchar(25) not null,
    p_size        integer not null,
    p_container   char(10) not null,
    p_retailprice DECIMAL(15,2) not null,
    p_comment     varchar(23) not null
)
with (appendonly=true, orientation=column)
distributed by (p_partkey);
create table supplier (
    s_suppkey     integer not null,
    s_name        char(25) not null,
    s_address     varchar(40) not null,
    s_nationkey   integer not null,
    s_phone       char(15) not null,
    s_acctbal     DECIMAL(15,2) not null,
    s_comment     varchar(101) not null
)
with (appendonly=true, orientation=column)
distributed by (s_suppkey);
create table partsupp (
    ps_partkey     integer not null,
    ps_suppkey     integer not null,
    ps_availqty    integer not null,
    ps_supplycost  DECIMAL(15,2)  not null,
    ps_comment     varchar(199) not null
)
with (appendonly=true, orientation=column)
distributed by (ps_partkey);
create table customer (
    c_custkey     integer not null,
    c_name        varchar(25) not null,
    c_address     varchar(40) not null,
    c_nationkey   integer not null,
    c_phone       char(15) not null,
    c_acctbal     DECIMAL(15,2)  not null,
    c_mktsegment  char(10) not null,
    c_comment     varchar(117) not null
)
with (appendonly=true, orientation=column)
distributed by (c_custkey);
create table orders (
    o_orderkey       bigint not null,
    o_custkey        integer not null,
    o_orderstatus    char(1) not null,
    o_totalprice     DECIMAL(15,2) not null,
    o_orderdate      date not null,
    o_orderpriority  char(15) not null,
    o_clerk          char(15) not null,
    o_shippriority   integer not null,
    o_comment        varchar(79) not null
)
with (appendonly=true, orientation=column)
distributed by (o_orderkey);
create table lineitem (
    l_orderkey    bigint not null,
    l_partkey     integer not null,
    l_suppkey     integer not null,
    l_linenumber  integer not null,
    l_quantity    DECIMAL(15,2) not null,
    l_extendedprice  DECIMAL(15,2) not null,
    l_discount    DECIMAL(15,2) not null,
    l_tax         DECIMAL(15,2) not null,
    l_returnflag  char(1) not null,
    l_linestatus  char(1) not null,
    l_shipdate    date not null,
    l_commitdate  date not null,
    l_receiptdate date not null,
    l_shipinstruct char(25) not null,
    l_shipmode     char(10) not null,
    l_comment      varchar(44) not null
)
with (appendonly=true, orientation=column)
distributed by (l_orderkey);

运行SQL脚本文件的shell脚本参考:

export PGPASSWORD=<数据库账号密码>
psql -h <ADB PG实例内网或外网地址> -p 3432 -U <数据库账号> -f <创建表的SQL脚本文件路径> 

导入数据

准备工作就绪,可以开始导入数据了,导入数据有两种方式:

  1. 通过 copy from导入
  2. 通过OSS外表方式导入

下面分别介绍两种导入方法

COPY方式导入

SQL脚本参考:

\copy nation from '/data/tpch_1t/nation.tbl' DELIMITER '|';
\copy region from '/data/tpch_1t/region.tbl' DELIMITER '|';
\copy supplier from '/data/tpch_1t/supplier.tbl' DELIMITER '|';

tbl路径以实际路径为准,导入shell脚本参考创建表的shell脚本(或psql进入数据库执行)。为提高导入效率(ECS网络带宽保障),可以把SQL拆开并发导入。

使用OSS外表方式导入数据

将生成的数据文件上传到oss

./ossutil64 cp -r <tbl文件目录> oss://<oss bucket>/<目录>/ 
               -i <AccessKey ID> -k <Access Key Secret> 
               -e <EndPoint>

使用AnalyticDB for PostgreSQL 6.0的OSS外表进行TPCH测试数据导入,已在北京可用区的OSS上准备好了可以使用OSS外表的数据,且为公共可读权限。用户无需再进行数据生成,然后再导入数据到OSS。
OSS外表文档参考: OSS数据导入AnalyticDB for PostgreSQL

创建OSS外部表的建表语句示例(含OSS地址)

create readable external table ext_nation ( n_nationkey int, n_name varchar(25), n_regionkey integer, 
  n_comment varchar(152)) 
    location('oss://oss-cn-beijing.aliyuncs.com 
    filepath=data/tpch_data_1000x/nation.tbl 
    id=$AccessKey key=$AccessKeySecret 
    bucket=oss-y') FORMAT 'TEXT' (DELIMITER '|' ) ;


create readable external table ext_region ( R_REGIONKEY int, R_NAME CHAR(25),R_COMMENT VARCHAR(152)) 
    location('oss://oss-cn-beijing.aliyuncs.com 
    filepath=data/tpch_data_1000x/region.tbl 
    id=$AccessKey key=$AccessKeySecret 
    bucket=oss-y') FORMAT 'TEXT' (DELIMITER '|' ) ;

CREATE readable external TABLE ext_lineitem ( l_orderkey bigint, l_partkey bigint, l_suppkey bigint, 
  l_linenumber bigint, l_quantity double precision, l_extendedprice double precision, 
  l_discount double precision, l_tax double precision, l_returnflag CHAR(1), 
  l_linestatus CHAR(1), l_shipdate DATE, l_commitdate DATE, l_receiptdate DATE, 
  l_shipinstruct CHAR(25), l_shipmode CHAR(10), l_comment VARCHAR(44)) 
    location('oss://oss-cn-beijing.aliyuncs.com 
    filepath=data/tpch_data_1000x/lineitem.tbl 
    id= $AccessKey key= $AccessKeySecret 
    bucket=oss-y ') FORMAT 'TEXT' (DELIMITER '|' ) ;

CREATE readable external TABLE ext_orders ( o_orderkey bigint , o_custkey bigint , o_orderstatus CHAR(1) , 
  o_totalprice double precision, o_orderdate DATE , o_orderpriority CHAR(15) , o_clerk CHAR(15) , 
  o_shippriority bigint , o_comment VARCHAR(79) ) 
    location('oss://oss-cn-beijing.aliyuncs.com 
    filepath=data/tpch_data_1000x/orders.tbl 
    id=$AccessKey key=$AccessKeySecret  
    bucket=oss-y') FORMAT 'TEXT' (DELIMITER '|' ) ;

CREATE readable external TABLE ext_part ( p_partkey bigint , p_name VARCHAR(55) , p_mfgr CHAR(25) , 
  p_brand CHAR(10) , p_type VARCHAR(25) , p_size bigint , p_container CHAR(10) , 
  p_retailprice double precision , p_comment VARCHAR(23) ) 
    location('oss://oss-cn-beijing.aliyuncs.com 
    filepath=data/tpch_data_1000x/part.tbl 
    id= $AccessKey key= $AccessKeySecret 
    bucket=oss-y') FORMAT 'TEXT' (DELIMITER '|' ) ;

CREATE readable external TABLE ext_partsupp ( ps_partkey bigint , ps_suppkey bigint , 
  ps_availqty bigint , ps_supplycost double precision , ps_comment VARCHAR(199) ) 
    location('oss://oss-cn-beijing.aliyuncs.com 
    filepath=data/tpch_data_1000x/partsupp.tbl 
    id= $AccessKey key= $AccessKeySecret 
    bucket=oss-y') FORMAT 'TEXT' (DELIMITER '|' ) ;

CREATE readable external TABLE ext_supplier ( s_suppkey bigint , s_name CHAR(25) , 
  s_address VARCHAR(40) , s_nationkey bigint , s_phone CHAR(15) , s_acctbal DECIMAL(15,2) , 
  s_comment VARCHAR(101) ) 
    location('oss://oss-cn-beijing.aliyuncs.com 
    filepath=data/tpch_data_1000x/supplier.tbl 
    id= $AccessKey key= $AccessKeySecret
    bucket=oss-y') FORMAT 'TEXT' (DELIMITER '|' ) ;

CREATE readable external TABLE ext_customer ( c_custkey bigint , c_name VARCHAR(25) , 
  c_address VARCHAR(40) , c_nationkey bigint , c_phone CHAR(15) , c_acctbal double precision , 
  c_mktsegment CHAR(10) , c_comment VARCHAR(117) ) 
    location('oss://oss-cn-beijing.aliyuncs.com 
    filepath=data/tpch_data_1000x/customer.tbl 
    id= $AccessKey key= $AccessKeySecret
    bucket=oss-y') FORMAT 'TEXT' (DELIMITER '|' ) ;

从OSS外表写入TPC-H数据到AnalyticDB for PostgreSQL

insert into nation select * from ext_nation;
insert into region select * from ext_region;
insert into lineitem select * from ext_lineitem;
insert into orders select * from ext_orders;
insert into customer select * from ext_customer;
insert into part select * from ext_part;
insert into partsupp select * from ext_partsupp;
insert into supplier select * from ext_supplier;

至此数据导入完毕,进入查询执行阶段

收集统计信息

analyze nation;
analyze region;
analyze lineitem;
analyze orders;
analyze customer;
analyze part;
analyze partsupp;
analyze supplier;

执行查询

使用如下shell脚本测试,也可以通过psql等其他客户端逐条执行查询SQL。具体的22条SQL语句见本文最后。

查询加速

特别的,AnalyticDB for PostgreSQL 6.0 的向量加速引擎,可以提升查询性能1倍左右。
在session级别, 修改GUC变量enable_odyssey 为on ,可开启加速引擎。
set enable_odyssey = on;
如需关闭加速引擎,设置该参数为off。
set enable_odyssey = off;
如果使用如下脚本执行22条TPCH SQL,需要在每个Query文件开始出增加一行,set enable_odyssey = on;

total_cost=0

for i in {1..22}
do
        echo "begin run Q${i}, query/q$i.sql , `date`"
        begin_time=`date +%s.%N`
        #psql -h ${实例连接地址} -p ${端口号} -U ${数据库用户} -f query/q${i}.sql > ./log/log_q${i}.out
        rc=$?
        end_time=`date +%s.%N`
        cost=`echo "$end_time-$begin_time"|bc`
        total_cost=`echo "$total_cost+$cost"|bc`
        if [ $rc -ne 0 ] ; then
              printf "run Q%s fail, cost: %.2f, totalCost: %.2f, `date`\n" $i $cost $total_cost
         else
              printf "run Q%s succ, cost: %.2f, totalCost: %.2f, `date`\n" $i $cost $total_cost
         fi
done

测试结果

各表数据量说明,1TB数据(不含索引)

表名 数据条目数
customer 15000w
lineitem 600000w
nation 25
orders 150000w
part 20000w
partsupp 80000w
region 5
supplier 1000w

执行时间统计

SQL 1core * 64节点 SSD型(RT:秒) 4core * 32节点 SSD型(RT:秒) 4core * 32节点 SSD型 - 开启向量加速引擎(RT:秒)
Total 3703.56 2534.29 1258.24

22个SQL

-- Q1
-- 开启向量加速引擎,并设置开关变量为on
set enable_odyssey = on;
select
    l_returnflag,
    l_linestatus,
    sum(l_quantity) as sum_qty,
    sum(l_extendedprice) as sum_base_price,
    sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
    sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
    avg(l_quantity) as avg_qty,
    avg(l_extendedprice) as avg_price,
    avg(l_discount) as avg_disc,
    count(*) as count_order
from
    lineitem
where
    l_shipdate <= date '1998-12-01' - interval '93 day'
group by
    l_returnflag,
    l_linestatus
order by
    l_returnflag,
    l_linestatus;

-- Q2
-- 开启向量加速引擎,并设置开关变量为on
set enable_odyssey = on;
select
    s_acctbal,
    s_name,
    n_name,
    p_partkey,
    p_mfgr,
    s_address,
    s_phone,
    s_comment
from
    part,
    supplier,
    partsupp,
    nation,
    region
where
    p_partkey = ps_partkey
    and s_suppkey = ps_suppkey
    and p_size = 23
    and p_type like '%STEEL'
    and s_nationkey = n_nationkey
    and n_regionkey = r_regionkey
    and r_name = 'EUROPE'
    and ps_supplycost = (
        select
            min(ps_supplycost)
        from
            partsupp,
            supplier,
            nation,
            region
        where
            p_partkey = ps_partkey
            and s_suppkey = ps_suppkey
            and s_nationkey = n_nationkey
            and n_regionkey = r_regionkey
            and r_name = 'EUROPE'
    )
order by
    s_acctbal desc,
    n_name,
    s_name,
    p_partkey
limit 100;

-- Q3
-- 开启向量加速引擎,并设置开关变量为on
set enable_odyssey = on;
select
    l_orderkey,
    sum(l_extendedprice * (1 - l_discount)) as revenue,
    o_orderdate,
    o_shippriority
from
    customer,
    orders,
    lineitem
where
    c_mktsegment = 'MACHINERY'
    and c_custkey = o_custkey
    and l_orderkey = o_orderkey
    and o_orderdate < date '1995-03-24'
    and l_shipdate > date '1995-03-24'
group by
    l_orderkey,
    o_orderdate,
    o_shippriority
order by
    revenue desc,
    o_orderdate
limit 10;

-- Q4
-- 开启向量加速引擎,并设置开关变量为on
set enable_odyssey = on;
select
    o_orderpriority,
    count(*) as order_count
from
    orders
where
    o_orderdate >= date '1996-08-01'
    and o_orderdate < date '1996-08-01' + interval '3' month
    and exists (
        select
            *
        from
            lineitem
        where
            l_orderkey = o_orderkey
            and l_commitdate < l_receiptdate
    )
group by
    o_orderpriority
order by
    o_orderpriority;

-- Q5
-- 开启向量加速引擎,并设置开关变量为on
set enable_odyssey = on;
select
    n_name,
    sum(l_extendedprice * (1 - l_discount)) as revenue
from
    customer,
    orders,
    lineitem,
    supplier,
    nation,
    region
where
    c_custkey = o_custkey
    and l_orderkey = o_orderkey
    and l_suppkey = s_suppkey
    and c_nationkey = s_nationkey
    and s_nationkey = n_nationkey
    and n_regionkey = r_regionkey
    and r_name = 'MIDDLE EAST'
    and o_orderdate >= date '1994-01-01'
    and o_orderdate < date '1994-01-01' + interval '1' year
group by
    n_name
order by
    revenue desc;

-- Q6
-- 开启向量加速引擎,并设置开关变量为on
set enable_odyssey = on;
select
    sum(l_extendedprice * l_discount) as revenue
from
    lineitem
where
    l_shipdate >= date '1994-01-01'
    and l_shipdate < date '1994-01-01' + interval '1' year
    and l_discount between 0.06 - 0.01 and 0.06 + 0.01
    and l_quantity < 24;

-- Q7
-- 开启向量加速引擎,并设置开关变量为on
set enable_odyssey = on;
select
    supp_nation,
    cust_nation,
    l_year,
    sum(volume) as revenue
from
    (
        select
            n1.n_name as supp_nation,
            n2.n_name as cust_nation,
            extract(year from l_shipdate) as l_year,
            l_extendedprice * (1 - l_discount) as volume
        from
            supplier,
            lineitem,
            orders,
            customer,
            nation n1,
            nation n2
        where
            s_suppkey = l_suppkey
            and o_orderkey = l_orderkey
            and c_custkey = o_custkey
            and s_nationkey = n1.n_nationkey
            and c_nationkey = n2.n_nationkey
            and (
                (n1.n_name = 'JORDAN' and n2.n_name = 'INDONESIA')
                or (n1.n_name = 'INDONESIA' and n2.n_name = 'JORDAN')
            )
            and l_shipdate between date '1995-01-01' and date '1996-12-31'
    ) as shipping
group by
    supp_nation,
    cust_nation,
    l_year
order by
    supp_nation,
    cust_nation,
    l_year;

-- Q8
-- 开启向量加速引擎,并设置开关变量为on
set enable_odyssey = on;
select
    o_year,
    sum(case
        when nation = 'INDONESIA' then volume
        else 0
    end) / sum(volume) as mkt_share
from
    (
        select
            extract(year from o_orderdate) as o_year,
            l_extendedprice * (1 - l_discount) as volume,
            n2.n_name as nation
        from
            part,
            supplier,
            lineitem,
            orders,
            customer,
            nation n1,
            nation n2,
            region
        where
            p_partkey = l_partkey
            and s_suppkey = l_suppkey
            and l_orderkey = o_orderkey
            and o_custkey = c_custkey
            and c_nationkey = n1.n_nationkey
            and n1.n_regionkey = r_regionkey
            and r_name = 'ASIA'
            and s_nationkey = n2.n_nationkey
            and o_orderdate between date '1995-01-01' and date '1996-12-31'
            and p_type = 'STANDARD BRUSHED BRASS'
    ) as all_nations
group by
    o_year
order by
    o_year;

-- Q9
-- 开启向量加速引擎,并设置开关变量为on
set enable_odyssey = on;
select
    nation,
    o_year,
    sum(amount) as sum_profit
from
    (
        select
            n_name as nation,
            extract(year from o_orderdate) as o_year,
            l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount
        from
            part,
            supplier,
            lineitem,
            partsupp,
            orders,
            nation
        where
            s_suppkey = l_suppkey
            and ps_suppkey = l_suppkey
            and ps_partkey = l_partkey
            and p_partkey = l_partkey
            and o_orderkey = l_orderkey
            and s_nationkey = n_nationkey
            and p_name like '%chartreuse%'
    ) as profit
group by
    nation,
    o_year
order by
    nation,
    o_year desc;

-- Q10
-- 开启向量加速引擎,并设置开关变量为on
set enable_odyssey = on;
select
    c_custkey,
    c_name,
    sum(l_extendedprice * (1 - l_discount)) as revenue,
    c_acctbal,
    n_name,
    c_address,
    c_phone,
    c_comment
from
    customer,
    orders,
    lineitem,
    nation
where
    c_custkey = o_custkey
    and l_orderkey = o_orderkey
    and o_orderdate >= date '1994-08-01'
    and o_orderdate < date '1994-08-01' + interval '3' month
    and l_returnflag = 'R'
    and c_nationkey = n_nationkey
group by
    c_custkey,
    c_name,
    c_acctbal,
    c_phone,
    n_name,
    c_address,
    c_comment
order by
    revenue desc
limit 20;

-- Q11
-- 开启向量加速引擎,并设置开关变量为on
set enable_odyssey = on;
select
    ps_partkey,
    sum(ps_supplycost * ps_availqty) as value
from
    partsupp,
    supplier,
    nation
where
    ps_suppkey = s_suppkey
    and s_nationkey = n_nationkey
    and n_name = 'INDONESIA'
group by
    ps_partkey having
        sum(ps_supplycost * ps_availqty) > (
            select
                sum(ps_supplycost * ps_availqty) * 0.0001000000
            from
                partsupp,
                supplier,
                nation
            where
                ps_suppkey = s_suppkey
                and s_nationkey = n_nationkey
                and n_name = 'INDONESIA'
        )
order by
    value desc;

-- Q12
-- 开启向量加速引擎,并设置开关变量为on
set enable_odyssey = on;
select
    l_shipmode,
    sum(case
        when o_orderpriority = '1-URGENT'
            or o_orderpriority = '2-HIGH'
            then 1
        else 0
    end) as high_line_count,
    sum(case
        when o_orderpriority <> '1-URGENT'
            and o_orderpriority <> '2-HIGH'
            then 1
        else 0
    end) as low_line_count
from
    orders,
    lineitem
where
    o_orderkey = l_orderkey
    and l_shipmode in ('REG AIR', 'TRUCK')
    and l_commitdate < l_receiptdate
    and l_shipdate < l_commitdate
    and l_receiptdate >= date '1994-01-01'
    and l_receiptdate < date '1994-01-01' + interval '1' year
group by
    l_shipmode
order by
    l_shipmode;

-- Q13
-- 开启向量加速引擎,并设置开关变量为on
set enable_odyssey = on;
select
    c_count,
    count(*) as custdist
from
    (
        select
            c_custkey,
            count(o_orderkey)
        from
            customer left outer join orders on
                c_custkey = o_custkey
                and o_comment not like '%pending%requests%'
        group by
            c_custkey
    ) as c_orders (c_custkey, c_count)
group by
    c_count
order by
    custdist desc,
    c_count desc;

-- Q14
-- 开启向量加速引擎,并设置开关变量为on
set enable_odyssey = on;
select
    100.00 * sum(case
        when p_type like 'PROMO%'
            then l_extendedprice * (1 - l_discount)
        else 0
    end) / sum(l_extendedprice * (1 - l_discount)) as promo_revenue
from
    lineitem,
    part
where
    l_partkey = p_partkey
    and l_shipdate >= date '1994-11-01'
    and l_shipdate < date '1994-11-01' + interval '1' month;

-- Q15
-- 开启向量加速引擎,并设置开关变量为on
set enable_odyssey = on;
create view revenue0 (supplier_no, total_revenue) as
    select
        l_suppkey,
        sum(l_extendedprice * (1 - l_discount))
    from
        lineitem
    where
        l_shipdate >= date '1997-10-01'
        and l_shipdate < date '1997-10-01' + interval '3' month
    group by
        l_suppkey;
select
    s_suppkey,
    s_name,
    s_address,
    s_phone,
    total_revenue
from
    supplier,
    revenue0
where
    s_suppkey = supplier_no
    and total_revenue = (
        select
            max(total_revenue)
        from
            revenue0
    )
order by
    s_suppkey;
drop view revenue0;

-- Q16
-- 开启向量加速引擎,并设置开关变量为on
set enable_odyssey = on;
select
    p_brand,
    p_type,
    p_size,
    count(distinct ps_suppkey) as supplier_cnt
from
    partsupp,
    part
where
    p_partkey = ps_partkey
    and p_brand <> 'Brand#44'
    and p_type not like 'SMALL BURNISHED%'
    and p_size in (36, 27, 34, 45, 11, 6, 25, 16)
    and ps_suppkey not in (
        select
            s_suppkey
        from
            supplier
        where
            s_comment like '%Customer%Complaints%'
    )
group by
    p_brand,
    p_type,
    p_size
order by
    supplier_cnt desc,
    p_brand,
    p_type,
    p_size;

-- Q17
-- 开启向量加速引擎,并设置开关变量为on
set enable_odyssey = on;
select
    sum(l_extendedprice) / 7.0 as avg_yearly
from
    lineitem,
    part
where
    p_partkey = l_partkey
    and p_brand = 'Brand#42'
    and p_container = 'JUMBO PACK'
    and l_quantity < (
        select
            0.2 * avg(l_quantity)
        from
            lineitem
        where
            l_partkey = p_partkey
    );

-- Q18
-- 开启向量加速引擎,并设置开关变量为on
set enable_odyssey = on;
select
    c_name,
    c_custkey,
    o_orderkey,
    o_orderdate,
    o_totalprice,
    sum(l_quantity)
from
    customer,
    orders,
    lineitem
where
    o_orderkey in (
        select
            l_orderkey
        from
            lineitem
        group by
            l_orderkey having
                sum(l_quantity) > 312
    )
    and c_custkey = o_custkey
    and o_orderkey = l_orderkey
group by
    c_name,
    c_custkey,
    o_orderkey,
    o_orderdate,
    o_totalprice
order by
    o_totalprice desc,
    o_orderdate
limit 100;

-- Q19
-- 开启向量加速引擎,并设置开关变量为on
set enable_odyssey = on;
select
    sum(l_extendedprice* (1 - l_discount)) as revenue
from
    lineitem,
    part
where
    (
        p_partkey = l_partkey
        and p_brand = 'Brand#43'
        and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
        and l_quantity >= 5 and l_quantity <= 5 + 10
        and p_size between 1 and 5
        and l_shipmode in ('AIR', 'AIR REG')
        and l_shipinstruct = 'DELIVER IN PERSON'
    )
    or
    (
        p_partkey = l_partkey
        and p_brand = 'Brand#45'
        and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
        and l_quantity >= 12 and l_quantity <= 12 + 10
        and p_size between 1 and 10
        and l_shipmode in ('AIR', 'AIR REG')
        and l_shipinstruct = 'DELIVER IN PERSON'
    )
    or
    (
        p_partkey = l_partkey
        and p_brand = 'Brand#11'
        and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
        and l_quantity >= 24 and l_quantity <= 24 + 10
        and p_size between 1 and 15
        and l_shipmode in ('AIR', 'AIR REG')
        and l_shipinstruct = 'DELIVER IN PERSON'
    );

-- Q20
-- 开启向量加速引擎,并设置开关变量为on
set enable_odyssey = on;
select
    s_name,
    s_address
from
    supplier,
    nation
where
    s_suppkey in (
        select
            ps_suppkey
        from
            partsupp
        where
            ps_partkey in (
                select
                    p_partkey
                from
                    part
                where
                    p_name like 'magenta%'
            )
            and ps_availqty > (
                select
                    0.5 * sum(l_quantity)
                from
                    lineitem
                where
                    l_partkey = ps_partkey
                    and l_suppkey = ps_suppkey
                    and l_shipdate >= date '1996-01-01'
                    and l_shipdate < date '1996-01-01' + interval '1' year
            )
    )
    and s_nationkey = n_nationkey
    and n_name = 'RUSSIA'
order by
    s_name;

-- Q21
-- 开启向量加速引擎,并设置开关变量为on
set enable_odyssey = on;
select
    s_name,
    count(*) as numwait
from
    supplier,
    lineitem l1,
    orders,
    nation
where
    s_suppkey = l1.l_suppkey
    and o_orderkey = l1.l_orderkey
    and o_orderstatus = 'F'
    and l1.l_receiptdate > l1.l_commitdate
    and exists (
        select
            *
        from
            lineitem l2
        where
            l2.l_orderkey = l1.l_orderkey
            and l2.l_suppkey <> l1.l_suppkey
    )
    and not exists (
        select
            *
        from
            lineitem l3
        where
            l3.l_orderkey = l1.l_orderkey
            and l3.l_suppkey <> l1.l_suppkey
            and l3.l_receiptdate > l3.l_commitdate
    )
    and s_nationkey = n_nationkey
    and n_name = 'MOZAMBIQUE'
group by
    s_name
order by
    numwait desc,
    s_name
limit 100;

-- Q22
-- 开启向量加速引擎,并设置开关变量为on
set enable_odyssey = on;
select
        cntrycode,
        count(*) as numcust,
        sum(c_acctbal) as totacctbal
from
        (
                select
                        substring(c_phone from 1 for 2) as cntrycode,
                        c_acctbal
                from
                        customer
                where
                        substring(c_phone from 1 for 2) in
                                ('13', '31', '23', '29', '30', '18', '17')
                        and c_acctbal > (
                                select
                                        avg(c_acctbal)
                                from
                                        customer
                                where
                                        c_acctbal > 0.00
                                        and substring(c_phone from 1 for 2) in
                                                ('13', '31', '23', '29', '30', '18', '17')
                        )
                        and not exists (
                                select
                                        *
                                from
                                        orders
                                where
                                        o_custkey = c_custkey
                        )
        ) as custsale
group by
        cntrycode
order by
        cntrycode;
        
相关实践学习
数据库实验室挑战任务-初级任务
本场景介绍如何开通属于你的免费云数据库,在RDS-MySQL中完成对学生成绩的详情查询,执行指定类型SQL。
阿里云云原生数据仓库AnalyticDB MySQL版 使用教程
云原生数据仓库AnalyticDB MySQL版是一种支持高并发低延时查询的新一代云原生数据仓库,高度兼容MySQL协议以及SQL:92、SQL:99、SQL:2003标准,可以对海量数据进行即时的多维分析透视和业务探索,快速构建企业云上数据仓库。 了解产品 https://www.aliyun.com/product/ApsaraDB/ads
目录
相关文章
|
1月前
|
存储 关系型数据库 分布式数据库
PolarDB常见问题之PolarDB冷存数据到OSS之后恢复失败如何解决
PolarDB是阿里云推出的下一代关系型数据库,具有高性能、高可用性和弹性伸缩能力,适用于大规模数据处理场景。本汇总囊括了PolarDB使用中用户可能遭遇的一系列常见问题及解答,旨在为数据库管理员和开发者提供全面的问题指导,确保数据库平稳运行和优化使用体验。
|
1月前
|
SQL 关系型数据库 分布式数据库
在PolarDB中,行数评估是通过对表的统计数据、基数估计以及算子代价模型来进行估算的。
【2月更文挑战第14天】在PolarDB中,行数评估是通过对表的统计数据、基数估计以及算子代价模型来进行估算的。
84 1
|
6天前
|
SQL 关系型数据库 MySQL
关系型数据库插入数据的语句
使用SQL的`INSERT INTO`语句向关系型数据库的`students`表插入数据。例如,插入一个`id`为1,`name`为&#39;张三&#39;,`age`为20的记录:`INSERT INTO students (id, name, age) VALUES (1, &#39;张三&#39;, 20)。如果`id`自增,则可简化为`INSERT INTO students (name, age) VALUES (&#39;张三&#39;, 20)`。
6 2
|
6天前
|
SQL 存储 Oracle
关系型数据库查询数据的语句
本文介绍了关系型数据库中的基本SQL查询语句,包括选择所有或特定列、带条件查询、排序、分组、过滤分组、表连接、限制记录数及子查询。SQL还支持窗口函数、存储过程等高级功能,是高效管理数据库的关键。建议深入学习SQL及相应数据库系统文档。
9 2
|
13天前
|
人工智能 Cloud Native 算法
数据之势丨AI时代,云原生数据库的最新发展趋势与进展
AI与云数据库的深度结合是数据库发展的必然趋势,基于AI能力的加持,云数据库未来可以实现更快速的查询和决策,帮助企业更好地利用海量数据进行业务创新和决策优化。
数据之势丨AI时代,云原生数据库的最新发展趋势与进展
|
30天前
|
关系型数据库 MySQL OLAP
PolarDB +AnalyticDB Zero-ETL :免费同步数据到ADB,享受数据流通新体验
Zero-ETL是阿里云瑶池数据库提供的服务,旨在简化传统ETL流程的复杂性和成本,提高数据实时性。降低数据同步成本,允许用户快速在AnalyticDB中对PolarDB数据进行分析,降低了30%的数据接入成本,提升了60%的建仓效率。 Zero-ETL特性包括免费的PolarDB MySQL联邦分析和PolarDB-X元数据自动同步,提供一体化的事务处理和数据分析,并能整合多个数据源。用户只需简单配置即可实现数据同步和实时分析。
|
1月前
|
SQL 关系型数据库 OLAP
PostgreSQL从小白到高手教程 - 第46讲:poc-tpch测试
PostgreSQL从小白到高手教程 - 第46讲:poc-tpch测试
83 3
|
1月前
|
SQL 数据采集 存储
数据仓库(12)数据治理之数仓数据管理实践心得
这边文章聊聊自己对数据治理开发实践的一些思路,就是聊聊怎么开始去做数据治理这件事情。说起数据治理,有时候虽然看了很多文章,看了很多的介绍,了解数据治理的理论,但是实际上需要我们去搞的时候,就会踩很多的坑。这里记一下自己做数据治理的一些思路,做做笔记,也分享给需要的同学。 当然,想要做数据治理,想要学习了解,一下数据治理的范围,理论等,最好可以看看别人怎么做的,了解数据治理可以参考:[数据仓库(11)什么是大数据治理,数据治理的范围是哪些](https://zhuanlan.zhihu.com/p/467433967)。
232 0
|
1月前
|
存储 大数据 数据管理
数据仓库(09)数仓缓慢变化维度数据的处理
数据仓库的重要特点之一是反映历史变化,所以如何处理维度的变化是维度设计的重要工作之一。缓慢变化维的提出是因为在现实世界中,维度的属性并不是静态的,它会随着时间的流逝发生缓慢的变化,与数据增长较为快速的事实表相比,维度变化相对缓慢。阴齿这个就叫做缓慢变化维。
211 2
数据仓库(09)数仓缓慢变化维度数据的处理
|
2月前
|
关系型数据库 OLAP OLTP
PostgreSQL从小白到高手教程 - 第45讲:poc-tpcc测试
CUUG PostgreSQL技术大讲堂系列公开课第45讲-POC-TPCC测试的内容,往期视频及文档,请联系CUUG。
45 1

热门文章

最新文章

相关产品

  • 云数据库 RDS PostgreSQL 版