PostgreSQL 11 新特性解读: 分区表增加哈希分区

2018-11-27 4999

版权

本文内容由阿里云实名注册用户自发贡献，版权归原作者所有，阿里云开发者社区不拥有其著作权，亦不承担相应法律责任。具体规则请查看《阿里云开发者社区用户服务协议》和《阿里云开发者社区知识产权保护指引》。如果您发现本社区中有涉嫌抄袭的内容，填写侵权投诉表单进行举报，一经查实，本社区将立刻删除涉嫌侵权内容。

本文涉及的产品

云原生数据库 PolarDB MySQL 版，Serverless 5000PCU 100GB

简介：

PostgreSQL 11 的一个重量级新特性为分区表得到较大增强，例如支持哈希分区(HASH)表，因此 PostgreSQL 支持范围分区(RANGE)、列表分区(LIST)、>哈希分区(HASH)三种分区方式，本文简单演示下哈希分区表。

Hash Partitioning

The table is partitioned by specifying a modulus and a remainder for each partition. Each partition will hold the rows for which the hash value of the partition key divided by the specified modulus will produce the specified remainder.

Hash分区表的分区定义包含两个属性，如下：

modulus: 指Hash分区个数。
remainder: 指Hash分区键取模余。

创建分区表语法

CREATE TABLE table_name (  ...  )
[ PARTITION BY { RANGE | LIST | HASH }  (  { column_name |  ( expression )  }
 CREATE TABLE table_name
PARTITION OF parent_table [  (
)  ] FOR VALUES partition_bound_spec

创建数据生成函数

为了便于生成测试数据，创建以下两个函数用来随机生成指定长度的字符串，创建 random_range(int4, int4) 函数如下：

CREATE OR REPLACE FUNCTION random_range(int4, int4)
RETURNS int4
LANGUAGE SQL
AS 
$$

    SELECT ($1 + FLOOR(($2 - $1 + 1) * random() ))::int4;

$$
;

接着创建random_text_simple(length int4)函数，此函数会调用random_range(int4, int4)函数。

CREATE OR REPLACE FUNCTION random_text_simple(length int4)
RETURNS text
LANGUAGE PLPGSQL
AS 
$$

DECLARE
    possible_chars text := '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ';
    output text := '';
    i int4;
    pos int4;
BEGIN

    FOR i IN 1..length LOOP
        pos := random_range(1, length(possible_chars));
        output := output || substr(possible_chars, pos, 1);
    END LOOP;

    RETURN output;
END;

$$
;

random_text_simple(length int4)函数可以随机生成指定长度字符串，如下随机生成含三位字符的字符串。

mydb=> SELECT random_text_simple(3);
 random_text_simple 
--------------------
 LL9
(1 row)

随机生成含六位字符的字符串，如下所示：

mydb=> SELECT random_text_simple(6);
 random_text_simple 
--------------------
 B81BPW
(1 row)

后面会用到这个函数生成测试数据。

创建哈希分区父表

CREATE TABLE student (
 stuname text ,
 ctime   timestamp(6) without time zone
) PARTITION BY HASH(stuname);

创建索引

CREATE INDEX idx_stuendt_stuname on student using btree(stuname);

创建子表

CREATE TABLE student_p0 PARTITION OF student FOR VALUES WITH(MODULUS 4, REMAINDER 0);
CREATE TABLE student_p1 PARTITION OF student FOR VALUES WITH(MODULUS 4, REMAINDER 1);
CREATE TABLE student_p2 PARTITION OF student FOR VALUES WITH(MODULUS 4, REMAINDER 2);
CREATE TABLE student_p3 PARTITION OF student FOR VALUES WITH(MODULUS 4, REMAINDER 3);

查看分区表定义

francs=> \d+ student
                                              Table "francs.student"
 Column  |              Type              | Collation | Nullable | Default | Storage  | Stats target | Description 
---------+--------------------------------+-----------+----------+---------+----------+--------------+-------------
 stuname | text                           |           |          |         | extended |              | 
 ctime   | timestamp(6) without time zone |           |          |         | plain    |              | 
Partition key: HASH (stuname)
Indexes:
    "idx_stuendt_stuname" btree (stuname)
Partitions: student_p0 FOR VALUES WITH (modulus 4, remainder 0),
            student_p1 FOR VALUES WITH (modulus 4, remainder 1),
            student_p2 FOR VALUES WITH (modulus 4, remainder 2),
            student_p3 FOR VALUES WITH (modulus 4, remainder 3)

从以上看出表 student 和它的四个分区。

插入测试数据

使用之前创建的函数 random_text_simple() 生成100万测试数据，如下。

INSERT INTO student(stuname,ctime) SELECT random_text_simple(6),clock_timestamp() FROM generate_series(1,1000000);

查看分区表数据

表数据如下

francs=> SELECT * FROM student LIMIT 3;
 stuname |        ctime        
---------+---------------------
 4JJOPN  | 2018-09-20 10:45:06
 NHQONC  | 2018-09-20 10:45:06
 8V5BGH  | 2018-09-20 10:45:06
(3 rows)

统计分区数据量

francs=> SELECT tableoid::regclass,count(*) from student group by 1 order by 1;
  tableoid  | count  
------------+--------
 student_p0 | 250510
 student_p1 | 249448
 student_p2 | 249620
 student_p3 | 250422
(4 rows)

可见数据均匀分布到了四个分区。

根据分区键查询

francs=> EXPLAIN ANALYZE SELECT * FROM student WHERE stuname='3LXBEV';
                                                                QUERY PLAN                                                          
      
------------------------------------------------------------------------------------------------------------------------------------

 Append  (cost=0.42..8.44 rows=1 width=15) (actual time=0.017..0.018 rows=1 loops=1)
   ->  Index Scan using student_p3_stuname_idx on student_p3  (cost=0.42..8.44 rows=1 width=15) (actual time=0.017..0.017 rows=1 loops=1)
         Index Cond: (stuname = '3LXBEV'::text)
 Planning Time: 0.198 ms
 Execution Time: 0.042 ms
(5 rows)

根据分区键stuname查询仅扫描分区 student_p3，并走了索引。

根据非分区键查询

francs=> EXPLAIN ANALYZE SELECT * FROM student WHERE ctime='2018-09-20 10:53:55.48392';
                                                          QUERY PLAN                                                           
-------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..13761.36 rows=4 width=15) (actual time=37.891..39.183 rows=1 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Append  (cost=0.00..12760.96 rows=4 width=15) (actual time=23.753..35.006 rows=0 loops=3)
         ->  Parallel Seq Scan on student_p0  (cost=0.00..3196.99 rows=1 width=15) (actual time=0.014..28.550 rows=1 loops=1)
               Filter: (ctime = '2018-09-20 10:53:55.48392'::timestamp without time zone)
               Rows Removed by Filter: 250509
         ->  Parallel Seq Scan on student_p3  (cost=0.00..3195.34 rows=1 width=15) (actual time=29.543..29.543 rows=0 loops=1)
               Filter: (ctime = '2018-09-20 10:53:55.48392'::timestamp without time zone)
               Rows Removed by Filter: 250422
         ->  Parallel Seq Scan on student_p2  (cost=0.00..3185.44 rows=1 width=15) (actual time=8.260..8.260 rows=0 loops=3)
               Filter: (ctime = '2018-09-20 10:53:55.48392'::timestamp without time zone)
               Rows Removed by Filter: 83207
         ->  Parallel Seq Scan on student_p1  (cost=0.00..3183.18 rows=1 width=15) (actual time=22.135..22.135 rows=0 loops=1)
               Filter: (ctime = '2018-09-20 10:53:55.48392'::timestamp without time zone)
               Rows Removed by Filter: 249448
 Planning Time: 0.183 ms
 Execution Time: 39.219 ms
(18 rows)

根据非分区键ctime查询扫描了分区表所有分区。

总结

本文演示了 PostgreSQL 哈希分区表的创建、测试数据的生成导入和查询计划，后面博客演示分区表增强的其它方面。

参考

新书推荐

最后推荐和张文升共同编写的《PostgreSQL实战》，本书基于PostgreSQL 10 编写，共18章，重点介绍SQL高级特性、并行查询、分区表、物理复制、逻辑复制、备份恢复、高可用、性能优化、PostGIS等，涵盖大量实战用例！

购买链接：https://item.jd.com/12405774.html
_5_PostgreSQL_

PostgreSQL 11 新特性解读: 分区表增加哈希分区

创建分区表语法

创建数据生成函数

创建哈希分区父表

创建子表

查看分区表定义

插入测试数据

查看分区表数据

根据分区键查询

根据非分区键查询

总结

参考

新书推荐

热门文章

最新文章

相关课程

相关电子书

相关实验场景

推荐镜像