MonetDB vs PostgreSQL a lite case

本文涉及的产品
云原生数据库 PolarDB MySQL 版,Serverless 5000PCU 100GB
简介:
本文对比一下MonetDB 和 PostgreSQL在某些SQL下的性能. 
(MonetDB 使用列存储, 支持单SQL使用多核, 线程模式, 支持数据压缩, 还有较多的OLAP相关的插件, 更适合OLAP的应用场景)

注意, PostgreSQL注重的是高并发, 而MonetDB注重数据分析, 所以两者没有什么可比性, 这里拿MonetDB的强项来VS PostgreSQL的软肋, 主要是给一些在使用PostgreSQL做数据分析并且遇到瓶颈的朋友一些启发, 换个产品试试, 不要使用一个产品的软肋(这里指PostgreSQL在统计方面目前略差, 未来9.4+版本出来单SQL可以用到多核再来谈统计效率)
PostgreSQL支持的列存储插件cstore_fdw性能, 如果要玩的话可以 参考 : 
对于定长类型, 可以提升较高的性能.

测试环境
DELL R610
CPU 1.6G 8核
CentOS 6.5 x64
PostgreSQL 9.3.5 (主要参数 shared_buffer=1G, block_size=8KB, checkpoint_segment=32, wal_segsize=16MB, synchronous_commit=off, autovacuum=on)
MonetDB 11.17.21
块设备 OCZ RevoDrive3X2 240G
文件系统 ext4
内存 96GB
测试数据2.6亿
MonetDB 占用 1.9G. (含PK)
PostgreSQL 占用 22G.  (含PK)

插入性能对比, PostgreSQL比MonetDB几乎差了9倍性能.
 插入条数 
/ 时间(毫秒)
 MonetDB  PostgreSQL
 1024  6  11
 2048  7  21
 4096  10  42
 8192  17  84
 16384  29  166
 32768  51  332
 65536  96  661
 131072  184  1361
 262144  361  2760
 524288  701  5550
 1048576  1300  11221
 2097152  2800  22581
 4194304  5500  45018
 8388608  11200  91126
 16777216  22100  183895
 33554432  45300  376365
 67108864  97000  764031
 134217728  178000  1533433
 平均每秒插入条数  754000/s  87000/s
   
查询性能对比, MonetDB完胜, 完全不是一个数量级的差别. 
当然PostgreSQL count(*)性能本来就不咋地, 更适合OLTP, 曾经用了大量的篇幅来写使用PostgreSQL实现准实时数据统计, 可参阅
 SQL
/
时间(毫秒)
 MonetDB  PostgreSQL
 select count(*) from a;  1  46484
 select count(*) from a where id=1;  1  1
 select count(*) from a where id>1;  2  68813
 select count(*) from a where id<1;  2  0.6
 select count(*) from a where id<>1;  3500  68710
 select count(*) from a where id is null;  752  0.5
 select count(*) from a where id is not null;  710  64771
 select count(*) from a where info='2014-08-13 20:20:14.000000+08:00';  0.4  68006
 select count(*) from a where info>'2014-08-13 20:20:14.000000+08:00';  0.8  91155
 select count(*) from a where info<'2014-08-13 20:20:14.000000+08:00';  1.7  72517
 select count(*) from a where info<>'2014-08-13 20:20:14.000000+08:00';  2800  86810
 select count(*) from a where info is null;  773  43593
 select count(*) from a where info is not null;  748  65161
 select count(*) from (select info from a group by info having count(*) >1) as t;  4700  144180

空间占用对比,
2.6亿数据(含索引), MonetDB 1.9G, PostgreSQL 22G.
当然这里和测试数据有关, 因为字符串大量重复了.
后期做一下宽 表的测试. 对比GreenPlum.
MonetDB还有很多强大的功能等待挖掘, 大家一起来玩吧.

详细数据如下 : 
MonetDB
插入性能
sql>create table a(id int auto_increment primary key, info varchar(32));
operation successful (10.112ms)
sql>insert into a (info) values (now());
1 affected rows, last generated key: 1 (3.274ms)
sql>select * from a;
+------+----------------------------------+
| id   | info                             |
+======+==================================+
|    1 | 2014-08-13 20:18:53.000000+08:00 |
+------+----------------------------------+
1 tuple (2.448ms)
sql>insert into a (info) values (now());
1 affected rows, last generated key: 2 (3.023ms)
sql>insert into a (info) select now() from a;
2 affected row (6.033ms)
sql>insert into a (info) select now() from a;
4 affected row (3.798ms)
sql>insert into a (info) select now() from a;
8 affected row (2.893ms)
sql>insert into a (info) select now() from a;
16 affected row (2.700ms)
sql>insert into a (info) select now() from a;
32 affected row (5.064ms)
sql>insert into a (info) select now() from a;
64 affected row (4.054ms)
sql>insert into a (info) select now() from a;
128 affected row (4.138ms)
sql>insert into a (info) select now() from a;
256 affected row (4.304ms)
sql>insert into a (info) select now() from a;
512 affected row (2.563ms)
sql>insert into a (info) select now() from a;
1024 affected row (6.094ms)
sql>insert into a (info) select now() from a;
2048 affected row (6.956ms)
sql>insert into a (info) select now() from a;
4096 affected row (9.618ms)
sql>insert into a (info) select now() from a;
8192 affected row (17.475ms)
sql>insert into a (info) select now() from a;
16384 affected row (29.347ms)
sql>insert into a (info) select now() from a;
32768 affected row (50.941ms)
sql>insert into a (info) select now() from a;
65536 affected row (95.736ms)
sql>insert into a (info) select now() from a;
131072 affected row (183.726ms)
sql>insert into a (info) select now() from a;
262144 affected row (361.436ms)
sql>insert into a (info) select now() from a;
524288 affected row (701.202ms)
sql>insert into a (info) select now() from a;
1048576 affected row (1.3s)
sql>insert into a (info) select now() from a;
2097152 affected row (2.8s)
sql>insert into a (info) select now() from a;
4194304 affected row (5.5s)
sql>insert into a (info) select now() from a;
8388608 affected row (11.2s)
sql>insert into a (info) select now() from a;
16777216 affected row (22.1s)
sql>insert into a (info) select now() from a;
33554432 affected row (45.3s)
sql>insert into a (info) select now() from a;
67108864 affected row (1m 27s)
sql>insert into a (info) select now() from a;
134217728 affected row (2m 58s)


空间占用大约1.9G.
[root@150 ~]# cd /data02/mdb1/
[root@150 mdb1]# ll
total 16
-rw------- 1 root root 7196 Aug 13 21:08 merovingian.log
-rw-r--r-- 1 root root    6 Aug 13 20:08 merovingian.pid
drwx------ 5 root root 4096 Aug 13 20:09 test
[root@150 mdb1]# du -sh *
8.0K    merovingian.log
4.0K    merovingian.pid
1.9G    test
[root@150 mdb1]# cd test/
[root@150 test]# ll
total 12
-rw-------  1 root root    0 Aug 13 20:09 58ec944a-5cff-dc51-4873-986923c64567
drwx------ 14 root root 4096 Aug 13 20:33 bat
drwx------  2 root root 4096 Aug 13 20:09 box
drwx------  3 root root 4096 Aug 13 20:09 sql_logs
[root@150 test]# du -sh *
0       58ec944a-5cff-dc51-4873-986923c64567
1.9G    bat
4.0K    box
12K     sql_logs
[root@150 test]# cd bat
[root@150 bat]# du -sh *
492K    01
288K    02
92K     04
1.1G    05
513M    07
572K    10
4.0K    11
4.0K    12
65M     13
323M    14
772K    15
4.0K    3.head
4.0K    3.tail
12K     3.theap
4.0K    4.head
4.0K    4.tail
12K     4.theap
28K     BACKUP

[root@150 bat]# ll
total 88
drwx------ 2 root root 4096 Aug 13 21:27 01
drwx------ 2 root root 4096 Aug 13 20:21 02
drwx------ 2 root root 4096 Aug 13 21:15 04
drwx------ 2 root root 4096 Aug 13 21:12 05
drwx------ 2 root root 4096 Aug 13 21:27 07
drwx------ 2 root root 4096 Aug 13 21:27 10
drwx------ 2 root root 4096 Aug 13 21:15 11
drwx------ 2 root root 4096 Aug 13 21:27 12
drwx------ 2 root root 4096 Aug 13 21:27 13
drwx------ 2 root root 4096 Aug 13 21:27 14
drwx------ 2 root root 4096 Aug 13 21:27 15
-rw------- 1 root root    8 Aug 13 20:09 3.head
-rw------- 1 root root    1 Aug 13 20:09 3.tail
-rw------- 1 root root 8216 Aug 13 20:09 3.theap
-rw------- 1 root root    8 Aug 13 20:09 4.head
-rw------- 1 root root    1 Aug 13 20:09 4.tail
-rw------- 1 root root 8343 Aug 13 20:09 4.theap
drwx------ 2 root root 4096 Aug 13 21:27 BACKUP

通过函数storage可以查看详细大小. 这里包含了count和columnsize. info字段占用500mb左右, id字段占用1G左右.
sql>select * from storage();
+-------+-------------------+----------------+-----------+----------+-----------+-------+------------+----------+---------+--------+
| schem | table             | column         | type      | location | count     | typew | columnsize | heapsize | indices | sorted |
: a     :                   :                :           :          :           : idth  :            :          :         :        :
+=======+===================+================+===========+==========+===========+=======+============+==========+=========+========+
| sys   | a                 | id             | int       | 05/573   | 268435456 |     4 | 1073741824 |        0 |       0 | true   |
| sys   | a                 | info           | varchar   | 07/734   | 268435456 |    32 |  536870912 |    16384 |       0 | true   |
| sys   | a                 | a_id_pkey      | oid       | 05/567   |         0 |     8 |          0 |        0 |       0 | true   |


查询测试
sql>select count(*) from a;
+-----------+
| L1        |
+===========+
| 268435456 |
+-----------+
1 tuple (0.979ms)
sql>select count(*) from a where id=1;
+------+
| L1   |
+======+
|    1 |
+------+
1 tuple (0.920ms)
sql>select count(*) from a where id>1;
+-----------+
| L1        |
+===========+
| 268435455 |
+-----------+
1 tuple (2.347ms)
sql>select count(*) from a where id<1;
+------+
| L1   |
+======+
|    0 |
+------+
1 tuple (2.270ms)
sql>select count(*) from a where id<>1;
+-----------+
| L1        |
+===========+
| 268435455 |
+-----------+
1 tuple (3.5s)
sql>select count(*) from a where id is null;
+------+
| L1   |
+======+
|    0 |
+------+
1 tuple (751.665ms)
sql>select count(*) from a where id is not null;
+-----------+
| L1        |
+===========+
| 268435456 |
+-----------+
1 tuple (710.477ms)
sql>select * from a limit 10;
+------+----------------------------------+
| id   | info                             |
+======+==================================+
|    1 | 2014-08-13 20:18:53.000000+08:00 |
|    2 | 2014-08-13 20:18:59.000000+08:00 |
|    3 | 2014-08-13 20:20:06.000000+08:00 |
|    4 | 2014-08-13 20:20:06.000000+08:00 |
|    5 | 2014-08-13 20:20:14.000000+08:00 |
|    6 | 2014-08-13 20:20:14.000000+08:00 |
|    7 | 2014-08-13 20:20:14.000000+08:00 |
|    8 | 2014-08-13 20:20:14.000000+08:00 |
|    9 | 2014-08-13 20:20:14.000000+08:00 |
|   10 | 2014-08-13 20:20:14.000000+08:00 |
+------+----------------------------------+
10 tuples (1.581ms)
sql>select count(*) from a where info='2014-08-13 20:20:14.000000+08:00';
+------+
| L1   |
+======+
|   12 |
+------+
1 tuple (0.442ms)
sql>select count(*) from a where info>'2014-08-13 20:20:14.000000+08:00';
+-----------+
| L1        |
+===========+
| 268435440 |
+-----------+
1 tuple (0.844ms)
sql>select count(*) from a where info<'2014-08-13 20:20:14.000000+08:00';
+------+
| L1   |
+======+
|    4 |
+------+
1 tuple (1.734ms)
sql>select count(*) from a where info<>'2014-08-13 20:20:14.000000+08:00';
+-----------+
| L1        |
+===========+
| 268435444 |
+-----------+
1 tuple (2.8s)
sql>select count(*) from a where info is null;
+------+
| L1   |
+======+
|    0 |
+------+
1 tuple (772.987ms)
sql>select count(*) from a where info is not null;
+-----------+
| L1        |
+===========+
| 268435456 |
+-----------+
1 tuple (747.712ms)
sql>select count(*) from (select info from a group by info having count(*) >1) as t;
+------+
| L2   |
+======+
|   18 |
+------+
1 tuple (4.7s)


PostgreSQL
插入测试
postgres=# create table a(id serial primary key, info text);
CREATE TABLE
postgres=# insert into a(info) values (now());
INSERT 0 1
postgres=# insert into a(info) select now() from a;
INSERT 0 1
Time: 0.954 ms
postgres=# insert into a(info) select now() from a;
INSERT 0 2
Time: 0.606 ms
postgres=# insert into a(info) select now() from a;
INSERT 0 4
Time: 0.352 ms
postgres=# insert into a(info) select now() from a;
INSERT 0 8
Time: 0.393 ms
postgres=# insert into a(info) select now() from a;
INSERT 0 16
Time: 0.476 ms
postgres=# insert into a(info) select now() from a;
INSERT 0 32
Time: 0.840 ms
postgres=# insert into a(info) select now() from a;
INSERT 0 64
Time: 1.166 ms
postgres=# insert into a(info) select now() from a;
INSERT 0 128
Time: 1.632 ms
postgres=# insert into a(info) select now() from a;
INSERT 0 256
Time: 3.214 ms
postgres=# insert into a(info) select now() from a;
INSERT 0 512
Time: 5.519 ms
postgres=# insert into a(info) select now() from a;
INSERT 0 1024
Time: 10.924 ms
postgres=# insert into a(info) select now() from a;
INSERT 0 2048
Time: 21.212 ms
postgres=# insert into a(info) select now() from a;
INSERT 0 4096
Time: 41.930 ms
postgres=# insert into a(info) select now() from a;
INSERT 0 8192
Time: 84.242 ms
postgres=# insert into a(info) select now() from a;
INSERT 0 16384
Time: 165.705 ms
postgres=# insert into a(info) select now() from a;
INSERT 0 32768
Time: 332.269 ms
postgres=# insert into a(info) select now() from a;
INSERT 0 65536
Time: 661.579 ms
postgres=# insert into a(info) select now() from a;
INSERT 0 131072
Time: 1361.281 ms
postgres=# insert into a(info) select now() from a;
INSERT 0 262144
Time: 2760.690 ms
postgres=# insert into a(info) select now() from a;
INSERT 0 524288
Time: 5550.096 ms
postgres=# insert into a(info) select now() from a;
INSERT 0 1048576
Time: 11221.544 ms
postgres=# insert into a(info) select now() from a;
INSERT 0 2097152
Time: 22581.858 ms
postgres=# insert into a(info) select now() from a;
INSERT 0 4194304
Time: 45018.124 ms
postgres=# insert into a(info) select now() from a;
INSERT 0 8388608
Time: 91126.619 ms
postgres=# insert into a(info) select now() from a;
INSERT 0 16777216
Time: 183895.102 ms
postgres=# insert into a(info) select now() from a;
INSERT 0 33554432
Time: 376365.578 ms
postgres=# insert into a(info) select now() from a;
INSERT 0 67108864
Time: 764031.565 ms
postgres=# insert into a(info) select now() from a;
INSERT 0 134217728
Time: 1533433.507 ms

postgres=# select pg_total_relation_size('a')/1024/1024/1024||'GB';
 ?column? 
----------
 22GB
(1 row)
Time: 1.027 ms
postgres=# select pg_relation_size('a')/1024/1024/1024||'GB';
 ?column? 
----------
 17GB
(1 row)
Time: 0.702 ms


查询测试
postgres=# select count(*) from a;
   count   
-----------
 268435456
(1 row)

Time: 60818.321 ms
postgres=# select count(*) from a;
   count   
-----------
 268435456
(1 row)

Time: 46484.165 ms
postgres=# select count(*) from a where id=1;
 count 
-------
     1
(1 row)

Time: 0.996 ms
postgres=# select count(*) from a where id>1;
   count   
-----------
 268435455
(1 row)

Time: 68812.883 ms
postgres=# select count(*) from a where id<1;
 count 
-------
     0
(1 row)

Time: 0.647 ms
postgres=# select count(*) from a where id<>1;
   count   
-----------
 268435455
(1 row)

Time: 68710.001 ms
postgres=# select count(*) from a where id is null;
 count 
-------
     0
(1 row)

Time: 0.537 ms
postgres=# select count(*) from a where id is not null;
   count   
-----------
 268435456
(1 row)

Time: 64771.858 ms
postgres=# select * from a limit 10;
 id |             info              
----+-------------------------------
  1 | 2014-08-13 20:37:52.165199+08
  2 | 2014-08-13 20:38:06.945296+08
  3 | 2014-08-13 20:38:07.965092+08
  4 | 2014-08-13 20:38:07.965092+08
  5 | 2014-08-13 20:38:08.502181+08
  6 | 2014-08-13 20:38:08.502181+08
  7 | 2014-08-13 20:38:08.502181+08
  8 | 2014-08-13 20:38:08.502181+08
  9 | 2014-08-13 20:38:08.966175+08
 10 | 2014-08-13 20:38:08.966175+08
(10 rows)

Time: 0.629 ms
postgres=# select count(*) from a where info='2014-08-13 20:38:08.966175+08';
 count 
-------
     8
(1 row)

Time: 68005.864 ms
postgres=# select count(*) from a where info>'2014-08-13 20:38:08.966175+08';
   count   
-----------
 268435440
(1 row)

Time: 91154.757 ms
postgres=# select count(*) from a where info<'2014-08-13 20:38:08.966175+08';
 count 
-------
     8
(1 row)

Time: 72516.503 ms
postgres=# select count(*) from a where info<>'2014-08-13 20:38:08.966175+08';
   count   
-----------
 268435448
(1 row)

Time: 86810.115 ms
postgres=# select count(*) from a where info is null;
 count 
-------
     0
(1 row)

Time: 43593.035 ms
postgres=# select count(*) from a where info is not null;
   count   
-----------
 268435456
(1 row)

Time: 65161.258 ms
postgres=# select count(*) from (select info from a group by info having count(*) >1) as t;
 count 
-------
    27
(1 row)
Time: 144180.678 ms


[参考]
相关实践学习
使用PolarDB和ECS搭建门户网站
本场景主要介绍基于PolarDB和ECS实现搭建门户网站。
阿里云数据库产品家族及特性
阿里云智能数据库产品团队一直致力于不断健全产品体系,提升产品性能,打磨产品功能,从而帮助客户实现更加极致的弹性能力、具备更强的扩展能力、并利用云设施进一步降低企业成本。以云原生+分布式为核心技术抓手,打造以自研的在线事务型(OLTP)数据库Polar DB和在线分析型(OLAP)数据库Analytic DB为代表的新一代企业级云原生数据库产品体系, 结合NoSQL数据库、数据库生态工具、云原生智能化数据库管控平台,为阿里巴巴经济体以及各个行业的企业客户和开发者提供从公共云到混合云再到私有云的完整解决方案,提供基于云基础设施进行数据从处理、到存储、再到计算与分析的一体化解决方案。本节课带你了解阿里云数据库产品家族及特性。
相关文章
|
存储 关系型数据库 PostgreSQL
PostgreSQL 优化CASE - 有序UUID插件
标签 PostgreSQL , uuid , 无序uuid , 索引分裂 , io , 性能诊断 背景 无序UUID会带来很多问题,例如索引分裂膨胀,离散IO,WAL膨胀等,详见以前的分析。 Regular random UUIDs are distributed uniformly over the whole range of possible values.
2268 0
|
SQL 关系型数据库 数据库
PostgreSQL 设计优化case - 大宽表任意字段组合查询索引如何选择(btree, gin, rum) - (含单个索引列数超过32列的方法)
标签 PostgreSQL , adhoc查询 , 大宽表 , 任意字段组合查询 , 索引 , btree , gin , rum 背景 大宽表,任意字段组合查询,透视。是实时分析系统中的常见需求: 1、实时写入。
2485 0
|
SQL 存储 关系型数据库
PostgreSQL 设计优化case - 多对多 转 一对多(数组)
标签 PostgreSQL , 数组 , 多对多 , 一对多 , udf , JOIN 背景 某个系统存储了会员的标签,以及标签的描述信息。业务上需要通过会员ID得到会员的标签,再得到描述信息。 每个会员有若干标签,原来是这么存储的 1、会员标签表,人数5亿左右,每个人平均有几百个标签,1500亿行左右。
2121 0
|
SQL 算法 关系型数据库
PostgreSQL 10.1 手册_部分 II. SQL 语言_第 10 章 类型转换_10.5. UNION、CASE和相关结构
10.5. UNION、CASE和相关结构 SQL UNION结构必须使可能不相似的类型匹配成为一个单一的结果集。该决定算法被独立地应用到一个联合查询的每个输出列。 INTERSECT和EXCEPT采用和UNION相同的方法来决定不相似的类型。
1275 0