MonetDB vs PostgreSQL a lite case-阿里云开发者社区

本文对比一下MonetDB 和 PostgreSQL在某些SQL下的性能.

(MonetDB 使用列存储, 支持单SQL使用多核, 线程模式, 支持数据压缩, 还有较多的OLAP相关的插件, 更适合OLAP的应用场景)

注意, PostgreSQL注重的是高并发, 而MonetDB注重数据分析, 所以两者没有什么可比性, 这里拿MonetDB的强项来VS PostgreSQL的软肋, 主要是给一些在使用PostgreSQL做数据分析并且遇到瓶颈的朋友一些启发, 换个产品试试, 不要使用一个产品的软肋(这里指PostgreSQL在统计方面目前略差, 未来9.4+版本出来单SQL可以用到多核再来谈统计效率)

PostgreSQL支持的列存储插件cstore_fdw性能, 如果要玩的话可以参考 :

http://blog.163.com/digoal@126/blog/static/16387704020144141052312/

对于定长类型, 可以提升较高的性能.

测试环境

DELL R610

CPU 1.6G 8核

CentOS 6.5 x64

PostgreSQL 9.3.5 (主要参数 shared_buffer=1G, block_size=8KB, checkpoint_segment=32, wal_segsize=16MB, synchronous_commit=off, autovacuum=on)

MonetDB 11.17.21

块设备 OCZ RevoDrive3X2 240G

文件系统 ext4

内存 96GB

测试数据2.6亿

MonetDB 占用 1.9G. (含PK)

PostgreSQL 占用 22G. (含PK)

插入性能对比, PostgreSQL比MonetDB几乎差了9倍性能.

插入条数 / 时间(毫秒)	MonetDB	PostgreSQL
1024	6	11
2048	7	21
4096	10	42
8192	17	84
16384	29	166
32768	51	332
65536	96	661
131072	184	1361
262144	361	2760
524288	701	5550
1048576	1300	11221
2097152	2800	22581
4194304	5500	45018
8388608	11200	91126
16777216	22100	183895
33554432	45300	376365
67108864	97000	764031
134217728	178000	1533433
平均每秒插入条数	754000/s	87000/s

查询性能对比, MonetDB完胜, 完全不是一个数量级的差别.

当然PostgreSQL count(*)性能本来就不咋地, 更适合OLTP, 曾经用了大量的篇幅来写使用PostgreSQL实现准实时数据统计, 可参阅

1. http://blog.163.com/digoal@126/blog/static/163877040201331252945440/

2. http://blog.163.com/digoal@126/blog/static/16387704020133151402415/

3. http://blog.163.com/digoal@126/blog/static/16387704020133155179877/

4. http://blog.163.com/digoal@126/blog/static/16387704020133156636579/

5. http://blog.163.com/digoal@126/blog/static/16387704020133218305242/

6. http://blog.163.com/digoal@126/blog/static/16387704020133224161563/

7. http://blog.163.com/digoal@126/blog/static/16387704020133271134563/

8. http://blog.163.com/digoal@126/blog/static/16387704020134311144755/

9. http://blog.163.com/digoal@126/blog/static/1638770402013113044354661/

10. http://blog.163.com/digoal@126/blog/static/16387704020137111254816/

SQL / 时间(毫秒)	MonetDB	PostgreSQL
select count(*) from a;	1	46484
select count(*) from a where id=1;	1	1
select count(*) from a where id>1;	2	68813
select count(*) from a where id<1;	2	0.6
select count(*) from a where id<>1;	3500	68710
select count(*) from a where id is null;	752	0.5
select count(*) from a where id is not null;	710	64771
select count(*) from a where info='2014-08-13 20:20:14.000000+08:00';	0.4	68006
select count(*) from a where info>'2014-08-13 20:20:14.000000+08:00';	0.8	91155
select count(*) from a where info<'2014-08-13 20:20:14.000000+08:00';	1.7	72517
select count(*) from a where info<>'2014-08-13 20:20:14.000000+08:00';	2800	86810
select count(*) from a where info is null;	773	43593
select count(*) from a where info is not null;	748	65161
select count() from (select info from a group by info having count() >1) as t;	4700	144180

空间占用对比,

2.6亿数据(含索引), MonetDB 1.9G, PostgreSQL 22G.

当然这里和测试数据有关, 因为字符串大量重复了.

后期做一下宽表的测试. 对比GreenPlum.

MonetDB还有很多强大的功能等待挖掘, 大家一起来玩吧.

详细数据如下 :

MonetDB

插入性能

sql>create table a(id int auto_increment primary key, info varchar(32));
operation successful (10.112ms)
sql>insert into a (info) values (now());
1 affected rows, last generated key: 1 (3.274ms)
sql>select * from a;
+------+----------------------------------+
| id   | info                             |
+======+==================================+
|    1 | 2014-08-13 20:18:53.000000+08:00 |
+------+----------------------------------+
1 tuple (2.448ms)
sql>insert into a (info) values (now());
1 affected rows, last generated key: 2 (3.023ms)
sql>insert into a (info) select now() from a;
2 affected row (6.033ms)
sql>insert into a (info) select now() from a;
4 affected row (3.798ms)
sql>insert into a (info) select now() from a;
8 affected row (2.893ms)
sql>insert into a (info) select now() from a;
16 affected row (2.700ms)
sql>insert into a (info) select now() from a;
32 affected row (5.064ms)
sql>insert into a (info) select now() from a;
64 affected row (4.054ms)
sql>insert into a (info) select now() from a;
128 affected row (4.138ms)
sql>insert into a (info) select now() from a;
256 affected row (4.304ms)
sql>insert into a (info) select now() from a;
512 affected row (2.563ms)
sql>insert into a (info) select now() from a;
1024 affected row (6.094ms)
sql>insert into a (info) select now() from a;
2048 affected row (6.956ms)
sql>insert into a (info) select now() from a;
4096 affected row (9.618ms)
sql>insert into a (info) select now() from a;
8192 affected row (17.475ms)
sql>insert into a (info) select now() from a;
16384 affected row (29.347ms)
sql>insert into a (info) select now() from a;
32768 affected row (50.941ms)
sql>insert into a (info) select now() from a;
65536 affected row (95.736ms)
sql>insert into a (info) select now() from a;
131072 affected row (183.726ms)
sql>insert into a (info) select now() from a;
262144 affected row (361.436ms)
sql>insert into a (info) select now() from a;
524288 affected row (701.202ms)
sql>insert into a (info) select now() from a;
1048576 affected row (1.3s)
sql>insert into a (info) select now() from a;
2097152 affected row (2.8s)
sql>insert into a (info) select now() from a;
4194304 affected row (5.5s)
sql>insert into a (info) select now() from a;
8388608 affected row (11.2s)
sql>insert into a (info) select now() from a;
16777216 affected row (22.1s)
sql>insert into a (info) select now() from a;
33554432 affected row (45.3s)
sql>insert into a (info) select now() from a;
67108864 affected row (1m 27s)
sql>insert into a (info) select now() from a;
134217728 affected row (2m 58s)

空间占用大约1.9G.

[root@150 ~]# cd /data02/mdb1/
[root@150 mdb1]# ll
total 16
-rw------- 1 root root 7196 Aug 13 21:08 merovingian.log
-rw-r--r-- 1 root root    6 Aug 13 20:08 merovingian.pid
drwx------ 5 root root 4096 Aug 13 20:09 test
[root@150 mdb1]# du -sh *
8.0K    merovingian.log
4.0K    merovingian.pid
1.9G    test
[root@150 mdb1]# cd test/
[root@150 test]# ll
total 12
-rw-------  1 root root    0 Aug 13 20:09 58ec944a-5cff-dc51-4873-986923c64567
drwx------ 14 root root 4096 Aug 13 20:33 bat
drwx------  2 root root 4096 Aug 13 20:09 box
drwx------  3 root root 4096 Aug 13 20:09 sql_logs
[root@150 test]# du -sh *
0       58ec944a-5cff-dc51-4873-986923c64567
1.9G    bat
4.0K    box
12K     sql_logs
[root@150 test]# cd bat
[root@150 bat]# du -sh *
492K    01
288K    02
92K     04
1.1G    05
513M    07
572K    10
4.0K    11
4.0K    12
65M     13
323M    14
772K    15
4.0K    3.head
4.0K    3.tail
12K     3.theap
4.0K    4.head
4.0K    4.tail
12K     4.theap
28K     BACKUP

[root@150 bat]# ll
total 88
drwx------ 2 root root 4096 Aug 13 21:27 01
drwx------ 2 root root 4096 Aug 13 20:21 02
drwx------ 2 root root 4096 Aug 13 21:15 04
drwx------ 2 root root 4096 Aug 13 21:12 05
drwx------ 2 root root 4096 Aug 13 21:27 07
drwx------ 2 root root 4096 Aug 13 21:27 10
drwx------ 2 root root 4096 Aug 13 21:15 11
drwx------ 2 root root 4096 Aug 13 21:27 12
drwx------ 2 root root 4096 Aug 13 21:27 13
drwx------ 2 root root 4096 Aug 13 21:27 14
drwx------ 2 root root 4096 Aug 13 21:27 15
-rw------- 1 root root    8 Aug 13 20:09 3.head
-rw------- 1 root root    1 Aug 13 20:09 3.tail
-rw------- 1 root root 8216 Aug 13 20:09 3.theap
-rw------- 1 root root    8 Aug 13 20:09 4.head
-rw------- 1 root root    1 Aug 13 20:09 4.tail
-rw------- 1 root root 8343 Aug 13 20:09 4.theap
drwx------ 2 root root 4096 Aug 13 21:27 BACKUP

通过函数storage可以查看详细大小. 这里包含了count和columnsize. info字段占用500mb左右, id字段占用1G左右.

sql>select * from storage();
+-------+-------------------+----------------+-----------+----------+-----------+-------+------------+----------+---------+--------+
| schem | table             | column         | type      | location | count     | typew | columnsize | heapsize | indices | sorted |
: a     :                   :                :           :          :           : idth  :            :          :         :        :
+=======+===================+================+===========+==========+===========+=======+============+==========+=========+========+
| sys   | a                 | id             | int       | 05/573   | 268435456 |     4 | 1073741824 |        0 |       0 | true   |
| sys   | a                 | info           | varchar   | 07/734   | 268435456 |    32 |  536870912 |    16384 |       0 | true   |
| sys   | a                 | a_id_pkey      | oid       | 05/567   |         0 |     8 |          0 |        0 |       0 | true   |

查询测试

sql>select count(*) from a;
+-----------+
| L1        |
+===========+
| 268435456 |
+-----------+
1 tuple (0.979ms)
sql>select count(*) from a where id=1;
+------+
| L1   |
+======+
|    1 |
+------+
1 tuple (0.920ms)
sql>select count(*) from a where id>1;
+-----------+
| L1        |
+===========+
| 268435455 |
+-----------+
1 tuple (2.347ms)
sql>select count(*) from a where id<1;
+------+
| L1   |
+======+
|    0 |
+------+
1 tuple (2.270ms)
sql>select count(*) from a where id<>1;
+-----------+
| L1        |
+===========+
| 268435455 |
+-----------+
1 tuple (3.5s)
sql>select count(*) from a where id is null;
+------+
| L1   |
+======+
|    0 |
+------+
1 tuple (751.665ms)
sql>select count(*) from a where id is not null;
+-----------+
| L1        |
+===========+
| 268435456 |
+-----------+
1 tuple (710.477ms)
sql>select * from a limit 10;
+------+----------------------------------+
| id   | info                             |
+======+==================================+
|    1 | 2014-08-13 20:18:53.000000+08:00 |
|    2 | 2014-08-13 20:18:59.000000+08:00 |
|    3 | 2014-08-13 20:20:06.000000+08:00 |
|    4 | 2014-08-13 20:20:06.000000+08:00 |
|    5 | 2014-08-13 20:20:14.000000+08:00 |
|    6 | 2014-08-13 20:20:14.000000+08:00 |
|    7 | 2014-08-13 20:20:14.000000+08:00 |
|    8 | 2014-08-13 20:20:14.000000+08:00 |
|    9 | 2014-08-13 20:20:14.000000+08:00 |
|   10 | 2014-08-13 20:20:14.000000+08:00 |
+------+----------------------------------+
10 tuples (1.581ms)
sql>select count(*) from a where info='2014-08-13 20:20:14.000000+08:00';
+------+
| L1   |
+======+
|   12 |
+------+
1 tuple (0.442ms)
sql>select count(*) from a where info>'2014-08-13 20:20:14.000000+08:00';
+-----------+
| L1        |
+===========+
| 268435440 |
+-----------+
1 tuple (0.844ms)
sql>select count(*) from a where info<'2014-08-13 20:20:14.000000+08:00';
+------+
| L1   |
+======+
|    4 |
+------+
1 tuple (1.734ms)
sql>select count(*) from a where info<>'2014-08-13 20:20:14.000000+08:00';
+-----------+
| L1        |
+===========+
| 268435444 |
+-----------+
1 tuple (2.8s)
sql>select count(*) from a where info is null;
+------+
| L1   |
+======+
|    0 |
+------+
1 tuple (772.987ms)
sql>select count(*) from a where info is not null;
+-----------+
| L1        |
+===========+
| 268435456 |
+-----------+
1 tuple (747.712ms)
sql>select count(*) from (select info from a group by info having count(*) >1) as t;
+------+
| L2   |
+======+
|   18 |
+------+
1 tuple (4.7s)

PostgreSQL

插入测试

postgres=# create table a(id serial primary key, info text);
CREATE TABLE
postgres=# insert into a(info) values (now());
INSERT 0 1
postgres=# insert into a(info) select now() from a;
INSERT 0 1
Time: 0.954 ms
postgres=# insert into a(info) select now() from a;
INSERT 0 2
Time: 0.606 ms
postgres=# insert into a(info) select now() from a;
INSERT 0 4
Time: 0.352 ms
postgres=# insert into a(info) select now() from a;
INSERT 0 8
Time: 0.393 ms
postgres=# insert into a(info) select now() from a;
INSERT 0 16
Time: 0.476 ms
postgres=# insert into a(info) select now() from a;
INSERT 0 32
Time: 0.840 ms
postgres=# insert into a(info) select now() from a;
INSERT 0 64
Time: 1.166 ms
postgres=# insert into a(info) select now() from a;
INSERT 0 128
Time: 1.632 ms
postgres=# insert into a(info) select now() from a;
INSERT 0 256
Time: 3.214 ms
postgres=# insert into a(info) select now() from a;
INSERT 0 512
Time: 5.519 ms
postgres=# insert into a(info) select now() from a;
INSERT 0 1024
Time: 10.924 ms
postgres=# insert into a(info) select now() from a;
INSERT 0 2048
Time: 21.212 ms
postgres=# insert into a(info) select now() from a;
INSERT 0 4096
Time: 41.930 ms
postgres=# insert into a(info) select now() from a;
INSERT 0 8192
Time: 84.242 ms
postgres=# insert into a(info) select now() from a;
INSERT 0 16384
Time: 165.705 ms
postgres=# insert into a(info) select now() from a;
INSERT 0 32768
Time: 332.269 ms
postgres=# insert into a(info) select now() from a;
INSERT 0 65536
Time: 661.579 ms
postgres=# insert into a(info) select now() from a;
INSERT 0 131072
Time: 1361.281 ms
postgres=# insert into a(info) select now() from a;
INSERT 0 262144
Time: 2760.690 ms
postgres=# insert into a(info) select now() from a;
INSERT 0 524288
Time: 5550.096 ms
postgres=# insert into a(info) select now() from a;
INSERT 0 1048576
Time: 11221.544 ms
postgres=# insert into a(info) select now() from a;
INSERT 0 2097152
Time: 22581.858 ms
postgres=# insert into a(info) select now() from a;
INSERT 0 4194304
Time: 45018.124 ms
postgres=# insert into a(info) select now() from a;
INSERT 0 8388608
Time: 91126.619 ms
postgres=# insert into a(info) select now() from a;
INSERT 0 16777216
Time: 183895.102 ms
postgres=# insert into a(info) select now() from a;
INSERT 0 33554432
Time: 376365.578 ms
postgres=# insert into a(info) select now() from a;
INSERT 0 67108864
Time: 764031.565 ms
postgres=# insert into a(info) select now() from a;
INSERT 0 134217728
Time: 1533433.507 ms

postgres=# select pg_total_relation_size('a')/1024/1024/1024||'GB';
 ?column? 
----------
 22GB
(1 row)
Time: 1.027 ms
postgres=# select pg_relation_size('a')/1024/1024/1024||'GB';
 ?column? 
----------
 17GB
(1 row)
Time: 0.702 ms

查询测试

postgres=# select count(*) from a;
   count   
-----------
 268435456
(1 row)

Time: 60818.321 ms
postgres=# select count(*) from a;
   count   
-----------
 268435456
(1 row)

Time: 46484.165 ms
postgres=# select count(*) from a where id=1;
 count 
-------
     1
(1 row)

Time: 0.996 ms
postgres=# select count(*) from a where id>1;
   count   
-----------
 268435455
(1 row)

Time: 68812.883 ms
postgres=# select count(*) from a where id<1;
 count 
-------
     0
(1 row)

Time: 0.647 ms
postgres=# select count(*) from a where id<>1;
   count   
-----------
 268435455
(1 row)

Time: 68710.001 ms
postgres=# select count(*) from a where id is null;
 count 
-------
     0
(1 row)

Time: 0.537 ms
postgres=# select count(*) from a where id is not null;
   count   
-----------
 268435456
(1 row)

Time: 64771.858 ms
postgres=# select * from a limit 10;
 id |             info              
----+-------------------------------
  1 | 2014-08-13 20:37:52.165199+08
  2 | 2014-08-13 20:38:06.945296+08
  3 | 2014-08-13 20:38:07.965092+08
  4 | 2014-08-13 20:38:07.965092+08
  5 | 2014-08-13 20:38:08.502181+08
  6 | 2014-08-13 20:38:08.502181+08
  7 | 2014-08-13 20:38:08.502181+08
  8 | 2014-08-13 20:38:08.502181+08
  9 | 2014-08-13 20:38:08.966175+08
 10 | 2014-08-13 20:38:08.966175+08
(10 rows)

Time: 0.629 ms
postgres=# select count(*) from a where info='2014-08-13 20:38:08.966175+08';
 count 
-------
     8
(1 row)

Time: 68005.864 ms
postgres=# select count(*) from a where info>'2014-08-13 20:38:08.966175+08';
   count   
-----------
 268435440
(1 row)

Time: 91154.757 ms
postgres=# select count(*) from a where info<'2014-08-13 20:38:08.966175+08';
 count 
-------
     8
(1 row)

Time: 72516.503 ms
postgres=# select count(*) from a where info<>'2014-08-13 20:38:08.966175+08';
   count   
-----------
 268435448
(1 row)

Time: 86810.115 ms
postgres=# select count(*) from a where info is null;
 count 
-------
     0
(1 row)

Time: 43593.035 ms
postgres=# select count(*) from a where info is not null;
   count   
-----------
 268435456
(1 row)

Time: 65161.258 ms
postgres=# select count(*) from (select info from a group by info having count(*) >1) as t;
 count 
-------
    27
(1 row)
Time: 144180.678 ms

[参考]

1. http://blog.163.com/digoal@126/blog/static/16387704020147135859247/

MonetDB vs PostgreSQL a lite case

热门文章

最新文章

相关课程

相关电子书

相关实验场景

推荐镜像