宝剑赠英雄 - 任意字段\条件等效查询, 探探PostgreSQL多列展开式B树

本文涉及的产品
云原生数据库 PolarDB MySQL 版,Serverless 5000PCU 100GB
云原生数据库 PolarDB 分布式版,标准版 2核8GB
云数据库 RDS MySQL Serverless,0.5-2RCU 50GB
简介: 标签 PostgreSQL , 多列索引 , btree , gin , gist , brin , btree_gist , btree_gin , 复合索引 , composite index , 任意字段等效查询 背景 多列的组合查询在实际的应用中也较为常见,比如淘宝购物网站的商品搜索 有发货地、分类、是否包邮、是否货到付款、是否天猫、二手、等等许多选项,这些选项在设计时可能使用多个字段来表示(当然,有些可能会使用BIT合并成单个字段来表述)。

标签

PostgreSQL , 多列索引 , btree , gin , gist , brin , btree_gist , btree_gin , 复合索引 , composite index , 任意字段等效查询


背景

很多人小时候都有一个武侠梦,独孤求败更是金庸武侠小说里的一位传奇人物。

纵横江湖三十馀载,杀尽仇寇奸人,败尽英雄豪杰,天下更无抗手,无可奈何,惟隐居深谷,以雕为友。 呜呼,生平求一敌手而不可得,诚寂寥难堪也。

独孤老前辈的佩剑描写非常有意思,从使用的佩剑,可以看出一个人的武功修为。

第一柄是一柄青光闪闪的无名利剑。「凌厉刚猛,无坚不摧,弱冠前以之与河朔群雄争锋。」

第二柄是紫薇软剑,「三十岁前所用,误伤义士不祥,乃弃之深谷。」

第三柄是玄铁重剑,「重剑无锋,大巧不工,四十岁之前恃之横行天下。」

第四柄是柄已腐朽的木剑,原因是独孤求败「四十岁后,不滞于物,草木竹石均可为剑。」

个人感觉和我们现在搞IT的也很相似哦,开发语言有C, python, java, ....各式各样的可选,数据库有Oracle, PostgreSQL, MySQL等等,也是各式各样的产品可选。

宝剑赠英雄,美玉配佳人。

不管你是李逍遥、还是杨过、小龙女,希望你也能找到合适你的武器。

pic

pic

进入正题。

多列的组合查询在实际的应用中也较为常见,比如淘宝购物网站的商品搜索

pic

有发货地、分类、是否包邮、是否货到付款、是否天猫、二手、等等许多选项,这些选项在设计时可能使用多个字段来表示(当然,有些可能会使用BIT合并成单个字段来表述)。

举个非常简单的例子

CREATE TABLE test2 (    
  major int,    
  minor int,    
  name varchar    
);    

查询条件中,包含test2表的两个字段的检索

SELECT name FROM test2 WHERE major = constant AND minor = constant;    

这种情况下,我们可以使用两个索引,也可以使用一个复合索引(多列索引)。

CREATE INDEX test2_mm_idx ON test2 (major, minor);    

以上例子可以转化为对多个字段的任意组合查询需求(任意单一、任意两个、任意三个,任意若干个字段的查询需求)。

看过我写的文档的童鞋,可能会联想到我在之前写过一篇文档

《PostgreSQL 9.6 黑科技 bloom 算法索引,一个索引支撑任意列组合查询》

不过本文并不是要讲bloom,本文要讲一讲另一种技术(gin索引的暗藏功能,多列展开式B树)。

在开始正文之前,大家有没有想过这些问题呢?

1. 哪些索引方法支持多列索引?

PostgreSQL目前支持btree, hash, gin, gist, sp-gist, brin, bloom, rum等众多索引访问方法,哪些访问方法支持多列索引呢?

2. 多列索引的内部存储结构如何?

比如b-tree单列索引很好理解,就是以被索引列值为KEY的类B-Tree(nbtree)结构。但是当使用多列索引时,内部是如何组织的呢?

不同的索引方法,内部组织有什么差异呢?

3. 多列索引支持哪些查询组合

比如index on (a,b,c)三列,那么哪些查询条件能用上多列索引呢?比如where a=? and b>?

不同的索引方法,适用的查询条件是不是都一样呢?

4. 不同的查询组合,使用多列索引的效率如何,效率是否一样(是否与索引访问方法有关?)

比如b-tree index on (a,b,c)三列,where a=? and b>? 以及 where b>? and c=? 效率一样吗?

5. 多列索引,每个列的顺序是否可以指定

比如,是不是所有的索引方法都可创建这样的索引 index on (a,b desc, c desc nulls first)

6. 同样的查询组合,使用什么索引方法更高效

比如 where a=? and b=? and c=? 这样的查询,适用gin好还是b-tree好呢?

7. 如何选择合适的索引方法,与查询条件,数据分布有关系吗?

要回答这些问题,需要对索引方法,内部存储结构有一定的了解。

本文将以gin和btree为例,讲解一下multi column index,它们的内部存储结构,适应的场景。

一 单列索引的内部结构

建议读者先了解一下单列索引的内部结构,本文就不展开了,可以参考我之前写的一些文章。

《深入浅出PostgreSQL B-Tree索引结构》

《B-Tree和B+Tree》

《PostgreSQL GIN索引实现原理》

《从难缠的模糊查询聊开 - PostgreSQL独门绝招之一 GIN , GiST , SP-GiST , RUM 索引原理与技术背景》

二 PostgreSQL use bitmap combine single column indexs

在没有multi column index时,如果我们有多个列的查询条件,通常可以使用选择性好的列,或者多个列索引的组合。

PostgreSQL 使用多个列索引组合查询时,可以使用多列查询结果的ctid bitmap and or ,筛选出最终符合多列条件的ctid。

不仅适用于多列的查询条件,也适用于单列的多个查询条件。

例如

WHERE x = 42 OR x = 47 OR x = 53 OR x = 99    

WHERE x = 5 AND y = 6    

https://www.postgresql.org/docs/9.6/static/indexes-bitmap-scans.html

Combining Multiple Indexes

A single index scan can only use query clauses that use the index's columns with operators of its operator class and are joined with AND.     
For example, given an index on (a, b) a query condition like WHERE a = 5 AND b = 6 could use the index, but a query like WHERE a = 5 OR b = 6 could not directly use the index.    

Fortunately, PostgreSQL has the ability to combine multiple indexes (including multiple uses of the same index) to handle cases that cannot be implemented by single index scans.     
The system can form AND and OR conditions across several index scans.     
For example, a query like WHERE x = 42 OR x = 47 OR x = 53 OR x = 99 could be broken down into four separate scans of an index on x, each scan using one of the query clauses.     
The results of these scans are then ORed together to produce the result.     
Another example is that if we have separate indexes on x and y,     
one possible implementation of a query like WHERE x = 5 AND y = 6 is to use each index with the appropriate query clause and then AND together the index results to identify the result rows.    

支持单列组合条件,也支持多列组合条件。    

To combine multiple indexes, the system scans each needed index and prepares a bitmap in memory giving the locations of table rows that are reported as matching that index's conditions.     
The bitmaps are then ANDed and ORed together as needed by the query. Finally, the actual table rows are visited and returned.     

注意bitmap扫描是按CTID顺序输出,而非KEY的顺序输出,所以如果对列有ORDER BY的需求,那么会需要额外的sort,优化器会根据实际情况选择合适的执行计划(bitmap 或 单个索引filter但是不用sort)    

The table rows are visited in physical order, because that is how the bitmap is laid out;     
this means that any ordering of the original indexes is lost, and so a separate sort step will be needed if the query has an ORDER BY clause.     
For this reason, and because each additional index scan adds extra time, the planner will sometimes choose to use a simple index scan even though additional indexes are available that could have been used as well.    

In all but the simplest applications, there are various combinations of indexes that might be useful, and the database developer must make trade-offs to decide which indexes to provide.     
Sometimes multicolumn indexes are best, but sometimes it's better to create separate indexes and rely on the index-combination feature.     
For example, if your workload includes a mix of queries that sometimes involve only column x, sometimes only column y, and sometimes both columns,     
you might choose to create two separate indexes on x and y, relying on index combination to process the queries that use both columns.     
You could also create a multicolumn index on (x, y).     
This index would typically be more efficient than index combination for queries involving both columns,     
but as discussed in Section 11.3, it would be almost useless for queries involving only y, so it should not be the only index.     
A combination of the multicolumn index and a separate index on y would serve reasonably well. For queries involving only x, the multicolumn index could be used,     
though it would be larger and hence slower than an index on x alone. The last alternative is to create all three indexes,     
but this is probably only reasonable if the table is searched much more often than it is updated and all three types of query are common.     
If one of the types of query is much less common than the others, you'd probably settle for creating just the two indexes that best match the common types.    

在选择单列还是多列索引时,请根据查询需求选择。如果查询很多,更新插入删除很少,那么如果查询条件有a, b, a and b这类的,可以建立3个索引,分别是(a), (b), (a,b)。      

三 哪些索引方法支持multi column index

Currently, only the B-tree, GiST, GIN, and BRIN index types support multicolumn indexes.

Up to 32 columns can be specified. (This limit can be altered when building PostgreSQL; see the file pg_config_manual.h.)

目前PostgreSQL的B-tree, GiST, GIN, and BRIN索引方法,支持多列索引。

目前支持最多32个列的多列索引,实际上可以更大(通过调整pg_config_manual.h可以做到更大,但是还有另一个限制,indextuple不能超过约1/4的数据块大小,也就是说复合索引列很多的情况下,可能会触发这个限制)。

四 multi column index的查询组合与效率解读

Multicolumn Indexes

https://www.postgresql.org/docs/current/static/indexes-multicolumn.html

由于b-tree, gin , gist, brin都支持multi column索引,但是这几种索引的内部存储方式不一样,所以不同的组合查询的效率也不一样。

例如a,b,c三列的组合索引,select * from tbl where a=? and b>? 以及 where b=?,这两种查询组合,哪个效率高?和索引方法有大大的关系。

btree 查询组合与效率

b-tree多列索引支持任意列的组合查询

A multicolumn B-tree index can be used with query conditions that involve any subset of the index's columns, but the index is most efficient when there are constraints on the leading (leftmost) columns.
虽然b-tree多列索引支持任意列的组合查询,但是最有效的查询还是包含驱动列条件的查询。

The exact rule is that equality constraints on leading columns, plus any inequality constraints on the first column that does not have an equality constraint, will be used to limit the portion of the index that is scanned.
对于b-tree的多列索引来说,一个查询要扫描索引的哪些部分呢?

从驱动列开始算,按索引列的顺序算到非驱动列的第一个不相等条件为止(没有任何条件也算)。

Constraints on columns to the right of these columns are checked in the index, so they save visits to the table proper, but they do not reduce the portion of the index that has to be scanned.

For example, given an index on (a, b, c) and a query condition WHERE a = 5 AND b >= 42 AND c < 77, the index would have to be scanned from the first entry with a = 5 and b = 42 up through the last entry with a = 5.

(WHERE a = 5 AND b >= 42 AND c < 77),从a=5, b=42开始的所有索引条目,都会被扫描。

Index entries with c >= 77 would be skipped, but they'd still have to be scanned through.

其他例子

(WHERE b >= 42 AND c < 77),所有索引条目,都会被扫描。只要不包含驱动列,则扫描所有索引条目。

(WHERE a = 5 AND c < 77),a=5的所有索引条目,都会被扫描。

(WHERE a >= 5 AND b=1 and c < 77),从a=5开始的所有索引条目,都会被扫描。

This index could in principle be used for queries that have constraints on b and/or c with no constraint on a — but the entire index would have to be scanned, so in most cases the planner would prefer a sequential table scan over using the index.

建议有频繁的复合查询,并且复合查询带有驱动列以及其他列的查询时,可以考虑使用多列索引。

gist 查询组合与效率

gist多列索引支持任意列的组合查询。

A multicolumn GiST index can be used with query conditions that involve any subset of the index's columns.

Conditions on additional columns restrict the entries returned by the index, but the condition on the first column is the most important one for determining how much of the index needs to be scanned.

注意与b-tree不一样的地方,驱动列的选择性决定了需要扫描多少索引条目,扫描多少条目与非驱动列无关(而b-tree是与非驱动列也有关的)。

A GiST index will be relatively ineffective if its first column has only a few distinct values, even if there are many distinct values in additional columns.

如果驱动列的选择性不好、其他列的选择性很好,即使查询条件同时包含了 驱动列以及其他列 ,也需要扫描很多索引条目,因为扫描多少索引条目和其他列无关。

这么说,并不建议使用gist多列索引。

如果一定要使用GIST多列索引,请一定要把选择性好的列作为驱动列。

gin 查询组合与效率

gin多列索引支持任意列的组合查询。

并且任意查询条件的查询效率都是一样的。

A multicolumn GIN index can be used with query conditions that involve any subset of the index's columns.

Unlike B-tree or GiST, index search effectiveness is the same regardless of which index column(s) the query conditions use.

brin 查询组合与效率

brin多列索引支持任意列的组合查询。

并且任意查询条件的查询效率都是一样的。

如果有brin组合查询的必要(比如多个与ctid线性相关的列的范围查询,无所谓线性的方向),任何时候都建议使用BRIN的multi column index,除非想针对不同的列使用不同的pages_per_range(比如有些列10个块的范围和另外一些列100个块的范围覆盖差不多,那么建议它们使用不同的pages_per_range)

A multicolumn BRIN index can be used with query conditions that involve any subset of the index's columns.

Like GIN and unlike B-tree or GiST, index search effectiveness is the same regardless of which index column(s) the query conditions use.

The only reason to have multiple BRIN indexes instead of one multicolumn BRIN index on a single table is to have a different pages_per_range storage parameter.

五 multi column使用建议

多列索引每个列的operator class必须和实际查询匹配,在创建索引时可以指定。

Of course, each column must be used with operators appropriate to the index type; clauses that involve other operators will not be considered.

Multicolumn indexes should be used sparingly.

In most situations, an index on a single column is sufficient and saves space and time.

Indexes with more than three columns are unlikely to be helpful unless the usage of the table is extremely stylized.

六 多列索引,在不同组合下的查询效率差异,原理剖析 - 从索引内部结构说起

前面分析了b-tree, gin都支持任意组合查询。

但是b-tree推荐使用包含驱动列的查询条件,如果查询条件未包含驱动列,则需要扫描整个复合索引。

而gin则通吃,可以输入任意组合列作为查询条件,并且效率一致。

例如

index on (a,b,c)

b-tree 对于包含驱动列a查询条件的SQL,效率可能比较好,不包括a查询条件的SQL,即使走索引,也要扫描整个索引的所有条目。

而gin 则无论任何查询条件,效果都一样。

这是为什么呢?必须从索引的内部存储组织结构来分析。

b-tree multi column index 剖析

btree 对被索引列按创建索引时指定的顺序排序,然后建立B树。

如create index idx on tbl using btree (a,b desc nulls first,c desc, e);

所以B树中的KEY实际上就是被索引列的组合对象,这个结构决定了什么查询能用上这个复合索引。

(a,b,c), row?    
(a,b,c), row?    
(a,b,c), row?    
....    

要达到最高效的使用这种复合索引,必须带上驱动列的条件。

如果order by要用上索引,那么必须order by的写法要与创建索引时指定的顺序一致。

例如select * from tbl where a=? order by a,b desc nulls first;

gin multi column index 剖析

gin 的复合索引很有趣,它将所有列展开,然后将展开后的数据(列ID+值)排序并建立B树。

因此在gin的复合索引中,B树的KEY实际上是列ID+值。

(column_a, v1), row?    
(column_b, v1), row?    
(column_b, v2), row?    
(column_c, v2), row?    
....    

这样的树,以任意组合进行查询,效率都一样。

where a=? 与 where b=? 效率一样,而且和B-tree的单列索引的效率几乎一致(当索引层级一致时)。

where a=? and b=? 与 where b=? and c=? 效率一样(复合索引查两次,在内部使用bitmapAnd取得结果)。

仅仅当多列组合查询时,gin效率可能比不上b-tree的带驱动列的查询(因为b-tree索引不需要bitmapAnd,而gin需要内部bitmapAnd)。

七 例子

创建一个测试表,包含3个字段

postgres=# create table t3(c1 int, c2 text, c3 int);    
CREATE TABLE    

插入100万记录,其中c2,c3的值固定

postgres=# postgres=# insert into t3 select generate_series(1,1000000),'test',1;    
INSERT 0 1000000    

创建gin复合索引

postgres=# create index idx_t3_1 on t3 using gin(c1,c2,c3);    
CREATE INDEX    

查询c1=1,效率与单列索引一致

这个查询结果也可以说明另一个问题,不同列并不是单纯展开后直接构建B树,它依旧添加了列ID进来,所以即使c3=1有100万记录,并不影响c1=1的扫描PAGE数。

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from t3 where c1=1;    
                                                   QUERY PLAN                                                        
-----------------------------------------------------------------------------------------------------------------    
 Bitmap Heap Scan on public.t3  (cost=5.01..6.02 rows=1 width=13) (actual time=0.021..0.021 rows=1 loops=1)    
   Output: c1, c2, c3    
   Recheck Cond: (t3.c1 = 1)    
   Heap Blocks: exact=1    
   Buffers: shared hit=5    
   ->  Bitmap Index Scan on idx_t3_1  (cost=0.00..5.01 rows=1 width=0) (actual time=0.016..0.016 rows=1 loops=1)    
         Index Cond: (t3.c1 = 1)    
         Buffers: shared hit=4    
 Planning time: 0.076 ms    
 Execution time: 0.047 ms    
(10 rows)    

查询c2=?,效率与单列索引一致

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from t3 where c2='test';    
                                                            QUERY PLAN                                                                
----------------------------------------------------------------------------------------------------------------------------------    
 Bitmap Heap Scan on public.t3  (cost=8121.00..26027.00 rows=1000000 width=13) (actual time=74.467..179.603 rows=1000000 loops=1)    
   Output: c1, c2, c3    
   Recheck Cond: (t3.c2 = 'test'::text)    
   Heap Blocks: exact=5406    
   Buffers: shared hit=5542    
   ->  Bitmap Index Scan on idx_t3_1  (cost=0.00..7871.00 rows=1000000 width=0) (actual time=73.640..73.640 rows=1000000 loops=1)    
         Index Cond: (t3.c2 = 'test'::text)    
         Buffers: shared hit=136    
 Planning time: 0.130 ms    
 Execution time: 230.770 ms    
(10 rows)    

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from t3 where c2='t';    
                                                   QUERY PLAN                                                        
-----------------------------------------------------------------------------------------------------------------    
 Bitmap Heap Scan on public.t3  (cost=5.00..6.01 rows=1 width=13) (actual time=0.014..0.014 rows=0 loops=1)    
   Output: c1, c2, c3    
   Recheck Cond: (t3.c2 = 't'::text)    
   Buffers: shared hit=4    
   ->  Bitmap Index Scan on idx_t3_1  (cost=0.00..5.00 rows=1 width=0) (actual time=0.013..0.013 rows=0 loops=1)    
         Index Cond: (t3.c2 = 't'::text)    
         Buffers: shared hit=4    
 Planning time: 0.081 ms    
 Execution time: 0.039 ms    
(9 rows)    

查询c3=?,效率与单列索引一致


postgres=# explain (analyze,verbose,timing,costs,buffers) select * from t3 where c3=1;    
                                                            QUERY PLAN                                                                
----------------------------------------------------------------------------------------------------------------------------------    
 Bitmap Heap Scan on public.t3  (cost=8121.00..26027.00 rows=1000000 width=13) (actual time=77.949..182.939 rows=1000000 loops=1)    
   Output: c1, c2, c3    
   Recheck Cond: (t3.c3 = 1)    
   Heap Blocks: exact=5406    
   Buffers: shared hit=5542    
   ->  Bitmap Index Scan on idx_t3_1  (cost=0.00..7871.00 rows=1000000 width=0) (actual time=77.116..77.116 rows=1000000 loops=1)    
         Index Cond: (t3.c3 = 1)    
         Buffers: shared hit=136    
 Planning time: 0.083 ms    
 Execution time: 234.558 ms    
(10 rows)    

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from t3 where c3=2;    
                                                   QUERY PLAN                                                        
-----------------------------------------------------------------------------------------------------------------    
 Bitmap Heap Scan on public.t3  (cost=5.00..6.01 rows=1 width=13) (actual time=0.015..0.015 rows=0 loops=1)    
   Output: c1, c2, c3    
   Recheck Cond: (t3.c3 = 2)    
   Buffers: shared hit=4    
   ->  Bitmap Index Scan on idx_t3_1  (cost=0.00..5.00 rows=1 width=0) (actual time=0.014..0.014 rows=0 loops=1)    
         Index Cond: (t3.c3 = 2)    
         Buffers: shared hit=4    
 Planning time: 0.081 ms    
 Execution time: 0.040 ms    
(9 rows)    

gin任意组合(不需要限定驱动列)多列查询的隐含bitmapAnd, bitmapOr操作

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from t3 where c1=2 and c3=1;    
                                                   QUERY PLAN                                                        
-----------------------------------------------------------------------------------------------------------------    
 Bitmap Heap Scan on public.t3  (cost=9.01..10.03 rows=1 width=13) (actual time=0.044..0.044 rows=1 loops=1)    
   Output: c1, c2, c3    
   Recheck Cond: ((t3.c1 = 2) AND (t3.c3 = 1))    
   Heap Blocks: exact=1    
   Buffers: shared hit=10    
   ->  Bitmap Index Scan on idx_t3_1  (cost=0.00..9.01 rows=1 width=0) (actual time=0.040..0.040 rows=1 loops=1)    
         Index Cond: ((t3.c1 = 2) AND (t3.c3 = 1))    
         Buffers: shared hit=9    
 Planning time: 0.061 ms    
 Execution time: 0.063 ms    
(10 rows)    

没有驱动列,一样高效无比

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from t3 where c2='test' and c3=2;    
                                                   QUERY PLAN                                                        
-----------------------------------------------------------------------------------------------------------------    
 Bitmap Heap Scan on public.t3  (cost=9.00..10.02 rows=1 width=13) (actual time=0.052..0.052 rows=0 loops=1)    
   Output: c1, c2, c3    
   Recheck Cond: ((t3.c2 = 'test'::text) AND (t3.c3 = 2))    
   Buffers: shared hit=9    
   ->  Bitmap Index Scan on idx_t3_1  (cost=0.00..9.00 rows=1 width=0) (actual time=0.051..0.051 rows=0 loops=1)    
         Index Cond: ((t3.c2 = 'test'::text) AND (t3.c3 = 2))    
         Buffers: shared hit=9    
 Planning time: 0.086 ms    
 Execution time: 0.075 ms    
(9 rows)    

gin复合索引的展开式B树决定了不能按单列设置顺序

postgres=# create index idx_t1_0 on t1 using gin (c1, c2 desc);    
ERROR:  0A000: access method "gin" does not support ASC/DESC options    
LOCATION:  ComputeIndexAttrs, indexcmds.c:1248    

八 btree vs gin 多列索引

1. 由于btree index, 多列值根据创建索引的DDL指定顺序sort后,多列的值组合后作为一个KEY存储在B树中。

例如4条记录如下

1,1,2;     
1,100,2;     
2,1,10;     
1,1,3;     

btree 中的key排序后分布(有多少条记录,就有多少KEY)

1,1,2; 1,1,3; 1,100,2; 2,1,10;     

2. GIN MULTI COLUMN INDEX 构建了一个包含多种数据类型的B-TREE , 将多列的数据展开后,排序后分布 (key的数量为每列的count distinct总和)

column1,1; column2,1; column1,2; column3,2; column3,3; column3,10; column2,100;     

更形象的比喻

比如有三幅扑克牌(每幅54张牌),每一幅代表一列,如果要创建3列的复合索引,那么B-TREE会创建出54个条目的B树,而GIN会创建出包含162个条目的B树。

请看这个例子,可以说明这个情况

postgres=# create table t2(c1 int2, c2 int4, c3 int8, c4 numeric, c5 text, c6 timestamp);    
CREATE TABLE    
postgres=# insert into t2 select c1,c1,c1,c1,c5,c6 from (select trunc(random()*1000) c1, md5(random()::text) c5, now()+(random()*10000||' sec')::interval c6 from generate_series(1,100000)) t;    
INSERT 0 100000    
postgres=# create index idx_t2_1 on t2 using gin (c1,c2,c3,c4,c5,c6);    
CREATE INDEX    
postgres=# select count(distinct c4) from t2;    
 count     
-------    
  1000    
(1 row)    

postgres=# select count(distinct c1) from t2;    
 count     
-------    
  1000    
(1 row)    

postgres=# select count(distinct c2) from t2;    
 count     
-------    
  1000    
(1 row)    

postgres=# select count(distinct c3) from t2;    
 count     
-------    
  1000    
(1 row)    

postgres=# select count(distinct c5) from t2;    
 count     
-------    
 99996    
(1 row)    

postgres=# select count(distinct c6) from t2;    
 count      
--------    
 100000    
(1 row)    

postgres=# select 99996+100000+4000;    
 ?column?     
----------    
   203996    
(1 row)    

postgres=# select * from gin_metapage_info(get_raw_page('idx_t2_1',0));    
 pending_head | pending_tail | tail_free_size | n_pending_pages | n_pending_tuples | n_total_pages | n_entry_pages | n_data_pages | n_entries | version     
--------------+--------------+----------------+-----------------+------------------+---------------+---------------+--------------+-----------+---------    
   4294967295 |   4294967295 |              0 |               0 |                0 |          2611 |          2610 |            0 |    203996 |       2    
(1 row)    

n_entries = count(distinct indexed column1) + ....     

b-tree 要高效的使用复合索引,必须带驱动列的查询条件。

GIN 使用复合索引,可以任意组合查询条件,当有多列查询条件时,使用隐含的bitmapAnd or bitmapOr。

什么情况下,gin比btree慢?

通常,多列查询时,如果使用了驱动列,那么B-TREE索引会更快一些,因为它不需要使用bitmapAnd or bitmapOr,而GIN需要。

其他情况,gin都比btree的复合查询快。

九 gin 多列索引的独门秘籍

通过前面的分析,我们已经摸清楚了GIN的复合索引结构(展开式B树),并且也知道GIN是key+ctid list的类b+tree树组织形式,它可以有很高的压缩比,也可以高效的查询单KEY。

那么GIN具备这些特性后,有哪些独门秘籍呢?

1. 任意组合查询,都可以达到很高效,你不需要创建多个b-tree索引了,一个GIN搞定它(不必担心GIN的IO,有fastupdate技术支撑,并且读写(entry合并)不堵塞)。

比如淘宝的搜索页面,用户可能根据任意属性,勾选任意条件进行查询。

创建一张测试表, 6个字段

postgres=# create table taobao(c1 int, c2 int, c3 timestamp, c4 text, c5 numeric, c6 text);  
CREATE TABLE  

插入1000万随机记录

postgres=# insert into taobao select random()*2000, random()*3000, now()+((50000-100000*random())||' sec')::interval , md5(random()::text), round((random()*1000000)::numeric,2), md5(random()::text) from generate_series(1,10000000);  
INSERT 0 10000000  

创建GIN多列索引

postgres=# create index idx_taobao_gin on taobao using gin(c1,c2,c3,c4,c5,c6);  

数据样例

postgres=# select * from taobao limit 10;  
  c1  |  c2  |             c3             |                c4                |    c5     |                c6                  
------+------+----------------------------+----------------------------------+-----------+----------------------------------  
 1405 |  882 | 2017-02-06 09:41:24.985878 | 49982683517aab7d194f3affe74ba827 |  65157.79 | 256ad6d098a6536e3548b5af91a26557  
 1277 | 1269 | 2017-02-06 09:52:16.313212 | b40c1febdb7f62c916d3632a03092261 | 940751.10 | 9911b203e38b57c55a769312c2aaaeba  
  870 |  159 | 2017-02-06 05:59:46.853421 | 96a0f84d9f9381d77364d93ca2d7aa6f | 419618.52 | 1e716d90055d3b32027a5e80a19e2f4f  
 1990 | 1100 | 2017-02-07 00:35:26.849744 | 684b66b25eb57d97f604eb9d92dfc8b0 | 764940.62 | eea82a253995a70da23a9f9ba3015175  
  625 | 1076 | 2017-02-06 01:48:13.929789 | 7fe094f548cffa367ebeb28d4f188875 | 482201.22 | 81ba27a8123dbd3e3a741c5984368709  
  968 |  554 | 2017-02-06 06:05:29.971936 | 6c8b34e4eb7c5eca3a8ad6131c5aecd3 | 583617.05 | b5cfc01c845cc87a4eb8a425ac9b2e01  
 1795 |  667 | 2017-02-06 20:18:46.027376 | b4ef2282064ef3e7a4eb3c624ba42334 | 911819.49 | a794b4635bb314972d891c7218063642  
  222 | 1041 | 2017-02-06 20:29:15.686187 | f3a6bc2b683e272c293d09353fb2465b | 722814.10 | 22b3b06a69b134dee9e3236907637683  
 1596 | 2153 | 2017-02-05 22:38:54.795333 | 1455f7f7f198d09e380e680321a31968 | 672431.00 | efac7b0a2154e25321a522b98903df41  
  313 | 2955 | 2017-02-06 22:59:49.117719 | 7840bbd2be443904b094b9f8919730c0 | 525295.10 | b5feb900191fd404f3a3de407fa62c93  
(10 rows)  

 public | taobao | table | postgres | 1202 MB |   
 public | idx_taobao_gin | index | postgres | taobao | 3761 MB |   

查询测试

任意单一字段

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from taobao where c1=1;  
                                                          QUERY PLAN                                                            
------------------------------------------------------------------------------------------------------------------------------  
 Bitmap Heap Scan on public.taobao  (cost=46.51..4873.01 rows=4840 width=90) (actual time=1.602..10.038 rows=5043 loops=1)  
   Output: c1, c2, c3, c4, c5, c6  
   Recheck Cond: (taobao.c1 = 1)  
   Heap Blocks: exact=4958  
   Buffers: shared hit=4966  
   ->  Bitmap Index Scan on idx_taobao_gin  (cost=0.00..45.30 rows=4840 width=0) (actual time=0.849..0.849 rows=5043 loops=1)  
         Index Cond: (taobao.c1 = 1)  
         Buffers: shared hit=8  
 Planning time: 0.319 ms  
 Execution time: 10.346 ms  
(10 rows)  

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from taobao where c2=882;  
                                                          QUERY PLAN                                                            
------------------------------------------------------------------------------------------------------------------------------  
 Bitmap Heap Scan on public.taobao  (cost=34.16..3287.73 rows=3246 width=90) (actual time=1.175..7.024 rows=3373 loops=1)  
   Output: c1, c2, c3, c4, c5, c6  
   Recheck Cond: (taobao.c2 = 882)  
   Heap Blocks: exact=3350  
   Buffers: shared hit=3358  
   ->  Bitmap Index Scan on idx_taobao_gin  (cost=0.00..33.35 rows=3246 width=0) (actual time=0.651..0.651 rows=3373 loops=1)  
         Index Cond: (taobao.c2 = 882)  
         Buffers: shared hit=8  
 Planning time: 0.094 ms  
 Execution time: 7.236 ms  
(10 rows)  

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from taobao where c3='2017-02-06 09:41:24.985878';  
                                                      QUERY PLAN                                                         
-----------------------------------------------------------------------------------------------------------------------  
 Bitmap Heap Scan on public.taobao  (cost=8.01..9.02 rows=1 width=90) (actual time=0.025..0.025 rows=1 loops=1)  
   Output: c1, c2, c3, c4, c5, c6  
   Recheck Cond: (taobao.c3 = '2017-02-06 09:41:24.985878'::timestamp without time zone)  
   Heap Blocks: exact=1  
   Buffers: shared hit=6  
   ->  Bitmap Index Scan on idx_taobao_gin  (cost=0.00..8.01 rows=1 width=0) (actual time=0.019..0.019 rows=1 loops=1)  
         Index Cond: (taobao.c3 = '2017-02-06 09:41:24.985878'::timestamp without time zone)  
         Buffers: shared hit=5  
 Planning time: 0.125 ms  
 Execution time: 0.057 ms  
(10 rows)  

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from taobao where c4='b40c1febdb7f62c916d3632a03092261';  
                                                      QUERY PLAN                                                         
-----------------------------------------------------------------------------------------------------------------------  
 Bitmap Heap Scan on public.taobao  (cost=8.01..9.02 rows=1 width=90) (actual time=0.028..0.028 rows=1 loops=1)  
   Output: c1, c2, c3, c4, c5, c6  
   Recheck Cond: (taobao.c4 = 'b40c1febdb7f62c916d3632a03092261'::text)  
   Heap Blocks: exact=1  
   Buffers: shared hit=6  
   ->  Bitmap Index Scan on idx_taobao_gin  (cost=0.00..8.01 rows=1 width=0) (actual time=0.023..0.023 rows=1 loops=1)  
         Index Cond: (taobao.c4 = 'b40c1febdb7f62c916d3632a03092261'::text)  
         Buffers: shared hit=5  
 Planning time: 0.101 ms  
 Execution time: 0.058 ms  
(10 rows)  

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from taobao where c5='764940.62';  
                                                      QUERY PLAN                                                         
-----------------------------------------------------------------------------------------------------------------------  
 Bitmap Heap Scan on public.taobao  (cost=8.01..9.02 rows=1 width=90) (actual time=0.028..0.028 rows=1 loops=1)  
   Output: c1, c2, c3, c4, c5, c6  
   Recheck Cond: (taobao.c5 = 764940.62)  
   Heap Blocks: exact=1  
   Buffers: shared hit=6  
   ->  Bitmap Index Scan on idx_taobao_gin  (cost=0.00..8.01 rows=1 width=0) (actual time=0.023..0.023 rows=1 loops=1)  
         Index Cond: (taobao.c5 = 764940.62)  
         Buffers: shared hit=5  
 Planning time: 0.127 ms  
 Execution time: 0.069 ms  
(10 rows)  

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from taobao where c6='test';  
                                                      QUERY PLAN                                                         
-----------------------------------------------------------------------------------------------------------------------  
 Bitmap Heap Scan on public.taobao  (cost=8.01..9.02 rows=1 width=90) (actual time=0.023..0.023 rows=0 loops=1)  
   Output: c1, c2, c3, c4, c5, c6  
   Recheck Cond: (taobao.c6 = 'test'::text)  
   Buffers: shared hit=5  
   ->  Bitmap Index Scan on idx_taobao_gin  (cost=0.00..8.01 rows=1 width=0) (actual time=0.020..0.020 rows=0 loops=1)  
         Index Cond: (taobao.c6 = 'test'::text)  
         Buffers: shared hit=5  
 Planning time: 0.088 ms  
 Execution time: 0.054 ms  
(9 rows)  

任意2字段

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from taobao where c2=1 and c3='2017-02-06 09:41:24.985878';  
                                                       QUERY PLAN                                                         
------------------------------------------------------------------------------------------------------------------------  
 Bitmap Heap Scan on public.taobao  (cost=15.00..16.02 rows=1 width=90) (actual time=0.060..0.060 rows=0 loops=1)  
   Output: c1, c2, c3, c4, c5, c6  
   Recheck Cond: ((taobao.c2 = 1) AND (taobao.c3 = '2017-02-06 09:41:24.985878'::timestamp without time zone))  
   Buffers: shared hit=11  
   ->  Bitmap Index Scan on idx_taobao_gin  (cost=0.00..15.00 rows=1 width=0) (actual time=0.057..0.057 rows=0 loops=1)  
         Index Cond: ((taobao.c2 = 1) AND (taobao.c3 = '2017-02-06 09:41:24.985878'::timestamp without time zone))  
         Buffers: shared hit=11  
 Planning time: 0.100 ms  
 Execution time: 0.093 ms  
(9 rows)  

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from taobao where c2=1 or c3='2017-02-06 09:41:24.985878';  
                                                             QUERY PLAN                                                               
------------------------------------------------------------------------------------------------------------------------------------  
 Bitmap Heap Scan on public.taobao  (cost=42.98..3305.68 rows=3247 width=90) (actual time=1.165..6.947 rows=3330 loops=1)  
   Output: c1, c2, c3, c4, c5, c6  
   Recheck Cond: ((taobao.c2 = 1) OR (taobao.c3 = '2017-02-06 09:41:24.985878'::timestamp without time zone))  
   Heap Blocks: exact=3295  
   Buffers: shared hit=3308  
   ->  BitmapOr  (cost=42.98..42.98 rows=3247 width=0) (actual time=0.650..0.650 rows=0 loops=1)  
         Buffers: shared hit=13  
         ->  Bitmap Index Scan on idx_taobao_gin  (cost=0.00..33.35 rows=3246 width=0) (actual time=0.638..0.638 rows=3329 loops=1)  
               Index Cond: (taobao.c2 = 1)  
               Buffers: shared hit=8  
         ->  Bitmap Index Scan on idx_taobao_gin  (cost=0.00..8.01 rows=1 width=0) (actual time=0.011..0.011 rows=1 loops=1)  
               Index Cond: (taobao.c3 = '2017-02-06 09:41:24.985878'::timestamp without time zone)  
               Buffers: shared hit=5  
 Planning time: 0.099 ms  
 Execution time: 7.161 ms  
(15 rows)  

任意3字段

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from taobao where c4='b40c1febdb7f62c916d3632a03092261' and c5=1 and c6='test';  
                                                                 QUERY PLAN                                                                   
--------------------------------------------------------------------------------------------------------------------------------------------  
 Bitmap Heap Scan on public.taobao  (cost=22.00..23.02 rows=1 width=90) (actual time=0.051..0.051 rows=0 loops=1)  
   Output: c1, c2, c3, c4, c5, c6  
   Recheck Cond: ((taobao.c4 = 'b40c1febdb7f62c916d3632a03092261'::text) AND (taobao.c5 = '1'::numeric) AND (taobao.c6 = 'test'::text))  
   Buffers: shared hit=13  
   ->  Bitmap Index Scan on idx_taobao_gin  (cost=0.00..22.00 rows=1 width=0) (actual time=0.048..0.048 rows=0 loops=1)  
         Index Cond: ((taobao.c4 = 'b40c1febdb7f62c916d3632a03092261'::text) AND (taobao.c5 = '1'::numeric) AND (taobao.c6 = 'test'::text))  
         Buffers: shared hit=13  
 Planning time: 0.115 ms  
 Execution time: 0.084 ms  
(9 rows)  

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from taobao where c4='b40c1febdb7f62c916d3632a03092261' or c5=1 or c6='test';  
                                                              QUERY PLAN                                                                
--------------------------------------------------------------------------------------------------------------------------------------  
 Bitmap Heap Scan on public.taobao  (cost=24.03..27.08 rows=3 width=90) (actual time=0.061..0.062 rows=1 loops=1)  
   Output: c1, c2, c3, c4, c5, c6  
   Recheck Cond: ((taobao.c4 = 'b40c1febdb7f62c916d3632a03092261'::text) OR (taobao.c5 = '1'::numeric) OR (taobao.c6 = 'test'::text))  
   Heap Blocks: exact=1  
   Buffers: shared hit=16  
   ->  BitmapOr  (cost=24.03..24.03 rows=3 width=0) (actual time=0.055..0.055 rows=0 loops=1)  
         Buffers: shared hit=15  
         ->  Bitmap Index Scan on idx_taobao_gin  (cost=0.00..8.01 rows=1 width=0) (actual time=0.025..0.025 rows=1 loops=1)  
               Index Cond: (taobao.c4 = 'b40c1febdb7f62c916d3632a03092261'::text)  
               Buffers: shared hit=5  
         ->  Bitmap Index Scan on idx_taobao_gin  (cost=0.00..8.01 rows=1 width=0) (actual time=0.016..0.016 rows=0 loops=1)  
               Index Cond: (taobao.c5 = '1'::numeric)  
               Buffers: shared hit=5  
         ->  Bitmap Index Scan on idx_taobao_gin  (cost=0.00..8.01 rows=1 width=0) (actual time=0.012..0.012 rows=0 loops=1)  
               Index Cond: (taobao.c6 = 'test'::text)  
               Buffers: shared hit=5  
 Planning time: 0.110 ms  
 Execution time: 0.122 ms  
(18 rows)  

任意4字段

不再举例,都能用上索引

2. 由于ctid list组织,所以可以做到很好的压缩,前面讲的posting list compress就是这个功效。所以它节约空间,提升效率。

postgres=# create table t4(c1 int,c2 int);  
CREATE TABLE  
postgres=# insert into t4 select generate_series(1,10000000),1;  
INSERT 0 10000000  

postgres=# create index btree_t4_c1 on t4 using btree(c1);  
CREATE INDEX  
postgres=# create index btree_t4_c2 on t4 using btree(c2);  
CREATE INDEX  

postgres=# create index btree_t4_c1 on t4 using btree(c1);  
CREATE INDEX  
postgres=# create index btree_t4_c2 on t4 using btree(c2);  
CREATE INDEX  

postgres=# create index gin_t4_c1 on t4 using gin(c1);  
CREATE INDEX  
postgres=# create index gin_t4_c2 on t4 using gin(c2);  
CREATE INDEX  

postgres=# \di+  
                            List of relations  
 Schema |    Name     | Type  |  Owner   | Table |  Size   | Description   
--------+-------------+-------+----------+-------+---------+-------------  
 public | btree_t4_c1 | index | postgres | t4    | 214 MB  | // btree为全entry索引,  
 public | btree_t4_c2 | index | postgres | t4    | 214 MB  | // 所以即使c2字段1000万全重复值,存储空间也一样  
 public | gin_t4_c1   | index | postgres | t4    | 534 MB  | // 原本存储heap ctid的,被拆分为pos 2 bytes、blkid 4 bytes来存储posting list的长度、posting tree root的blkid,所以比btree多了一点  
 public | gin_t4_c2   | index | postgres | t4    | 10 MB   | // ctid list(posting list)自动压缩,压缩比很高  

3. 使用btree_gin插件,可以对任意标量数据类型创建GIN索引,前面已有例子。

4. gin对多值类型(如数组、文本、全文检索)的支持就不多说了,那是GIN的发源地,支持非常棒。

注意,目前gin还不支持sort,所以如果你有大数据量的ORDER BY limit 小数据输出需求,建议还是使用b-tree。

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from t4 where c2=1 order by c2 limit 1;  
                                                                 QUERY PLAN                                                                    
---------------------------------------------------------------------------------------------------------------------------------------------  
 Limit  (cost=81163.88..81163.90 rows=1 width=8) (actual time=754.234..754.235 rows=1 loops=1)  
   Output: c1, c2  
   Buffers: shared hit=1307  
   ->  Bitmap Heap Scan on public.t4  (cost=81163.88..250411.70 rows=9999985 width=8) (actual time=754.234..754.234 rows=1 loops=1)  
         Output: c1, c2  
         Recheck Cond: (t4.c2 = 1)  
         Heap Blocks: exact=1  
         Buffers: shared hit=1307  
         ->  Bitmap Index Scan on gin_t4_c2  (cost=0.00..78663.89 rows=9999985 width=0) (actual time=745.651..745.651 rows=10000000 loops=1)   
             // gin还不适合limit输出,但是可以通过修改内核来改进  
               Index Cond: (t4.c2 = 1)  
               Buffers: shared hit=1306  
 Planning time: 0.091 ms  
 Execution time: 754.261 ms  
(13 rows)  

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from t4 order by c2 limit 1;  
                                                               QUERY PLAN                                                                  
-----------------------------------------------------------------------------------------------------------------------------------------  
 Limit  (cost=0.43..0.46 rows=1 width=8) (actual time=0.031..0.031 rows=1 loops=1)  
   Output: c1, c2  
   Buffers: shared hit=1 read=3  
   ->  Index Scan using btree_t4_c2 on public.t4  (cost=0.43..221670.43 rows=10000000 width=8) (actual time=0.030..0.030 rows=1 loops=1)  
         Output: c1, c2  
         Buffers: shared hit=1 read=3  
 Planning time: 0.141 ms  
 Execution time: 0.048 ms  
(8 rows)  

postgres=# drop index btree_t4_c2;  

// 不限制c2任何条件的话,不能使用gin,order by也不能使用gin  

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from t4 order by c2 limit 1;  
                                                                    QUERY PLAN                                                                      
--------------------------------------------------------------------------------------------------------------------------------------------------  
 Limit  (cost=10000194247.77..10000194247.78 rows=1 width=8) (actual time=1719.522..1719.523 rows=1 loops=1)  
   Output: c1, c2  
   Buffers: shared hit=44248  
   ->  Sort  (cost=10000194247.77..10000219247.74 rows=9999985 width=8) (actual time=1719.520..1719.520 rows=1 loops=1)  
         Output: c1, c2  
         Sort Key: t4.c2  
         Sort Method: top-N heapsort  Memory: 25kB  
         Buffers: shared hit=44248  
         ->  Seq Scan on public.t4  (cost=10000000000.00..10000144247.85 rows=9999985 width=8) (actual time=0.009..754.991 rows=10000000 loops=1)  
               Output: c1, c2  
               Buffers: shared hit=44248  
 Planning time: 0.084 ms  
 Execution time: 1719.543 ms  
(13 rows)  

// 限制c2任何条件的话,可以使用gin,但是order by依旧不能使用gin  
postgres=# set enable_sort=off;  
SET  
Time: 0.112 ms  
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from t4 where c2>0 order by c2 limit 1;  
                                                                     QUERY PLAN                                                                       
----------------------------------------------------------------------------------------------------------------------------------------------------  
 Limit  (cost=10000301719.00..10000301719.00 rows=1 width=8) (actual time=2801.874..2801.875 rows=1 loops=1)  
   Output: c1, c2  
   Buffers: shared hit=45554  
   ->  Sort  (cost=10000301719.00..10000326719.00 rows=10000000 width=8) (actual time=2801.873..2801.873 rows=1 loops=1)  
         Output: c1, c2  
         Sort Key: t4.c2  
         Sort Method: top-N heapsort  Memory: 25kB  
         Buffers: shared hit=45554  
         ->  Bitmap Heap Scan on public.t4  (cost=82471.00..251719.00 rows=10000000 width=8) (actual time=817.773..1905.348 rows=10000000 loops=1)  
               Output: c1, c2  
               Recheck Cond: (t4.c2 > 0)  
               Heap Blocks: exact=44248  
               Buffers: shared hit=45554  
               ->  Bitmap Index Scan on gin_t4_c2  (cost=0.00..79971.00 rows=10000000 width=0) (actual time=809.234..809.234 rows=10000000 loops=1)  
                     Index Cond: (t4.c2 > 0)  
                     Buffers: shared hit=1306  
 Planning time: 0.103 ms  
 Execution time: 2801.909 ms  
(18 rows)  

十 小结

宝剑赠英雄,美玉配佳人。

PostgreSQL 数据库,了解越多,才会更加随心所欲,驾驭自如。

参考

https://www.postgresql.org/docs/9.6/static/indexes-bitmap-scans.html

https://www.postgresql.org/docs/current/static/indexes-multicolumn.html

https://www.postgresql.org/docs/9.6/static/btree-gin.html

https://www.postgresql.org/docs/9.6/static/btree-gist.html

https://www.postgresql.org/docs/9.6/static/pageinspect.html

《深入浅出PostgreSQL B-Tree索引结构》

《B-Tree和B+Tree》

《PostgreSQL GIN索引实现原理》

《从难缠的模糊查询聊开 - PostgreSQL独门绝招之一 GIN , GiST , SP-GiST , RUM 索引原理与技术背景》

相关实践学习
使用PolarDB和ECS搭建门户网站
本场景主要介绍基于PolarDB和ECS实现搭建门户网站。
阿里云数据库产品家族及特性
阿里云智能数据库产品团队一直致力于不断健全产品体系,提升产品性能,打磨产品功能,从而帮助客户实现更加极致的弹性能力、具备更强的扩展能力、并利用云设施进一步降低企业成本。以云原生+分布式为核心技术抓手,打造以自研的在线事务型(OLTP)数据库Polar DB和在线分析型(OLAP)数据库Analytic DB为代表的新一代企业级云原生数据库产品体系, 结合NoSQL数据库、数据库生态工具、云原生智能化数据库管控平台,为阿里巴巴经济体以及各个行业的企业客户和开发者提供从公共云到混合云再到私有云的完整解决方案,提供基于云基础设施进行数据从处理、到存储、再到计算与分析的一体化解决方案。本节课带你了解阿里云数据库产品家族及特性。
相关文章
|
1月前
|
关系型数据库 分布式数据库 数据库
PolarDB常见问题之加了索引但是查询没有使用如何解决
PolarDB是阿里云推出的下一代关系型数据库,具有高性能、高可用性和弹性伸缩能力,适用于大规模数据处理场景。本汇总囊括了PolarDB使用中用户可能遭遇的一系列常见问题及解答,旨在为数据库管理员和开发者提供全面的问题指导,确保数据库平稳运行和优化使用体验。
|
3月前
|
关系型数据库 PostgreSQL
PostgreSQL排序字段不唯一导致分页查询结果出现重复数据
PostgreSQL排序字段不唯一导致分页查询结果出现重复数据
41 0
|
4月前
|
关系型数据库 MySQL 分布式数据库
PolarDB MySQL版并行查询技术探索与实践
PolarDB MySQL版并行查询技术探索与实践 PolarDB MySQL版在企业级查询加速特性上进行了深度技术探索,其中并行查询作为其重要组成部分,已经在线稳定运行多年,持续演进。本文将详细介绍并行查询的背景、挑战、方案、特性以及实践。
108 2
|
8天前
|
SQL 存储 Oracle
关系型数据库查询数据的语句
本文介绍了关系型数据库中的基本SQL查询语句,包括选择所有或特定列、带条件查询、排序、分组、过滤分组、表连接、限制记录数及子查询。SQL还支持窗口函数、存储过程等高级功能,是高效管理数据库的关键。建议深入学习SQL及相应数据库系统文档。
9 2
|
2月前
|
SQL 关系型数据库 分布式数据库
在PolarDB for PostgreSQL中,你可以使用LIKE运算符来实现类似的查询功能,而不是使用IF函数
在PolarDB for PostgreSQL中,你可以使用LIKE运算符来实现类似的查询功能,而不是使用IF函数
43 7
|
2月前
|
存储 关系型数据库 分布式数据库
PolarDB for PostgreSQL查询问题之条件查询失败如何解决
PolarDB for PostgreSQL是基于PostgreSQL开发的一款云原生关系型数据库服务,它提供了高性能、高可用性和弹性扩展的特性;本合集将围绕PolarDB(pg)的部署、管理和优化提供指导,以及常见问题的排查和解决办法。
|
3月前
|
存储 关系型数据库 分布式数据库
阿里云PolarDB解决乐麦多源数据存储性能问题
乐麦通过使用PolarDB数据库,使整个系统之间的数据查询分析更加高效
390 3
|
3月前
|
SQL 关系型数据库 分布式数据库
在PolarDB for PostgreSQL中,你可以使用LIKE运算符来实现类似的查询功能
在PolarDB for PostgreSQL中,你可以使用LIKE运算符来实现类似的查询功能【1月更文挑战第13天】【1月更文挑战第65篇】
31 2
|
4月前
|
SQL 关系型数据库 分布式数据库
深度解析PolarDB数据库并行查询技术
深度解析PolarDB数据库并行查询技术:加速SQL执行的关键问题和核心技术 随着数据规模的不断扩大,用户SQL的执行时间越来越长,这不仅对数据库的优化能力提出更高的要求,并且对数据库的执行模式也提出了新的挑战。为了解决这个问题,许多数据库系统,包括Oracle、SQL Server等,都开始提供并行查询引擎的支持,以充分利用系统资源,达到加速SQL执行的效果。本文将深入探讨基于代价进行并行优化、并行执行的云数据库的并行查询引擎的关键问题和核心技术。
123 2
|
4月前
|
关系型数据库 数据库 PostgreSQL
PostgreSQL【应用 01】使用Vector插件实现向量相似度查询(Docker部署的PostgreSQL安装pgvector插件说明)和Milvus向量库对比
PostgreSQL【应用 01】使用Vector插件实现向量相似度查询(Docker部署的PostgreSQL安装pgvector插件说明)和Milvus向量库对比
186 1

相关产品

  • 云原生数据库 PolarDB