mongoDB Indexing Advice

本文涉及的产品
云数据库 MongoDB,通用型 2核4GB
简介:
mongoDB建索引的几个建议:
1. 建立索引的前提是查询条件,建立完之后explain查看有索引和无索引的开销对比.能建立唯一的索引尽量唯一.
如:
> for (i=0;i<10000;i++) {
... db.userinfo.insert({"firstname" : "zhou" + i,"lastname" : "digoal" + i,"age" : i,"city" : "HangZhou"})
... }
> db.userinfo.ensureIndex({"firstname" : 1,"lastname" : 1,"age" : 1})
# 当查询条件中含有驱动列时优化器将选择索引查询
> db.userinfo.find({"firstname" : "zhou100"},{"city" : 1}).explain()
{
        "cursor" : "BtreeCursor firstname_1_lastname_1_age_1",
        "nscanned" : 1,
        "nscannedObjects" : 1,
        "n" : 1,
        "millis" : 0,
        "indexBounds" : {
                "firstname" : [
                        [
                                "zhou100",
                                "zhou100"
                        ]
                ],
                "lastname" : [
                        [
                                {
                                        "$minElement" : 1
                                },
                                {
                                        "$maxElement" : 1
                                }
                        ]
                ],
                "age" : [
                        [
                                {
                                        "$minElement" : 1
                                },
                                {
                                        "$maxElement" : 1
                                }
                        ]
                ]
        }
}
# 当查询条件中不包含驱动列时,不走索引.
> db.userinfo.find({"lastname" : "digoal100"},{"city" : 1}).explain()                                    
{
        "cursor" : "BasicCursor",
        "nscanned" : 10000,
        "nscannedObjects" : 10000,
        "n" : 1,
        "millis" : 5,
        "indexBounds" : {

        }
}
# 当然,你可以强制使用索引,不过millis就不是5了,增加到16
> db.userinfo.find({"lastname" : "digoal100"},{"city" : 1}).hint({"firstname" : 1,"lastname" : 1,"age" : 1}).explain()
{
        "cursor" : "BtreeCursor firstname_1_lastname_1_age_1",
        "nscanned" : 1,
        "nscannedObjects" : 1,
        "n" : 1,
        "millis" : 16,
        "indexBounds" : {
                "firstname" : [
                        [
                                {
                                        "$minElement" : 1
                                },
                                {
                                        "$maxElement" : 1
                                }
                        ]
                ],
                "lastname" : [
                        [
                                "digoal100",
                                "digoal100"
                        ]
                ],
                "age" : [
                        [
                                {
                                        "$minElement" : 1
                                },
                                {
                                        "$maxElement" : 1
                                }
                        ]
                ]
        }
}
> db.userinfo.find({"lastname" : "digoal100"},{"city" : 1}).hint({"firstname" : 1,"lastname" : 1,"age" : 1})
{ "_id" : ObjectId("4d266637c119c7060b212fd6"), "city" : "HangZhou" }

2. 索引一定要建立在选择性好的列上面,对于选择性不好的列建索引效果不好.(对于选择性不好的列建索引需要考虑复合索引)
如下:当使用选择性不好的索引时,查询时间分别是8ms,26ms,而不使用索引时 分别是6ms,9ms,因此选择性不好的索引千万别建。
> db.userinfo.dropIndex({"city" : 1})                                                                       
{ "nIndexesWas" : 3, "ok" : 1 }
> db.userinfo.find({"city" : "HangZhou"}).explain()                                                                   
{
        "cursor" : "BasicCursor",
        "nscanned" : 10002,
        "nscannedObjects" : 10002,
        "n" : 10002,
        "millis" : 6,
        "indexBounds" : {

        }
}
> db.userinfo.find({"city" : "HangZhou","age" : 27}).explain()
{
        "cursor" : "BasicCursor",
        "nscanned" : 10002,
        "nscannedObjects" : 10002,
        "n" : 3,
        "millis" : 9,
        "indexBounds" : {

        }
}
> db.userinfo.ensureIndex({"city" : 1}) 
> db.userinfo.find({"city" : "HangZhou"}).explain()
{
        "cursor" : "BtreeCursor city_1",
        "nscanned" : 10002,
        "nscannedObjects" : 10002,
        "n" : 10002,
        "millis" : 8,
        "indexBounds" : {
                "city" : [
                        [
                                "HangZhou",
                                "HangZhou"
                        ]
                ]
        }
}
> db.userinfo.find({"city" : "HangZhou","age" : 27}).explain()
{
        "cursor" : "BtreeCursor city_1",
        "nscanned" : 10002,
        "nscannedObjects" : 10002,
        "n" : 3,
        "millis" : 26,
        "indexBounds" : {
                "city" : [
                        [
                                "HangZhou",
                                "HangZhou"
                        ]
                ]
        }
}
# 使用复合索引替代选择性不好的索引,返回时间= 0 ms
> db.userinfo.ensureIndex({"city" : 1,"age" : 1})             
> db.userinfo.find({"city" : "HangZhou","age" : 27}).explain()
{
        "cursor" : "BtreeCursor city_1_age_1",
        "nscanned" : 3,
        "nscannedObjects" : 3,
        "n" : 3,
        "millis" : 0,
        "indexBounds" : {
                "city" : [
                        [
                                "HangZhou",
                                "HangZhou"
                        ]
                ],
                "age" : [
                        [
                                27,
                                27
                        ]
                ]
        }
}

3. 当索引有多个可选时,mongoDB对索引的选择会是怎么样呢?
> db.userinfo.getIndexes()
[
        {
                "name" : "_id_",
                "ns" : "test.userinfo",
                "key" : {
                        "_id" : 1
                }
        },
        {
                "_id" : ObjectId("4d267377c119c7060b215687"),
                "ns" : "test.userinfo",
                "key" : {
                        "city" : 1,
                        "age" : 1
                },
                "name" : "city_1_age_1"
        },
        {
                "_id" : ObjectId("4d2677aac119c7060b215688"),
                "ns" : "test.userinfo",
                "key" : {
                        "firstname" : 1,
                        "lastname" : 1,
                        "age" : 1
                },
                "name" : "firstname_1_lastname_1_age_1"
        }
]
# 同样的结果,同样的索引,按照firstname和lastname排序,消耗的时间一个是0ms  另一个是18ms,这个有点纳闷
> db.userinfo.find({"city" : "HangZhou","age" : 27}).sort({"firstname" :1}).explain()
{
        "cursor" : "BtreeCursor city_1_age_1",
        "nscanned" : 3,
        "nscannedObjects" : 3,
        "n" : 3,
        "scanAndOrder" : true,
        "millis" : 18,
        "indexBounds" : {
                "city" : [
                        [
                                "HangZhou",
                                "HangZhou"
                        ]
                ],
                "age" : [
                        [
                                27,
                                27
                        ]
                ]
        }
}
> db.userinfo.find({"city" : "HangZhou","age" : 27}).sort({"lastname" :1}).explain() 
{
        "cursor" : "BtreeCursor city_1_age_1",
        "nscanned" : 3,
        "nscannedObjects" : 3,
        "n" : 3,
        "scanAndOrder" : true,
        "millis" : 0,
        "indexBounds" : {
                "city" : [
                        [
                                "HangZhou",
                                "HangZhou"
                        ]
                ],
                "age" : [
                        [
                                27,
                                27
                        ]
                ]
        }
}
# 偶尔也会走另一个索引{"firstname" :1,"lastname" :1,"age" :1}
# 强制使用另一个索引的开销18MS
> db.userinfo.find({"city" : "HangZhou","age" : 27}).sort({"firstname" : 1}).hint({"firstname" :1,"lastname" :1,"age" :1}).explain>
{
        "cursor" : "BtreeCursor firstname_1_lastname_1_age_1",
        "nscanned" : 3,
        "nscannedObjects" : 3,
        "n" : 3,
        "millis" : 18,
        "indexBounds" : {
                "firstname" : [
                        [
                                {
                                        "$minElement" : 1
                                },
                                {
                                        "$maxElement" : 1
                                }
                        ]
                ],
                "lastname" : [
                        [
                                {
                                        "$minElement" : 1
                                },
                                {
                                        "$maxElement" : 1
                                }
                        ]
                ],
                "age" : [
                        [
                                27,
                                27
                        ]
                ]
        }
}
# 强制全表扫描的开销是9ms
> db.userinfo.find({"city" : "HangZhou","age" : 27}).sort({"firstname" : 1}).hint({"$natural":1}).explain()
{
        "cursor" : "BasicCursor",
        "nscanned" : 10002,
        "nscannedObjects" : 10002,
        "n" : 3,
        "scanAndOrder" : true,
        "millis" : 9,
        "indexBounds" : {

        }
}
# 看出来mongoDB是索引优先啊。

4. 确保索引可以放入内存。
当索引的大小不能被pin在内存中,那是比较悲剧的事情。返回bytes
> db.userinfo.totalIndexSize()
1089536

5. 权衡数据读写操作比例,索引多了对写就有影响.(MySQL的MyISAM引擎有一个比较好的参数DELAY_KEY_WRITE可以延时写索引,一定程度上缓解了索引对写带来的负担)

6. MongoDB's $ne or $nin operator's aren't efficient with indexes.
  • When excluding just a few documents, it's better to retrieve extra rows from MongoDB and do the exclusion on the client side.
      加入遇到这种情况,建议要么删除INDEX要么用HINT指定全表扫描.

7.  As your collection grows, you’ll need to create indexes for any large sorts your queries are doing. If you call sort on a key that is not indexed, MongoDB needs to pull all of that data into memory to sort it. Thus, there’s a limit on the amount you can sort without an index: you can’t sort a terabyte of data in memory. Once your collection is too big to sort in memory, MongoDB will just return an error for queries that try. Indexing the sort allows MongoDB to pull the sorted data in order, allowing you to sort any amount of data without running out of memory.

注意事项:
1. explain不处理insert操作,但是实际上insert已经操作,如:
> db.userinfo.insert({"firstname" : "zhou" ,"lastname" : "digoal" , "age" : 27 , "city" : "HangZhou"}).explain()
Fri Jan  7 09:44:01 TypeError: db.userinfo.insert({firstname:"zhou", lastname:"digoal", age:27, city:"HangZhou"}) has no properties (shell):0
> db.userinfo.insert({"firstname" : "zhou" ,"lastname" : "digoal" , "age" : 27 , "city" : "HangZhou"})          
> db.userinfo.find({"firstname" : "zhou"})
{ "_id" : ObjectId("4d266fe1c119c7060b215684"), "firstname" : "zhou", "lastname" : "digoal", "age" : 27, "city" : "HangZhou" }
{ "_id" : ObjectId("4d266fedc119c7060b215685"), "firstname" : "zhou", "lastname" : "digoal", "age" : 27, "city" : "HangZhou" }

其他:
1. explain output 
  • cursor: the value for cursor can be either BasicCursor or BtreeCursor. The second of these indicates that the given query is using an index.
  • nscanned: he number of documents scanned.
  • n: the number of documents returned by the query. You want the value of n to be close to the value of nscanned. What you want to avoid is doing a collection scan, that is, where every document in the collection is accessed. This is the case whennscanned is equal to the number of documents in the collection.
  • millis: the number of milliseconds require to complete the query. This value is useful for comparing indexing strategies, indexed vs. non-indexed queries, etc.
相关实践学习
MongoDB数据库入门
MongoDB数据库入门实验。
快速掌握 MongoDB 数据库
本课程主要讲解MongoDB数据库的基本知识,包括MongoDB数据库的安装、配置、服务的启动、数据的CRUD操作函数使用、MongoDB索引的使用(唯一索引、地理索引、过期索引、全文索引等)、MapReduce操作实现、用户管理、Java对MongoDB的操作支持(基于2.x驱动与3.x驱动的完全讲解)。 通过学习此课程,读者将具备MongoDB数据库的开发能力,并且能够使用MongoDB进行项目开发。 &nbsp; 相关的阿里云产品:云数据库 MongoDB版 云数据库MongoDB版支持ReplicaSet和Sharding两种部署架构,具备安全审计,时间点备份等多项企业能力。在互联网、物联网、游戏、金融等领域被广泛采用。 云数据库MongoDB版(ApsaraDB for MongoDB)完全兼容MongoDB协议,基于飞天分布式系统和高可靠存储引擎,提供多节点高可用架构、弹性扩容、容灾、备份回滚、性能优化等解决方案。 产品详情: https://www.aliyun.com/product/mongodb
相关文章
|
NoSQL JavaScript 前端开发
MongoDB系列--深入理解MongoDB聚合(Aggregation )
MongoDB中聚合(aggregate) 操作将来自多个document的value组合在一起,并通过对分组数据进行各种操作处理,并返回计算后的数据结果,主要用于处理数据(诸如统计平均值,求和等)。MongoDB提供三种方式去执行聚合操作:聚合管道(aggregation pipeline)、Map-Reduce函数以及单一的聚合命令(count、distinct、group)。
1121 0
MongoDB系列--深入理解MongoDB聚合(Aggregation )
|
canal Java 关系型数据库
Elastic实战:通过spring data elasticsearch实现索引的CRUD;实现mysql全量/增量同步到ES
elasticsearch官方的java客户端有tranport client,rest high level client,但进行索引的增删改查的操作不够简便。因此我们引入spring data elasticsearch来实现索引的CRUD
264 0
Elastic实战:通过spring data elasticsearch实现索引的CRUD;实现mysql全量/增量同步到ES
|
NoSQL 大数据 MongoDB
MongoDB 中Aggregate使用与相关限制
MongoDB 中Aggregate使用与相关限制
204 0
MongoDB 中Aggregate使用与相关限制
|
JSON JavaScript NoSQL
Invoice Application Backend Using ApsaraDB for MongoDB
In part three of this three-part tutorial, we will explore how to create the backend of a fully functional invoice application using ApsaraDB for MongoDB.
2760 0
Invoice Application Backend Using ApsaraDB for MongoDB
|
NoSQL
An Insight into MongoDB Sharding Chunk Splitting and Migration
Sharding is a method of data distribution across multiple machines. MongoDB uses sharding to support deployments with very large data sets and high throughput operations.
3051 0
|
存储 NoSQL 数据库
|
NoSQL 网络安全 开发工具