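Groonga open-source search engine: a column store for aggregation with no built-in distribution; sharding and replication are left to MySQL itself when Groonga serves as a storage engine for MySQL or PostgreSQL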

1. Characteristics of Groonga

Slides: http://mroonga.org/publication/presentation/groonga-mysqluc2011.pdf

1.1. Groonga overview

Groonga is a fast and accurate full text search engine based on an inverted index. One of the characteristics of Groonga is that a newly registered document instantly appears in search results. Also, Groonga allows updates without read locks. These characteristics result in superior performance in real-time applications.

Groonga is also a column-oriented database management system (DBMS). Compared with well-known row-oriented systems, such as MySQL and PostgreSQL, column-oriented systems are better suited to aggregate queries. Because of this advantage, Groonga can compensate for the weaknesses of row-oriented systems.

The basic functions of Groonga are provided in a C library. Libraries for using Groonga from other languages, such as Ruby, are provided by related projects. In addition, Groonga-based storage engines are provided for MySQL and PostgreSQL. These libraries and storage engines allow any application to use Groonga. See usage examples.

1.2. Full text search and Instant update

In widely used DBMSs, updates are processed immediately; for example, a newly registered record appears in the result of the next query. In contrast, some full text search engines do not support instant updates, because it is difficult to dynamically update inverted indexes, the underlying data structure.

Groonga also uses inverted indexes but supports instant updates. In addition, Groonga allows you to search documents even while the document collection is being updated. These characteristics make Groonga very flexible as a full text search engine. Groonga also keeps its performance steady because it divides a large task, inverted index merging, into smaller tasks.
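
To make the idea of instant updates concrete, here is a minimal Python sketch of an inverted index whose postings are updated in place, so a document is searchable as soon as it is added. The class name TinyInvertedIndex and its methods are hypothetical, and this toy in-memory structure is not Groonga's actual implementation; it omits index merging and lock-free reads entirely.

```python
from collections import defaultdict


class TinyInvertedIndex:
    """Toy in-memory inverted index (hypothetical; not Groonga's data structure)."""

    def __init__(self):
        # term -> set of ids of the documents that contain the term
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        # Postings are updated in place, so the document is searchable immediately.
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def search(self, query):
        # AND search: return ids of documents that contain every query term.
        terms = query.lower().split()
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


index = TinyInvertedIndex()
index.add(1, "Groonga is a full text search engine")
print(index.search("search engine"))  # {1}
index.add(2, "Mroonga is a MySQL storage engine")
print(index.search("engine"))         # {1, 2}
```

The second search already sees document 2 without any separate rebuild or commit step, which is the behaviour the section describes.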

1.3. Column store and aggregate query

People can collect more than enough data in the Internet era. However, it is difficult to extract useful knowledge from a large database, and such a task requires many-sided analysis through trial and error. For example, refining a search by date, time and location may reveal hidden patterns. Aggregate queries are useful for this kind of task.

An aggregate query groups search results by the values of a specified column and then counts the number of records in each group. For example, an aggregate query that specifies a location column counts the number of records per location. Making a graph from the result of an aggregate query against a date column is an easy way to visualize changes over time. Combining refinement by location with an aggregate query against a date column visualizes changes over time in a specific location. Refinement and aggregation are therefore important for data mining.

A column-oriented architecture allows Groonga to process aggregate queries efficiently because a column-oriented database, which stores records by column, lets an aggregate query access only the specified column. On the other hand, an aggregate query on a row-oriented database, which stores records by row, has to access neighboring columns even though those columns are not required.
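
The difference between the two layouts can be sketched in a few lines of Python. The sample records and the function names count_by_row, count_by_column and count_by_column_where are made up for illustration; the sketch only shows which data each approach has to touch and says nothing about Groonga's on-disk format.

```python
from collections import Counter

# Row-oriented layout: every record keeps all of its columns together.
rows = [
    {"id": 1, "location": "Tokyo", "date": "2011-04-01", "body": "first post"},
    {"id": 2, "location": "Osaka", "date": "2011-04-01", "body": "second post"},
    {"id": 3, "location": "Tokyo", "date": "2011-04-02", "body": "third post"},
    {"id": 4, "location": "Kyoto", "date": "2011-04-02", "body": "fourth post"},
]

# Column-oriented layout: the same data, stored as one array per column.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def count_by_row(rows, group_by):
    # Walks whole records, even though only one field of each is needed.
    return Counter(row[group_by] for row in rows)


def count_by_column(columns, group_by):
    # Touches only the array of the requested column.
    return Counter(columns[group_by])


def count_by_column_where(columns, group_by, filter_column, filter_value):
    # Refinement followed by aggregation: two column arrays are read, no others.
    selected = [i for i, v in enumerate(columns[filter_column]) if v == filter_value]
    return Counter(columns[group_by][i] for i in selected)


print(count_by_column(columns, "location"))
print(count_by_column_where(columns, "date", "location", "Tokyo"))
```

Both counting functions return the same groups; the point is that count_by_column never reads the id, date or body data, which is where a column store saves I/O on wide tables.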

1.4. Inverted index and tokenizer

An inverted index is a traditional data structure used for large-scale full text search. A search engine based on an inverted index extracts index terms from a document when the document is added. At retrieval time, a query is divided into index terms to find the documents that contain those terms. Index terms thus play an important role in full text search, and the way they are extracted is key to a better search engine.

A tokenizer is a module that extracts index terms. Japanese full text search engines commonly use a word-based tokenizer (hereafter, a word tokenizer) and/or a character-based n-gram tokenizer (hereafter, an n-gram tokenizer). A search engine based on a word tokenizer is superior in time, space and precision, which is the fraction of documents in a search result that are relevant. A search engine based on an n-gram tokenizer, on the other hand, is superior in recall, which is the fraction of the documents in the perfect search result that are actually retrieved. In practice, the best choice depends on the application.
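
The two measures can be computed as follows. This is a small Python sketch with invented document ids; the "word" and "n-gram" result sets are hypothetical and only illustrate the trade-off described above, not measured behaviour of any engine.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved document ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    # precision: share of the retrieved documents that are relevant
    precision = hits / len(retrieved) if retrieved else 0.0
    # recall: share of the relevant documents that were retrieved
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


relevant = {1, 2, 3, 4}               # the "perfect" search result (made up)
word_result = {1, 2}                  # hypothetical word-tokenizer result
ngram_result = {1, 2, 3, 4, 5, 6}     # hypothetical n-gram-tokenizer result

print(precision_recall(word_result, relevant))   # (1.0, 0.5)
print(precision_recall(ngram_result, relevant))  # precision 4/6, recall 1.0
```

In this made-up case the word tokenizer returns nothing irrelevant but misses half of the relevant documents, while the n-gram tokenizer finds all of them at the cost of two irrelevant hits.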

Groonga supports both word and n-gram tokenizers. The simplest built-in tokenizer uses spaces as word delimiters. Built-in n-gram tokenizers (n = 1, 2, 3) are also available by default. In addition, another built-in word tokenizer is available if MeCab, a part-of-speech and morphological analyzer, is embedded. Note that tokenizers are pluggable, so you can develop your own, such as a tokenizer based on another part-of-speech tagger or on a named-entity recognizer.
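
As a rough illustration of the two simplest kinds of tokenizer, the Python sketch below implements a space-delimited tokenizer and a character n-gram tokenizer. The function names whitespace_tokenize and ngram_tokenize are hypothetical, and the sketch ignores normalization and other details that a real tokenizer, Groonga's included, would handle.

```python
def whitespace_tokenize(text):
    """Split on runs of whitespace, i.e. treat spaces as word delimiters."""
    return text.split()


def ngram_tokenize(text, n=2):
    """Return overlapping character n-grams (n=2 gives bigrams)."""
    if len(text) <= n:
        return [text] if text else []
    return [text[i:i + n] for i in range(len(text) - n + 1)]


print(whitespace_tokenize("full text search"))  # ['full', 'text', 'search']
print(ngram_tokenize("東京都", 2))                # ['東京', '京都']
print(ngram_tokenize("search", 3))              # ['sea', 'ear', 'arc', 'rch']
```

The bigram output shows why n-gram tokenizers favour recall over precision: a query for 京都 (Kyoto) matches a document that only contains 東京都 (Tokyo Metropolis), whereas a word tokenizer would be expected to index that text under different terms.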

1.5. Sharable storage and read lock-free

Multi-core processors are mainstream today and the number of cores per processor is increasing. In order to exploit multiple cores, executing multiple queries in parallel or dividing a query into sub-queries for parallel processing is becoming more important.

A Groonga database can be shared among multiple threads and processes. Multiple threads or processes can also execute read queries in parallel while another thread or process is executing an update query, because Groonga uses read lock-free data structures. This feature suits real-time applications that need to update a database while executing read queries. It also lets you build flexible systems. For example, a database can receive read queries through the built-in HTTP server of Groonga while accepting update queries through MySQL.

1.6. Geo-location (latitude and longitude) search

Location services are getting more convenient because of mobile devices with GPS. For example, if you are going to have lunch or dinner at a nearby restaurant, a local search service for restaurants may be very useful, and for such services, fast geo-location search is becoming more important.

Groonga provides inverted index-based fast geo-location search, which supports a query to find points in a rectangle or circle. Groonga gives high priority to points near the center of an area. Also, Groonga supports distance measurement and you can sort points by distance from any point.
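
The three operations mentioned here (rectangle search, circle search and sorting by distance) can be sketched in Python with a plain linear scan. This is not how Groonga implements them: it uses an index, and its distance approximation may differ from the haversine formula assumed below. The shop names and coordinates are illustrative, and the rectangle test assumes the area does not cross the 180° meridian.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (latitude, longitude) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def in_rectangle(point, top_left, bottom_right):
    # top_left is the north-west corner, bottom_right the south-east corner.
    lat, lon = point
    return bottom_right[0] <= lat <= top_left[0] and top_left[1] <= lon <= bottom_right[1]


def in_circle(point, center, radius_m):
    return haversine_m(*point, *center) <= radius_m


def sort_by_distance(points, origin):
    # Nearest first, which mirrors giving priority to points near the center.
    return sorted(points, key=lambda name: haversine_m(*points[name], *origin))


origin = (35.6812, 139.7671)  # illustrative search center
shops = {                     # hypothetical shops: name -> (latitude, longitude)
    "shop_a": (35.6815, 139.7680),
    "shop_b": (35.6900, 139.7000),
    "shop_c": (35.6586, 139.7454),
}

print(sort_by_distance(shops, origin))                             # ['shop_a', 'shop_c', 'shop_b']
print([n for n, p in shops.items() if in_circle(p, origin, 1000)])  # ['shop_a']
```

A linear scan like this is fine for a handful of points; the reason to index geo-locations is to avoid computing a distance for every record.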

1.7. Groonga library

The basic functions of Groonga are provided in a C library and any application can use Groonga as a full text search engine or a column-oriented database. Also, libraries for languages other than C/C++, such as Ruby, are provided in related projects. See related projects for details.

1.8. Groonga server

Groonga provides a built-in server command which supports HTTP, the memcached binary protocol and the Groonga Query Transfer Protocol (GQTP). Also, a Groonga server supports query caching, which significantly reduces response time for repeated read queries. Using this command, Groonga is available even on a server that does not allow you to install new libraries.
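
Why query caching helps repeated read queries can be shown with a small least-recently-used cache in Python. The QueryCache class, its parameters and the invalidate-everything-on-update policy are assumptions made for this sketch; the text above does not describe how the Groonga server manages its cache internally.

```python
from collections import OrderedDict


class QueryCache:
    """Hypothetical LRU cache placed in front of a query function."""

    def __init__(self, run_query, max_entries=100):
        self.run_query = run_query      # function that actually executes a query
        self.max_entries = max_entries
        self.entries = OrderedDict()    # query string -> cached result
        self.hits = 0
        self.misses = 0

    def select(self, query):
        if query in self.entries:
            self.entries.move_to_end(query)   # mark as most recently used
            self.hits += 1
            return self.entries[query]
        self.misses += 1
        result = self.run_query(query)
        self.entries[query] = result
        if len(self.entries) > self.max_entries:
            self.entries.popitem(last=False)  # evict the least recently used entry
        return result

    def invalidate(self):
        # Assumed policy: drop everything whenever the data changes.
        self.entries.clear()


documents = ["Groonga search engine", "Mroonga storage engine"]


def run_query(query):
    # Stand-in for a real search: a linear substring scan over the documents.
    return [doc for doc in documents if query in doc]


cache = QueryCache(run_query)
cache.select("engine")
cache.select("engine")            # served from the cache
print(cache.hits, cache.misses)   # 1 1
```

After an update, calling cache.invalidate() makes the next identical query run against the fresh data again, trading one slower request for correct results.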

1.9. Mroonga storage engine

Groonga works not only as an independent column-oriented DBMS but also as a storage engine for well-known DBMSs. For example, Mroonga is a pluggable MySQL storage engine that uses Groonga. With Mroonga, you can use Groonga for both column-oriented storage and full text search. You can also combine a built-in storage engine, MyISAM or InnoDB, with the Groonga-based full text search engine. Each combination has good and bad points, and the best one depends on the application. See related projects for details.

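Source: http://groonga.org/docs/characteristic.html

This article is reposted from the cnblogs (博客园) blog of 张昺华-sky. Original link: http://www.cnblogs.com/bonelee/p/6800785.html. To repost it, please contact the original author.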
