4 Things You Can Do with Alibaba Cloud PolarDB

Summary: In this article, we spoke with He Jun, Alibaba Cloud Technical Expert, to learn about the key features and common use cases of PolarDB.

During the PolarDB session of the 2017 Computing Conference, Alibaba Cloud Senior Technical Expert He Jun gave a talk on the features and common use cases of PolarDB. He discussed the architecture of PolarDB, introduced its features, and shared insights on some common use cases.

The following sections highlight the main points from his speech.

Product Architecture

I was pleasantly surprised when I first encountered PolarDB. As I understand it, PolarDB is a milestone product that spans generations, combining innovations in computing, storage, networking, and more. It implements a design concept called cloud native, which differs sharply from the database design concepts we have discussed in the past. Modern databases trace back to the relational databases built on the computing power available in the traditional IT era. Moving that computing capability onto the public cloud and connecting it to user businesses produced a number of innovations, but these are far from sufficient in the long term. Why? Because today we need a relational database designed specifically for public cloud environments and the user businesses that run in them. This is no small task.

PolarDB utilizes a structure that separates computing and storage, which is much easier said than done. The reason for combining computing and storage, after all, is to improve performance. The primary consideration in building a relational database is performance, so while separating storage and computing seems like an easy concept, actually doing it without sacrificing performance is quite difficult.

Today, the separation of computing and storage in PolarDB is a bold innovation that is no longer a concept but a working implementation. Where does the difficulty in building a relational database lie? The database must honor ACID semantics; otherwise it cannot support business scenarios that require online operations. And if ACID compliance, performance, and flexibility on the public cloud are all crucial, we also have to consider the performance-to-cost ratio. Looking at the commercial databases on the market, most fall short of delivering all of these at once. Is it even possible to combine all the required functionality, capability, and an acceptable performance-to-cost ratio in a framework that supports all necessary business scenarios? Drawing on our understanding of business applications and our accumulated experience on the public cloud, we implemented a single-write, multiple-read database framework, which greatly reduces the complexity of earlier multi-write databases while still satisfying the needs of the vast majority of use cases. At its core is a proprietary distributed storage engine, which allows PolarDB to provide flexibility along multiple dimensions.

[Figure 1]

The system has three layers, as shown in this figure. The top layer is DBserver, which implements a single-master, multiple-slave framework in which the other nodes can be added or removed as needed to serve requests. The lowest layer consists of fast, distributed storage devices.

PolarDB Features

What makes PolarDB special? First, a relational database absolutely must have high performance. If a relational database has poor performance, it will have difficulty satisfying the need to process the explosive growth of data characteristic of the current Internet era. So when I say that PolarDB performance is high, what exactly does that mean?

  • High speed
    Single-node QPS can easily reach 500,000. Because PolarDB uses shared distributed storage, adding a read-only node is fast: the data is shared, so a new read-only instance does not need its own replicated copy. This removes the overhead of replicating data, and adding a new read-only instance takes only 1-5 minutes, regardless of how much data the database holds. What's more, with a single-write, multiple-read structure, we are able to keep latency down to a matter of milliseconds. We can also create backups in seconds.
  • Super high capacity
    In practice, it seems that most databases become difficult to use once the data size reaches around 2TB. Today, PolarDB is capable of providing capacity of up to 100TB, which, from the perspective of relational frameworks, is an enormous amount of data.
  • Automatic scaling according to necessity
    PolarDB's architecture makes full use of the flexibility offered by the cloud, enabling the system to scale according to changes in the user's application.
  • MySQL compatibility
    There are already more open source database instances combined than Oracle instances, and this trend is increasing every year. PolarDB is already nearing 100% MySQL compatibility, and we will continue to improve support for SQL standards as quickly as possible.
  • High reliability and availability
    PolarDB uses a one-master, many-slaves framework, which naturally offers high availability. If the master node crashes, traffic is automatically redirected to another node. At the same time, the existence of multiple data copies means that the data is naturally more reliable.
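
To put the "High speed" point in perspective, the short Python sketch below compares the time needed to bring up a read-only instance when the full data set must be copied first with the 1-5 minute figure quoted above for shared storage. The copy throughput of 200 MB/s is purely an assumed number chosen for illustration, not a measured or published value, and the function names are hypothetical.

```python
TB = 10**12  # bytes in a terabyte (decimal)
MB = 10**6   # bytes in a megabyte (decimal)

# Assumption for illustration only: sustained copy throughput of 200 MB/s.
ASSUMED_COPY_BYTES_PER_SEC = 200 * MB

# Upper end of the "1-5 minutes" quoted in the talk for shared storage.
SHARED_STORAGE_MAX_SECONDS = 5 * 60


def copy_based_provision_seconds(data_bytes, copy_bytes_per_sec):
    """Time to build a replica that must first copy the whole data set."""
    return data_bytes / copy_bytes_per_sec


def provisioning_table(sizes_tb):
    """Return (size in TB, copy-based hours, shared-storage minutes) rows."""
    rows = []
    for size_tb in sizes_tb:
        copy_seconds = copy_based_provision_seconds(
            size_tb * TB, ASSUMED_COPY_BYTES_PER_SEC
        )
        rows.append((size_tb, copy_seconds / 3600, SHARED_STORAGE_MAX_SECONDS / 60))
    return rows


if __name__ == "__main__":
    for size_tb, copy_hours, shared_minutes in provisioning_table([2, 100]):
        print(
            f"{size_tb:>4} TB  copy-based: {copy_hours:7.1f} h  "
            f"shared storage: <= {shared_minutes:.0f} min"
        )
```

Under this assumption the copy-based time grows linearly with the data size, while the shared-storage figure stays constant, which is the property the talk emphasizes.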

PolarDB in Production Scenarios

[Figure 2]

When talking about the capabilities of PolarDB as a product, remember that a product's creation, value, and reputation all depend on the services it provides. If users don't use it, or if it doesn't solve the pain points in their application scenarios, it is hard to say that the product has any value at all. For a user on the public cloud, the first question is whether a cloud database can meet their needs. If I have a new service, or an existing service that I want to move to the cloud, I want a next-generation database with a high performance-to-cost ratio. Moving my data to the cloud also involves the cost of migrating all of my users.

This migration cost is quite low if users are easy to migrate. However, if migration requires changing business processes, it becomes painful and introduces hidden risks that depend on what the user does. To satisfy high-end users, we also have to provide strong performance. Moving a business to the cloud is a matter of trust as well: the user trusts the public cloud, and in turn Alibaba Cloud. When you provide services 24/7, you can't afford any interruptions. And as the number of users grows, it becomes crucial that your database be flexible and scalable enough to satisfy the needs of every business scenario.

Finally, data must be reliable. It is only once these needs are met that a database service is able to provide real value to the user. Next I will introduce and analyze four use cases to illustrate the capabilities and services offered by PolarDB.

Use Case 1: High Throughput Processing of Big Data

[Figure 3]

The first use case is high-throughput processing of large data volumes. In its earliest days, the public cloud mainly served website users. As the public cloud improved and the software running on it evolved, it gradually became something very different. With the arrival of large users, medium users, and even smaller users with high growth potential, the services and data running on the cloud have grown exponentially.

In the mobile Internet era, data is used not only to meet users' needs; it may well become far more important by helping to balance supply and demand. Computation on this data shows us how to increase production efficiency, and as production becomes more efficient, so do user service scenarios and performance-to-cost ratios. Because we have learned about user needs by serving users and collecting their data, we have a much better understanding of what we need to provide. This allows us to react to changing needs and even to notice changes in the collected data itself. The potential for data to change the balance between supply and demand is a major contribution of the big data era. As data keeps growing, databases become the computing power on the back end that supports commerce. And as data is added, the database needs more computing power to process and make use of it.

We use an architecture that separates reads from writes in order to serve more user workloads. At the same time, a shared storage system allows us to provide up to 100TB of capacity and respond to the explosive growth of web-scale data.
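
The idea of separating reads from writes can be illustrated with a minimal Python sketch. It is not PolarDB's actual proxy logic: the class name, the node names, and the rule of classifying statements by their first keyword are all assumptions made for illustration. Writes go to the single writer, and reads are spread round-robin over the read-only nodes.

```python
import itertools


class ReadWriteRouter:
    """Toy router: one writer node, any number of read-only nodes.

    Hypothetical illustration only. Statements are classified by their first
    keyword, so cases such as SELECT ... FOR UPDATE are not handled.
    """

    READ_VERBS = {"SELECT", "SHOW"}

    def __init__(self, writer, readers):
        self.writer = writer
        self.readers = list(readers)
        # Round-robin iterator over read-only nodes, if there are any.
        self._cycle = itertools.cycle(self.readers) if self.readers else None

    def route(self, sql):
        """Return the name of the node that should execute the statement."""
        parts = sql.split()
        verb = parts[0].upper() if parts else ""
        if verb in self.READ_VERBS and self._cycle is not None:
            return next(self._cycle)
        # Anything that is not clearly a read goes to the writer.
        return self.writer


if __name__ == "__main__":
    router = ReadWriteRouter("rw-node", ["ro-node-1", "ro-node-2"])
    for statement in ("INSERT INTO t VALUES (1)", "SELECT * FROM t", "SELECT 1"):
        print(statement, "->", router.route(statement))
```

Sending every statement that is not clearly a read to the writer is the conservative choice in this sketch: a misclassified read costs some writer capacity, while a misclassified write would fail on a read-only node.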

Use Case 2: High Availability and Business Flexibility

[Figure 4]

A few years ago, when I was a developer, I worked on high availability software. At the time, to make a LAMP architecture highly available on two machines, we had to install open source MySQL on two separate nodes, purchase additional high availability software, and learn how to configure it. Today, on the public cloud, the same technology is available at a lower cost and can serve more users cheaply. The value brought by the cloud is enormous.

Looking at this image, we see that when the CPU and memory on a PolarDB compute node are insufficient, we can quickly and easily scale up. With the shared storage framework, we can also scale out or scale in: when there aren't many read tasks, we can even delete some read nodes. Because of today's competition, marketing, and changes in the Internet ecosystem, the window in which our services must respond could shrink to a matter of hours or even minutes. For example, in e-commerce you sometimes have to deal with bid sniping, where load can surge within an hour. If we're able to add a read-only node each minute, this kind of load poses much less of a problem.
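
A rough Python sketch of this kind of elastic scaling follows. It assumes a hypothetical policy in which the number of read-only nodes is adjusted by at most one per minute, matching the "one node each minute" pace mentioned above. The per-node capacity, the node limits, and the function names are illustrative assumptions, not PolarDB parameters.

```python
import math


def desired_read_nodes(read_qps, qps_per_node, min_nodes=1, max_nodes=8):
    """Number of read-only nodes needed for the load, clamped to the limits."""
    needed = math.ceil(read_qps / qps_per_node)
    return max(min_nodes, min(max_nodes, needed))


def simulate(load_per_minute, qps_per_node, start_nodes=1, max_nodes=8):
    """Apply one scaling step per minute: add or remove at most one node."""
    nodes = start_nodes
    history = []
    for read_qps in load_per_minute:
        target = desired_read_nodes(read_qps, qps_per_node, max_nodes=max_nodes)
        if target > nodes:
            nodes += 1  # scale out by one read-only node
        elif target < nodes:
            nodes -= 1  # scale in by one read-only node
        history.append(nodes)
    return history


if __name__ == "__main__":
    # Hypothetical read load (QPS) sampled once per minute during a surge.
    load = [100, 500, 500, 500, 100, 100]
    print(simulate(load, qps_per_node=200))
```

With this sample load the node count climbs from one to three during the surge and drops back to one afterwards, one step per minute.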

Use Case 3: Cloudification and Migration

[Figure 5]

When something new and more advanced comes on the market, we naturally want to give it a try, but that becomes quite difficult if we have to change our business processes. With MySQL compatibility, putting our business on the cloud is quite simple. And if we use cloudification tools to perform a logical migration, the entire cloudification and cloud migration process is quite smooth.

Today we have already entered the age of cloud computing, IoT, and artificial intelligence. We used to say that the Internet would move from online to offline, that some traditional businesses might move to the cloud, and that artificial intelligence might open up new forms of business. It is quite possible that "industry + Internet" businesses will embrace a cloud that is cost-effective, flexible, and easy to deploy. With these kinds of migration tools, compatibility issues are easily solved and the cost of the entire cloud migration process is greatly reduced.

Use Case 4: High Reliability and Backups for Disaster Recovery

[Figure 6]

The last point is high reliability and backups for disaster recovery. The diagram above shows the PolarDB framework, in which PolarDB runs as a cluster on the DBserver layer. In a cluster architecture, network connectivity can be considered mission critical. Because of its high reliability, PolarDB is well suited to backup and disaster recovery scenarios.
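
The features section mentions that traffic is redirected automatically when the master node crashes. The Python sketch below models that behavior in the simplest possible way. The `Cluster` class, the node names, and the rule of promoting the first available read node are assumptions for illustration only; the talk does not describe how PolarDB selects or promotes a node.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy model of a one-master, many-read-nodes cluster (hypothetical)."""

    master: str
    read_nodes: list = field(default_factory=list)

    def fail_over(self):
        """Promote the first read node after the master has been lost.

        In this sketch all nodes are assumed to see the same shared storage,
        so promotion does not involve copying any data.
        """
        if not self.read_nodes:
            raise RuntimeError("no read node available to promote")
        self.master = self.read_nodes.pop(0)
        return self.master


if __name__ == "__main__":
    cluster = Cluster("node-a", ["node-b", "node-c"])
    print("new master:", cluster.fail_over())
    print("remaining read nodes:", cluster.read_nodes)
```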

Conclusion

Looking back, as I have personally come to understand PolarDB, I see it as a database product that combines imagination with creativity and adaptability. We believe that the spirit of PolarDB is one of faith combined with hard work and effort, and that is why we are able to present such a product to you all today.
