PostgreSQL 内存OOM控制策略导致数据库无法启动的诊断一例(如何有效避免oom)

本文涉及的产品
云原生数据库 PolarDB MySQL 版,Serverless 5000PCU 100GB
简介:
你可能遇到过类似的数据库无法启动的问题,
postgres@digoal-> FATAL:  XX000: could not map anonymous shared memory: Cannot allocate memory
HINT:  This error usually means that PostgreSQL's request for a shared memory segment exceeded available memory, swap space, or huge pages. To reduce the request size (currently 3322716160 bytes), reduce PostgreSQL's shared memory usage, perhaps by reducing shared_buffers or max_connections.
LOCATION:  CreateAnonymousSegment, pg_shmem.c:398

通过查看meminfo可以得到原因。
CommitLimit: Based on the overcommit ratio ('vm.overcommit_ratio'),
              this is the total amount of  memory currently available to
              be allocated on the system. This limit is only adhered to
              if strict overcommit accounting is enabled (mode 2 in
              'vm.overcommit_memory').
              The CommitLimit is calculated with the following formula:
              CommitLimit = ([total RAM pages] - [total huge TLB pages]) *
              overcommit_ratio / 100 + [total swap pages]
              For example, on a system with 1G of physical RAM and 7G
              of swap with a `vm.overcommit_ratio` of 30 it would
              yield a CommitLimit of 7.3G.
              For more details, see the memory overcommit documentation
              in vm/overcommit-accounting.
Committed_AS: The amount of memory presently allocated on the system.
              The committed memory is a sum of all of the memory which
              has been allocated by processes, even if it has not been
              "used" by them as of yet. A process which malloc()'s 1G
              of memory, but only touches 300M of it will show up as
              using 1G. This 1G is memory which has been "committed" to
              by the VM and can be used at any time by the allocating
              application. With strict overcommit enabled on the system
              (mode 2 in 'vm.overcommit_memory'),allocations which would
              exceed the CommitLimit (detailed above) will not be permitted.
              This is useful if one needs to guarantee that processes will
              not fail due to lack of memory once that memory has been
              successfully allocated.

依据vm.overcommit_memory设置的值,
当vm.overcommit_memory=0时,不允许普通用户overcommit, 但是允许root用户轻微的overcommit。
当vm.overcommit_memory=1时,允许overcommit.
当vm.overcommit_memory=2时,Committed_AS不能大于CommitLimit。
commit 限制 计算方法
              The CommitLimit is calculated with the following formula:
              CommitLimit = ([total RAM pages] - [total huge TLB pages]) *
              overcommit_ratio / 100 + [total swap pages]
              For example, on a system with 1G of physical RAM and 7G
              of swap with a `vm.overcommit_ratio` of 30 it would
              yield a CommitLimit of 7.3G.
[root@digoal postgresql-9.4.4]# free
             total       used       free     shared    buffers     cached
Mem:       1914436     713976    1200460      72588      32384     529364
-/+ buffers/cache:     152228    1762208
Swap:      1048572     542080     506492
[root@digoal ~]# cat /proc/meminfo |grep Commit
CommitLimit:     2005788 kB
Committed_AS:     132384 kB
这个例子的2G就是以上公式计算得来。

overcommit限制的初衷是malloc后,内存并不是立即使用掉,所以如果多个进程同时申请一批内存的话,不允许OVERCOMMIT可能导致某些进程申请内存失败,但实际上内存是还有的。所以Linux内核给出了几种选择,2是比较靠谱或者温柔的做法。1的话风险有点大,因为可能会导致OOM。

所以当数据库无法启动时,要么你降低一下数据库申请内存的大小(例如降低shared_buffer或者max conn),要么就是修改一下overcommit的风格。

[参考]
1. kernel-doc-2.6.32/Documentation/filesystems/proc.txt
    MemTotal: Total usable ram (i.e. physical ram minus a few reserved
              bits and the kernel binary code)
     MemFree: The sum of LowFree+HighFree
MemAvailable: An estimate of how much memory is available for starting new
              applications, without swapping. Calculated from MemFree,
              SReclaimable, the size of the file LRU lists, and the low
              watermarks in each zone.
              The estimate takes into account that the system needs some
              page cache to function well, and that not all reclaimable
              slab will be reclaimable, due to items being in use. The
              impact of those factors will vary from system to system.
              This line is only reported if sysctl vm.meminfo_legacy_layout = 0
     Buffers: Relatively temporary storage for raw disk blocks
              shouldn't get tremendously large (20MB or so)
      Cached: in-memory cache for files read from the disk (the
              pagecache).  Doesn't include SwapCached
  SwapCached: Memory that once was swapped out, is swapped back in but
              still also is in the swapfile (if memory is needed it
              doesn't need to be swapped out AGAIN because it is already
              in the swapfile. This saves I/O)
      Active: Memory that has been used more recently and usually not
              reclaimed unless absolutely necessary.
    Inactive: Memory which has been less recently used.  It is more
              eligible to be reclaimed for other purposes
   HighTotal:
    HighFree: Highmem is all memory above ~860MB of physical memory
              Highmem areas are for use by userspace programs, or
              for the pagecache.  The kernel must use tricks to access
              this memory, making it slower to access than lowmem.
    LowTotal:
     LowFree: Lowmem is memory which can be used for everything that
              highmem can be used for, but it is also available for the
              kernel's use for its own data structures.  Among many
              other things, it is where everything from the Slab is
              allocated.  Bad things happen when you're out of lowmem.
   SwapTotal: total amount of swap space available
    SwapFree: Memory which has been evicted from RAM, and is temporarily
              on the disk
       Dirty: Memory which is waiting to get written back to the disk
   Writeback: Memory which is actively being written back to the disk
   AnonPages: Non-file backed pages mapped into userspace page tables
AnonHugePages: Non-file backed huge pages mapped into userspace page tables
      Mapped: files which have been mmaped, such as libraries
        Slab: in-kernel data structures cache
SReclaimable: Part of Slab, that might be reclaimed, such as caches
  SUnreclaim: Part of Slab, that cannot be reclaimed on memory pressure
  PageTables: amount of memory dedicated to the lowest level of page
              tables.
NFS_Unstable: NFS pages sent to the server, but not yet committed to stable
              storage
      Bounce: Memory used for block device "bounce buffers"
WritebackTmp: Memory used by FUSE for temporary writeback buffers
 CommitLimit: Based on the overcommit ratio ('vm.overcommit_ratio'),
              this is the total amount of  memory currently available to
              be allocated on the system. This limit is only adhered to
              if strict overcommit accounting is enabled (mode 2 in
              'vm.overcommit_memory').
              The CommitLimit is calculated with the following formula:
              CommitLimit = ([total RAM pages] - [total huge TLB pages]) *
              overcommit_ratio / 100 + [total swap pages]
              For example, on a system with 1G of physical RAM and 7G
              of swap with a `vm.overcommit_ratio` of 30 it would
              yield a CommitLimit of 7.3G.
              For more details, see the memory overcommit documentation
              in vm/overcommit-accounting.
Committed_AS: The amount of memory presently allocated on the system.
              The committed memory is a sum of all of the memory which
              has been allocated by processes, even if it has not been
              "used" by them as of yet. A process which malloc()'s 1G
              of memory, but only touches 300M of it will show up as
              using 1G. This 1G is memory which has been "committed" to
              by the VM and can be used at any time by the allocating
              application. With strict overcommit enabled on the system
              (mode 2 in 'vm.overcommit_memory'),allocations which would
              exceed the CommitLimit (detailed above) will not be permitted.
              This is useful if one needs to guarantee that processes will
              not fail due to lack of memory once that memory has been
              successfully allocated.
VmallocTotal: total size of vmalloc memory area
 VmallocUsed: amount of vmalloc area which is used
VmallocChunk: largest contiguous block of vmalloc area which is free

2. kernel-doc-2.6.32/Documentation/vm/overcommit-accounting
The Linux kernel supports the following overcommit handling modes

0       -       Heuristic overcommit handling. Obvious overcommits of
                address space are refused. Used for a typical system. It
                ensures a seriously wild allocation fails while allowing
                overcommit to reduce swap usage.  root is allowed to 
                allocate slighly more memory in this mode. This is the 
                default.

1       -       Always overcommit. Appropriate for some scientific
                applications.

2       -       Don't overcommit. The total address space commit
                for the system is not permitted to exceed swap + a
                configurable amount (default is 50%) of physical RAM.
                Depending on the amount you use, in most situations
                this means a process will not be killed while accessing
                pages but will receive errors on memory allocation as
                appropriate.

The overcommit policy is set via the sysctl `vm.overcommit_memory'.

The overcommit amount can be set via `vm.overcommit_ratio' (percentage)
or `vm.overcommit_kbytes' (absolute value).

The current overcommit limit and amount committed are viewable in
/proc/meminfo as CommitLimit and Committed_AS respectively.

Gotchas
-------

The C language stack growth does an implicit mremap. If you want absolute
guarantees and run close to the edge you MUST mmap your stack for the 
largest size you think you will need. For typical stack usage this does
not matter much but it's a corner case if you really really care

In mode 2 the MAP_NORESERVE flag is ignored. 


How It Works
------------

The overcommit is based on the following rules

For a file backed map
        SHARED or READ-only     -       0 cost (the file is the map not swap)
        PRIVATE WRITABLE        -       size of mapping per instance

For an anonymous or /dev/zero map
        SHARED                  -       size of mapping
        PRIVATE READ-only       -       0 cost (but of little use)
        PRIVATE WRITABLE        -       size of mapping per instance

Additional accounting
        Pages made writable copies by mmap
        shmfs memory drawn from the same pool

Status
------

o       We account mmap memory mappings
o       We account mprotect changes in commit
o       We account mremap changes in size
o       We account brk
o       We account munmap
o       We report the commit status in /proc
o       Account and check on fork
o       Review stack handling/building on exec
o       SHMfs accounting
o       Implement actual limit enforcement

To Do
-----
o       Account ptrace pages (this is hard)
相关实践学习
使用PolarDB和ECS搭建门户网站
本场景主要介绍基于PolarDB和ECS实现搭建门户网站。
阿里云数据库产品家族及特性
阿里云智能数据库产品团队一直致力于不断健全产品体系,提升产品性能,打磨产品功能,从而帮助客户实现更加极致的弹性能力、具备更强的扩展能力、并利用云设施进一步降低企业成本。以云原生+分布式为核心技术抓手,打造以自研的在线事务型(OLTP)数据库Polar DB和在线分析型(OLAP)数据库Analytic DB为代表的新一代企业级云原生数据库产品体系, 结合NoSQL数据库、数据库生态工具、云原生智能化数据库管控平台,为阿里巴巴经济体以及各个行业的企业客户和开发者提供从公共云到混合云再到私有云的完整解决方案,提供基于云基础设施进行数据从处理、到存储、再到计算与分析的一体化解决方案。本节课带你了解阿里云数据库产品家族及特性。
相关文章
|
22天前
|
关系型数据库 分布式数据库 数据库
成都晨云信息技术完成阿里云PolarDB数据库产品生态集成认证
近日,成都晨云信息技术有限责任公司(以下简称晨云信息)与阿里云PolarDB PostgreSQL版数据库产品展开产品集成认证。测试结果表明,晨云信息旗下晨云-站群管理系统(V1.0)与阿里云以下产品:开源云原生数据库PolarDB PostgreSQL版(V11),完全满足产品兼容认证要求,兼容性良好,系统运行稳定。
|
25天前
|
编解码 算法 Java
构建高效的Android应用:内存优化策略详解
随着智能手机在日常生活和工作中的普及,用户对移动应用的性能要求越来越高。特别是对于Android开发者来说,理解并实践内存优化是提升应用程序性能的关键步骤。本文将深入探讨针对Android平台的内存管理机制,并提供一系列实用的内存优化技巧,以帮助开发者减少内存消耗,避免常见的内存泄漏问题,并确保应用的流畅运行。
|
29天前
|
关系型数据库 分布式数据库 数据库
PolarDB常见问题之数据库不能自己减少节点如何解决
PolarDB是阿里云推出的下一代关系型数据库,具有高性能、高可用性和弹性伸缩能力,适用于大规模数据处理场景。本汇总囊括了PolarDB使用中用户可能遭遇的一系列常见问题及解答,旨在为数据库管理员和开发者提供全面的问题指导,确保数据库平稳运行和优化使用体验。
|
29天前
|
缓存 关系型数据库 分布式数据库
PolarDB常见问题之数据库cpu突然飙高如何解决
PolarDB是阿里云推出的下一代关系型数据库,具有高性能、高可用性和弹性伸缩能力,适用于大规模数据处理场景。本汇总囊括了PolarDB使用中用户可能遭遇的一系列常见问题及解答,旨在为数据库管理员和开发者提供全面的问题指导,确保数据库平稳运行和优化使用体验。
|
2月前
|
关系型数据库 分布式数据库 数据库
阿里云PolarDB登顶2024中国数据库流行榜:技术实力与开发者影响力
近日,阿里云旗下的自研云原生数据库PolarDB在2024年中国数据库流行度排行榜中夺冠,并刷新了榜单总分纪录,这一成就引起了技术圈的广泛关注。这一成就源于PolarDB在数据库技术上的突破与创新,以及对开发者和用户的实际需求的深入了解体会。那么本文就来分享一下关于数据库流行度排行榜的影响力以及对数据库选型的影响,讨论PolarDB登顶的关键因素,以及PolarDB“三层分离”新版本对开发者使用数据库的影响。
74 3
阿里云PolarDB登顶2024中国数据库流行榜:技术实力与开发者影响力
|
1月前
|
关系型数据库 分布式数据库 数据库
PolarDB PostgreSQL版:Oracle兼容的高性能数据库
PolarDB PostgreSQL版是一款高性能的数据库,具有与Oracle兼容的特性。它采用了分布式架构,可以轻松处理大量的数据,同时还支持多种数据类型和函数,具有高可用性和可扩展性。它还提供了丰富的管理工具和性能优化功能,为企业提供了可靠的数据存储和处理解决方案。PolarDB PostgreSQL版在数据库领域具有很高的竞争力,可以满足各种企业的需求。
|
7天前
|
算法 安全 Java
内存分配与回收策略
内存分配与回收策略
14 0
内存分配与回收策略
|
7天前
|
运维 关系型数据库 分布式数据库
「合肥 * 讯飞」4 月 19 日 PolarDB 开源数据库沙龙,报名中!
4月19日周五,PolarDB开源社区联合科大讯飞共同举办开源数据库技术沙龙,本次沙龙我们邀请了众多数据库领域的专家,期待大家的参与!
「合肥 * 讯飞」4 月 19 日 PolarDB 开源数据库沙龙,报名中!
|
10天前
|
NoSQL 安全 Redis
redis内存限制与淘汰策略
Redis内存管理包括限制和淘汰策略。`maxmemory`配置参数决定内存上限,无设置时64位系统默认不限制,可能导致系统资源耗尽,生产环境建议设定合理值。当内存满时,未设置淘汰策略会导致写入错误。Redis提供8种淘汰策略,如LRU(最近最少使用)和LFU(最不经常使用),以及随机或基于过期时间的删除。需根据数据重要性、访问频率和一致性选择合适策略。
13 0
|
16天前
|
存储 缓存 监控
深入解析linux内存指标:快速定位系统内存问题的有效技巧与实用方法(free、top、ps、vmstat、cachestat、cachetop、sar、swap、动态内存、cgroops、oom)
深入解析linux内存指标:快速定位系统内存问题的有效技巧与实用方法(free、top、ps、vmstat、cachestat、cachetop、sar、swap、动态内存、cgroops、oom)