Where PostgreSQL Actually Starts Writing Data


The basic call chain is:

In the BackgroundWriterMain loop: BgBufferSync() --> SyncOneBuffer() --> FlushBuffer() --> smgrwrite()

Let's look at the code:

/*                    
 * Main entry point for bgwriter process                    
 *                    
 * This is invoked from AuxiliaryProcessMain, which has already created the                    
 * basic execution environment, but not enabled signals yet.                    
 */                    
void                    
BackgroundWriterMain(void)                    
{                    
    ……                
    /*                
     * Loop forever                
     */                
    for (;;)                
    {                
        ……            
                    
        /*            
         * Do one cycle of dirty-buffer writing.            
         */            
        can_hibernate = BgBufferSync();            
        ……            
    }                
}                    

Next, BgBufferSync:

/*                            
 * BgBufferSync -- Write out some dirty buffers in the pool.                            
 *                            
 * This is called periodically by the background writer process.                            
 *                            
 * Returns true if it's appropriate for the bgwriter process to go into                            
 * low-power hibernation mode.    (This happens if the strategy clock sweep                        
 * has been "lapped" and no buffer allocations have occurred recently,                            
 * or if the bgwriter has been effectively disabled by setting                            
 * bgwriter_lru_maxpages to 0.)                            
 */                            
bool                            
BgBufferSync(void)                            
{                            
    ……                        
    /* Execute the LRU scan */                        
    while (num_to_scan > 0 && reusable_buffers < upcoming_alloc_est)                        
    {                        
        int    buffer_state = SyncOneBuffer(next_to_clean, true);                
                            
        if (++next_to_clean >= NBuffers)                    
        {                    
            next_to_clean = 0;                
            next_passes++;                
        }                    
        num_to_scan--;                    
                            
        if (buffer_state & BUF_WRITTEN)                    
        {                    
            reusable_buffers++;                
            if (++num_written >= bgwriter_lru_maxpages)                
            {                
                BgWriterStats.m_maxwritten_clean++;            
                break;            
            }                
        }                    
        else if (buffer_state & BUF_REUSABLE)                    
            reusable_buffers++;                
    }                        
    ……                        
}                            

Then SyncOneBuffer:

/*                        
 * SyncOneBuffer -- process a single buffer during syncing.                        
 *                        
 * If skip_recently_used is true, we don't write currently-pinned buffers, nor                        
 * buffers marked recently used, as these are not replacement candidates.                        
 *                        
 * Returns a bitmask containing the following flag bits:                        
 *    BUF_WRITTEN: we wrote the buffer.                    
 *    BUF_REUSABLE: buffer is available for replacement, ie, it has                    
 *        pin count 0 and usage count 0.                
 *                        
 * (BUF_WRITTEN could be set in error if FlushBuffers finds the buffer clean                        
 * after locking it, but we don't care all that much.)                        
 *                        
 * Note: caller must have done ResourceOwnerEnlargeBuffers.                        
 */                        
static int                        
SyncOneBuffer(int buf_id, bool skip_recently_used)                        
{                        
    volatile BufferDesc *bufHdr = &BufferDescriptors[buf_id];                    
    int            result = 0;        
                        
    /*                    
     * Check whether buffer needs writing.                    
     *                    
     * We can make this check without taking the buffer content lock so long                    
     * as we mark pages dirty in access methods *before* logging changes with                    
     * XLogInsert(): if someone marks the buffer dirty just after our check we                    
     * don't worry because our checkpoint.redo points before log record for                    
     * upcoming changes and so we are not required to write such dirty buffer.                    
     */                    
    LockBufHdr(bufHdr);                    
                        
    if (bufHdr->refcount == 0 && bufHdr->usage_count == 0)                    
        result |= BUF_REUSABLE;                
    else if (skip_recently_used)                    
    {                    
        /* Caller told us not to write recently-used buffers */                
        UnlockBufHdr(bufHdr);                
        return result;                
    }                    
                        
    if (!(bufHdr->flags & BM_VALID) || !(bufHdr->flags & BM_DIRTY))                    
    {                    
        /* It's clean, so nothing to do */                
        UnlockBufHdr(bufHdr);                
        return result;                
    }                    
                        
    /*                    
     * Pin it, share-lock it, write it.  (FlushBuffer will do nothing if the                    
     * buffer is clean by the time we've locked it.)                    
     */                    
    PinBuffer_Locked(bufHdr);                    
    LWLockAcquire(bufHdr->content_lock, LW_SHARED);                    
                        
    FlushBuffer(bufHdr, NULL);                    
                        
    LWLockRelease(bufHdr->content_lock);                    
    UnpinBuffer(bufHdr, true);                    
                        
    return result | BUF_WRITTEN;                    
}                        

Finally, FlushBuffer:

/*                        
 * FlushBuffer                        
 *        Physically write out a shared buffer.                
 *                        
 * NOTE: this actually just passes the buffer contents to the kernel; the                        
 * real write to disk won't happen until the kernel feels like it.  This                        
 * is okay from our point of view since we can redo the changes from WAL.                        
 * However, we will need to force the changes to disk via fsync before                        
 * we can checkpoint WAL.                        
 *                        
 * The caller must hold a pin on the buffer and have share-locked the                        
 * buffer contents.  (Note: a share-lock does not prevent updates of                        
 * hint bits in the buffer, so the page could change while the write                        
 * is in progress, but we assume that that will not invalidate the data                        
 * written.)                        
 *                        
 * If the caller has an smgr reference for the buffer's relation, pass it                        
 * as the second parameter.  If not, pass NULL.  In the latter case, the                        
 * relation will be marked as "transient" so that the corresponding                        
 * kernel-level file descriptors are closed when the current transaction ends,                        
 * if any.                        
 */                        
static void                        
FlushBuffer(volatile BufferDesc *buf, SMgrRelation reln)                        
{                        
    XLogRecPtr    recptr;                
    ErrorContextCallback errcontext;                    
    instr_time    io_start,                
                io_time;        
                        
    /*                    
     * Acquire the buffer's io_in_progress lock.  If StartBufferIO returns                    
     * false, then someone else flushed the buffer before we could, so we need                    
     * not do anything.                    
     */                    
    if (!StartBufferIO(buf, false))                    
        return;                
                        
    /* Setup error traceback support for ereport() */                    
    errcontext.callback = shared_buffer_write_error_callback;                    
    errcontext.arg = (void *) buf;                    
    errcontext.previous = error_context_stack;                    
    error_context_stack = &errcontext;                    
                        
    /* Find smgr relation for buffer, and mark it as transient */                    
    if (reln == NULL)                    
    {                    
        reln = smgropen(buf->tag.rnode, InvalidBackendId);                
        smgrsettransient(reln);                
    }                    
                        
    TRACE_POSTGRESQL_BUFFER_FLUSH_START(buf->tag.forkNum,                    
                    buf->tag.blockNum,    
                    reln->smgr_rnode.node.spcNode,    
                    reln->smgr_rnode.node.dbNode,    
                    reln->smgr_rnode.node.relNode);    
                        
    /*                    
     * Force XLOG flush up to buffer's LSN.  This implements the basic WAL                    
     * rule that log updates must hit disk before any of the data-file changes                    
     * they describe do.                    
     */                    
    recptr = BufferGetLSN(buf);                    
    XLogFlush(recptr);                    
                        
    /*                    
     * Now it's safe to write buffer to disk. Note that no one else should                    
     * have been able to write it while we were busy with log flushing because                    
     * we have the io_in_progress lock.                    
     */                    
                        
    /* To check if block content changes while flushing. - vadim 01/17/97 */                    
    LockBufHdr(buf);                    
    buf->flags &= ~BM_JUST_DIRTIED;                    
    UnlockBufHdr(buf);                    
                        
    if (track_io_timing)                    
        INSTR_TIME_SET_CURRENT(io_start);                
                        
    smgrwrite(reln,                    
              buf->tag.forkNum,            
              buf->tag.blockNum,            
              (char *) BufHdrGetBlock(buf),            
              false);            
                        
    if (track_io_timing)                    
    {                    
        INSTR_TIME_SET_CURRENT(io_time);                
        INSTR_TIME_SUBTRACT(io_time, io_start);                
        pgstat_count_buffer_write_time(INSTR_TIME_GET_MICROSEC(io_time));                
        INSTR_TIME_ADD(pgBufferUsage.blk_write_time, io_time);                
    }                    
                        
    pgBufferUsage.shared_blks_written++;                    
                        
    /*                    
     * Mark the buffer as clean (unless BM_JUST_DIRTIED has become set) and                    
     * end the io_in_progress state.                    
     */                    
    TerminateBufferIO(buf, true, 0);                    
                        
    TRACE_POSTGRESQL_BUFFER_FLUSH_DONE(buf->tag.forkNum,                    
                           buf->tag.blockNum,
                           reln->smgr_rnode.node.spcNode,
                           reln->smgr_rnode.node.dbNode,
                           reln->smgr_rnode.node.relNode);
                        
    /* Pop the error context stack */                    
    error_context_stack = errcontext.previous;                    
}                        

Notice that the loop writes only one buffer per iteration. Strange at first glance, but most likely deliberate: the background writer is designed to trickle dirty pages out a little at a time, smoothing the I/O load instead of issuing it in large bursts.

This article is reposted from Jian Ge's Data Garden blog on cnblogs. Original link: http://www.cnblogs.com/gaojian/archive/2012/10/24/2737470.html. Please contact the original author before reprinting.

