Does LGWR use synchronous IO even when AIO is enabled?

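Introduction:
Enabling asynchronous IO (AIO) in Oracle can improve database IO performance to some extent, but it also introduces a risk of losing committed data (for details, see 小荷's article). This raises a question: is redo written out synchronously, or does LGWR also switch to the asynchronous IO API once AIO is enabled? We can answer it by tracing the system calls made by the background processes, first DBWR and then LGWR: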
SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bi
PL/SQL Release 10.2.0.4.0 - Production
CORE    10.2.0.4.0      Production
TNS for Linux: Version 10.2.0.4.0 - Production
NLSRTL Version 10.2.0.4.0 - Production


SQL> show parameter disk_asynch_io

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
disk_asynch_io                       boolean     TRUE

SQL> show parameter filesystem

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
filesystemio_options                 string      SETALL

[oracle@rh2 ~]$ ps -ef|grep dbw0_G10R2|grep -v grep
oracle   29168     1  0 19:02 ?        00:00:01 ora_dbw0_G10R2

[oracle@rh2 ~]$ strace -p 29168

.............................
io_submit(140140183375872, 34, {{0x7f74ee284e10, 0, 1, 0, 16},
{0x7f74ee290920, 0, 1, 0, 16}, {0x7f74ee286970, 0, 1, 0, 16}, {0x7f74ee290db0, 0, 1, 0, 16}  = 34

io_getevents(140140183375872, 1, 1024, {{0x7f74ee284e10, 0x7f74ee284e10, 8192, 0},
0x7f74ee289710, 8192, 0}}, {600, 0}) = 12

times(NULL)                             = 480509951
times(NULL)                             = 480509951
times(NULL)                             = 480509951
times(NULL)                             = 480509951
times(NULL)                             = 480509951
times(NULL)                             = 480509951
times(NULL)                             = 480509951
times(NULL)                             = 480509951
times(NULL)                             = 480509951
times(NULL)                             = 480509951
times(NULL)                             = 480509951
times(NULL)                             = 480509951
getrusage(RUSAGE_SELF, {ru_utime={0, 417936}, ru_stime={0, 823874}, ...}) = 0
getrusage(RUSAGE_SELF, {ru_utime={0, 417936}, ru_stime={0, 823874}, ...}) = 0
io_getevents(140140183375872, 1, 1024, {{0x7f74ee28c4b0, 0x7f74ee28c4b0, 8192, 0},
{0x7f74ee287290, 0x7f74ee287290, 8192, 0}, {0x7f74ee283988, 0x7f74ee283988, 8192, 0},
{0x7f74ee27fbf0, 0x7f74ee27fbf0, 8192, 0}, {0x7f74ee28a030, 0x7f74ee28a030, 8192, 0}, {0x7f74ee28fdb8, = 22

/* Here DBWR uses io_submit to queue its I/O requests.
    io_submit is a typical asynchronous I/O system call,
    so Oracle is already using async IO for datafile writes */
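Reading a long strace capture by eye is tedious, so here is a minimal Python sketch that classifies a capture by syscall name. It assumes Linux native AIO shows up as io_setup/io_submit/io_getevents/io_destroy, and, as a labeled assumption, that a process writing synchronously would show pwrite-family calls instead. The helper name `classify_strace` is hypothetical.

```python
import re
from collections import Counter

# Linux native (kernel) AIO syscalls: io_setup/io_destroy manage the AIO
# context, io_submit queues requests, io_getevents reaps completions.
AIO_SYSCALLS = {"io_setup", "io_submit", "io_getevents", "io_destroy"}
# Assumption: synchronous writes would appear as one of these instead.
SYNC_WRITE_SYSCALLS = {"write", "pwrite", "pwrite64", "writev", "pwritev"}

# A new strace entry starts with the syscall name followed by "(".
SYSCALL_RE = re.compile(r"^\s*([a-z_][a-z0-9_]*)\(")


def classify_strace(lines):
    """Count AIO vs. synchronous-write syscalls and return a verdict."""
    counts = Counter()
    for line in lines:
        m = SYSCALL_RE.match(line)
        if not m:
            continue  # wrapped continuation lines, blank lines, "....."
        name = m.group(1)
        if name in AIO_SYSCALLS:
            counts["aio"] += 1
        elif name in SYNC_WRITE_SYSCALLS:
            counts["sync_write"] += 1
    if counts["aio"] and not counts["sync_write"]:
        verdict = "async"
    elif counts["sync_write"] and not counts["aio"]:
        verdict = "sync"
    elif counts["aio"]:
        verdict = "mixed"
    else:
        verdict = "unknown"
    return verdict, dict(counts)
```

Fed the DBWR excerpt above, it reports "async": io_submit and io_getevents appear, while times() and getrusage() are ignored and no pwrite-family call is present.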

[oracle@rh2 ~]$ ps -ef|grep lgwr_G10R2|grep -v grep
oracle   29170     1  0 19:02 ?        00:00:01 ora_lgwr_G10R2

[oracle@rh2 ~]$ strace -p 29170
.............................
io_submit(139932588023808, 2, {{0x7f4497f423c8, 0, 1, 0, 20}, {0x7f4497f42590, 0, 1, 0, 21}}) = 2

io_getevents(139932588023808, 1, 1024, {{0x7f4497f423c8, 0x7f4497f423c8, 3584, 0}}, {600, 0}) = 1
times(NULL)                             = 480533371
io_getevents(139932588023808, 1, 1023, {{0x7f4497f42590, 0x7f4497f42590, 3584, 0}}, {600, 0}) = 1
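To see which writes LGWR queued and how many bytes each one completed, the iocbs passed to io_submit can be paired with the events returned by io_getevents. This Python sketch assumes the older strace layout seen here, where each iocb prints as {data, key, opcode, reqprio, fd} and each completion as {data, obj, res, res2}, and that opcode 1 is PWRITE (IOCB_CMD_PWRITE in the Linux AIO ABI). `match_aio` is a hypothetical helper.

```python
import re

# Assumed strace formats: iocb = {data, key, opcode, reqprio, fd},
# completed io_event = {data, obj, res, res2}.
IOCB_RE = re.compile(r"\{(0x[0-9a-f]+), (\d+), (\d+), (\d+), (\d+)\}")
EVENT_RE = re.compile(r"\{(0x[0-9a-f]+), (0x[0-9a-f]+), (-?\d+), (-?\d+)\}")

# Linux AIO opcodes: 0 = PREAD, 1 = PWRITE.
OPCODES = {0: "pread", 1: "pwrite"}


def match_aio(submit_line, event_lines):
    """Pair iocbs queued by one io_submit with completions reaped later."""
    pending = {}
    for data, _key, opcode, _prio, fd in IOCB_RE.findall(submit_line):
        pending[data] = {"op": OPCODES.get(int(opcode), opcode), "fd": int(fd)}
    for line in event_lines:
        for data, _obj, res, _res2 in EVENT_RE.findall(line):
            if data in pending:
                # res is the byte count on success or a negative errno.
                pending[data]["bytes"] = int(res)
    return pending
```

Applied to the LGWR trace above, both iocbs are 3584-byte pwrites, one to fd 20 and one to fd 21. Those could be two members of a multiplexed online log group, but the trace alone does not confirm which files the descriptors point to.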
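The io_submit system calls above show that once AIO is enabled, LGWR likewise uses asynchronous IO to write redo records to the online logfiles. In other words, if the storage crashes, redo records can indeed be lost, and the corresponding data with them. What actually controls whether LGWR uses async IO is the hidden parameter _lgwr_async_io, which normally defaults to FALSE: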
SQL> SELECT x.ksppinm NAME, y.ksppstvl VALUE, x.ksppdesc describ
  2   FROM SYS.x$ksppi x, SYS.x$ksppcv y
  3   WHERE x.inst_id = USERENV ('Instance')
  4   AND y.inst_id = USERENV ('Instance')
  5   AND x.indx = y.indx
  6  AND x.ksppinm='_lgwr_async_io';

NAME
--------------------------------------------------------------------------------
VALUE
--------------------------------------------------------------------------------
DESCRIB
--------------------------------------------------------------------------------
_lgwr_async_io
FALSE
LGWR Asynchronous IO enabling boolean flag
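You might ask: "If _lgwr_async_io defaults to FALSE, LGWR should be using synchronous rather than async IO. Isn't that a contradiction?" Yes: in theory, with _lgwr_async_io at its default of FALSE, LGWR should use sync IO. However, versions 10.1.0.2 through 11.1.0.6 are affected by "Bug:8357698 LGWR USES ASYNC IO INSPITE OF SETTING _LGWR_ASYNC_IO=FALSE":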
Abstract: LGWR USES ASYNC IO INSPITE OF SETTING _LGWR_ASYNC_IO=FALSE

PROBLEM:
--------
+ Lgwr uses asynch IO irrespective of the setting _lgwr_async_io=false and consumes high CPU performing IO poll operations.
+ Short stack and truss output show that the lgwr is waiting for asynch IO completion notification.

This particular problem was dormant for a long time and highly affects the database performance.

DIAGNOSTIC ANALYSIS:
--------------------
LGWR Shortstack
~~~~~~~~~~~~~~~
aiowait()+540<-skgfospo()+216<-skgfrwat()+80<-ksfdwtio()+476<-ksfdwat1()+84<-ksfdrwat0()+520<-kcrfw_post()+500<-kcrfw_redo_write()+2964<-ksbabs()+764<-ksbrdp()
aiowait()+540<-skgfospo()+216<-skgfrwat()+80<-ksfdwtio()+476<-ksfdwat1()+84<-ksfdrwat0()+520<-kcrfw_post()+500<-kcrfw_redo_write()+2964<-ksbabs()+764<-ksbrdp()
aiowait()+540<-skgfospo()+216<-skgfrwat()+80<-ksfdwtio()+476<-ksfdwat1()+84<-ksfdrwat0()+520<-kcrfw_post()+500<-kcrfw_redo_write()+2964<-ksbabs()+764<-ksbrdp()

Truss output
~~~~~~~~~~~~~
= 1048576 67072
27640/38: kaio(AIONOTIFY, 0) = 0
27640/40: kaio(AIONOTIFY, 0) = 0
27640/43: kaio(AIONOTIFY, 0) = 0
27640/45: kaio(AIONOTIFY, 0) = 0
27640/1: kaio(AIOWAIT, 0xFFFFFFFF7FFFD480) = 1
27640/1: kaio(AIOWAIT, 0xFFFFFFFF7FFFD480) = 1
27640/1: kaio(AIOWAIT, 0xFFFFFFFF7FFFD480) = 1
27640/1: kaio(AIOWAIT, 0xFFFFFFFF7FFFD480) = 1
= 1048576
27640/42: kaio(AIONOTIFY, 0) = 0
27640/1: kaio(AIOWAIT, 0xFFFFFFFF7FFFD480) = 1

Note: The TRUSS output was taken with filesystemio_options=setall.

1. Performed the test on a Solaris box with the settings below:
   filesystemio_options=none
   disk_asynch_io=false
   _lgwr_async_io=TRUE
2. Performed another test on Linux with the following settings; the behaviour matches Solaris, and hence it looks like it may not be port specific:
   filesystemio_options=SETALL
   disk_asynch_io=TRUE
   _lgwr_async_io=TRUE/FALSE
   Linux_lgwr_true.out ).

Request you to review the uploaded files and update with the results of your analysis.
As per Bug:8357698 LGWR USES ASYNC IO INSPITE OF SETTING _LGWR_ASYNC_IO=FALSE, the filesystemio_options & disk_asynch_io settings are overriding the _lgwr_async_io setting.

Solution
Most often, high CPU consumption by lgwr is related to async IO. You can try setting filesystemio_options=none & disk_asynch_io=false, with an obvious performance impact, but this may not be acceptable to you. Otherwise:

1] Move the redo logs to a filesystem where async IO is not permitted. This will cause the lgwr process not to use async calls.
OR
2] Use direct I/O, which is best done by mounting the corresponding filesystems unbuffered (using the "sync" option in Linux ext3, or the "forcedirectio" option in Solaris ufs, for example).

For example:
=========
On Solaris:
---------------
# mount_ufs -o forcedirectio /dev/rdsk/c2t3d0s0 /d17

If forcedirectio is specified and supported by the file system, then for the duration of the mount forced direct I/O will be used. If the filesystem is mounted using forcedirectio, then data is transferred directly between user address space and the disk. If the filesystem is mounted using noforcedirectio, then data is buffered in kernel address space when data is transferred between user address space and the disk. forcedirectio is a performance option that benefits only from large sequential data transfers. The default behavior is noforcedirectio.

On Linux:
--------------
# mount -o sync /dev/rdsk/c2t3d0s0 /d17

The sync option has effect only for ext2, ext3, fat, vfat and ufs. The default is async.

PS: We recommend you involve your system administrator in changing the mount options of the OS file system.
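Because of Bug:8357698, the two parameters filesystemio_options and disk_asynch_io can override the _lgwr_async_io setting. The problem was not fixed until after 11.1.0.6, and in 11.2 the _lgwr_async_io parameter was removed altogether: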
SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
PL/SQL Release 11.2.0.2.0 - Production
CORE    11.2.0.2.0      Production
TNS for Linux: Version 11.2.0.2.0 - Production
NLSRTL Version 11.2.0.2.0 - Production

SQL> SELECT x.ksppinm NAME, y.ksppstvl VALUE, x.ksppdesc describ
  2   FROM SYS.x$ksppi x, SYS.x$ksppcv y
  3   WHERE x.inst_id = USERENV ('Instance')
  4   AND y.inst_id = USERENV ('Instance')
  5   AND x.indx = y.indx
  6  AND x.ksppinm='_lgwr_async_io';

no rows selected

SQL> show parameter filesystem

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
filesystemio_options                 string      setall
SQL> show parameter async

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
disk_asynch_io                       boolean     TRUE
tape_asynch_io                       boolean     TRUE

io_submit(139755827974144, 2, {{0x7f1b6e992450, 0, 1, 0, 256}, {0x7f1b6e9921f8, 0, 1, 0, 259}}) = 2
io_getevents(139755827974144, 2, 128, {{0x7f1b6e992450, 0x7f1b6e992450, 1024, 0}, {0x7f1b6e9921f8, 0x7f1b6e9921f8, 1024, 0}}, {600, 0}) = 2
times({tms_utime=3, tms_stime=4, tms_cutime=0, tms_cstime=0}) = 480863691
semtimedop(2654212, 0x7fffadc3db80, 1, {3, 0}) = 0
times({tms_utime=3, tms_stime=4, tms_cutime=0, tms_cstime=0}) = 480863808
times({tms_utime=3, tms_stime=4, tms_cutime=0, tms_cstime=0}) = 480863808
io_submit(139755827974144, 2, {{0x7f1b6e9921f8, 0, 1, 0, 256}, {0x7f1b6e992450, 0, 1, 0, 259}}) = 2
io_getevents(139755827974144, 2, 128, {{0x7f1b6e9921f8, 0x7f1b6e9921f8, 512, 0}, {0x7f1b6e992450, 0x7f1b6e992450, 512, 0}}, {600, 0}) = 2

/* In 11.2, if AIO is enabled, LGWR still uses async IO */

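Summary: Whether LGWR uses async IO is determined by the hidden parameter _lgwr_async_io, which defaults to FALSE; that is, by default LGWR should use synchronous IO. However, a bug in versions 10.1.0.2 through 11.1.0.6 lets disk_asynch_io and filesystemio_options override _lgwr_async_io, so the hidden parameter has no effect. In those versions, if AIO is enabled, LGWR will certainly use AIO. If you want AIO enabled but do not want the risk of LGWR writing asynchronously, you can move the log files to a location where async IO is not permitted, or mount the filesystem with the direct/sync/forcedirectio options. From 11.2 onward the _lgwr_async_io parameter has been removed, and whether LGWR uses async IO depends entirely on disk_asynch_io and filesystemio_options.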
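The rules in the summary can be condensed into a small decision function. This is a hypothetical Python helper (`lgwr_uses_async_io` and the other names are invented for illustration, not Oracle code) that encodes what the article states plus two assumptions flagged in the comments: that async IO to filesystem files needs disk_asynch_io=TRUE together with filesystemio_options set to asynch or setall, and that outside the bug window and before 11.2, AIO must also be enabled for _lgwr_async_io=TRUE to take effect. Raw devices, ASM and platform differences are ignored.

```python
def parse_version(v):
    """Turn '10.2.0.4' into (10, 2, 0, 4) so versions compare in order."""
    return tuple(int(part) for part in v.split("."))


BUG_FIRST = parse_version("10.1.0.2")  # first release hit by Bug 8357698
BUG_LAST = parse_version("11.1.0.6")   # last release hit by the bug
PARAM_REMOVED = parse_version("11.2")  # _lgwr_async_io no longer exists


def db_aio_enabled(disk_asynch_io, filesystemio_options):
    """Assumption: filesystem AIO needs both parameters to allow it."""
    return disk_asynch_io and filesystemio_options.lower() in ("asynch", "setall")


def lgwr_uses_async_io(version, disk_asynch_io, filesystemio_options,
                       lgwr_async_io=False):
    """Predict LGWR's IO mode from the rules summarised in the article."""
    v = parse_version(version)
    aio = db_aio_enabled(disk_asynch_io, filesystemio_options)
    if v >= PARAM_REMOVED:
        # 11.2+: only disk_asynch_io and filesystemio_options matter.
        return aio
    if BUG_FIRST <= v <= BUG_LAST:
        # Bug 8357698: the two AIO parameters override _lgwr_async_io.
        return aio
    # Assumption: elsewhere the hidden parameter is honoured, but only
    # when AIO is available in the first place.
    return aio and lgwr_async_io
```

With the 10.2.0.4 settings traced above, `lgwr_uses_async_io("10.2.0.4", True, "SETALL")` returns True even though _lgwr_async_io is FALSE, which matches the io_submit calls seen from LGWR.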


This article is reposted from maclean_007's 51CTO blog. Original link: http://blog.51cto.com/maclean/1277797
