Can Oracle Online Redo Logs Be Placed on Flash Cards? - Alibaba Cloud Developer Community

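Flash cards far outperform SAS disks, so they are widely used in databases. Whether online redo logs should be stored on flash cards, however, has long been a point of debate. Today Dai Mingming, founder of the Hefei chapter of the DBA+ community, discusses how he examined this question both in theory and through hands-on experiments.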

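About the Expert

Dai Mingming (戴明明)

Founder of the Hefei chapter of the DBA+ community

Oracle ACE Associate, core member of the All China Oracle User Group (ACOUG), and member of the Zhejiang Application Middleware and Database User Group. He has more than seven years of DBA experience, has built up experience in Oracle high availability, specializes in Oracle database diagnosis and performance tuning, and is keen on researching and sharing Oracle technology. Since 2014 he has been researching database solutions based on PCIe flash cards.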

1 Oracle's Official Recommendations

Alternative and Specialised Options as to How to Avoid Waiting for Redo Log Synchronization (Doc ID 857576.1)

This MOS note contains the following sentence:

Also putting the SLOG on an SSD (Solid State Disk) will reduce redo log latency further. This will help improve the performance of synchronous writes.

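Here Oracle suggests placing the redo log on SSD, which reduces latency and improves synchronous write performance.

Troubleshooting: 'Log file sync' Waits (Doc ID 1376916.1)

In this MOS note, Oracle's recommendations are as follows.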

If the proportion of the 'log file sync' time spent on 'log file parallel write' times is high, then most of the wait time is due to IO (waiting for the redo to be written). The performance of LGWR in terms of IO should be examined. As a rule of thumb, an average time for 'log file parallel write' over 20 milliseconds suggests a problem with IO subsystem.

Recommendations

Work with the system administrator to examine the filesystems where the redologs are located with a view to improving the performance of IO.

Do not place redo logfiles on a RAID configuration which requires the calculation of parity, such as RAID-5 or RAID-6.

Do not put redo logs on Solid State Disk (SSD)

Although generally, Solid State Disks write performance is good on average, they may endure write peaks which will highly increase waits on 'log file sync'.

(Exception to this would be for Engineered Systems (Exadata, SuperCluster and Oracle Database Appliance) which have been optimized to use SSDs for REDO)

Look for other processes that may be writing to that same location and ensure that the disks have sufficient bandwidth to cope with the required capacity. If they don't then move the activity or the redo.

Ensure that the log_buffer is not too big. A very large log_buffer can have an adverse affect as waits will be longer when flushes occur. When the buffer fills up, it has to write all the data into the redo log file and the LGWR will wait until the last I/O is completed.

Here Oracle advises against placing redo logs on SSD, but it also adds that on engineered systems such as Exadata the redo is stored on SSD.
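The diagnostic step in the quoted note (compare 'log file parallel write' against 'log file sync', and check the 20 ms rule of thumb) can be sketched as follows. This is a minimal illustration, not an Oracle tool: the function names are hypothetical, and the inputs are assumed to be average wait times in milliseconds that you have already read from a report such as AWR. The note does not say what proportion counts as "high", so the sketch only returns the ratio and leaves that judgment to the reader.

```python
def lgwr_io_share(log_file_sync_ms: float, log_file_parallel_write_ms: float) -> float:
    """Fraction of the average 'log file sync' wait spent in 'log file parallel write'.

    Both arguments are assumed to be average wait times in milliseconds.
    """
    if log_file_sync_ms <= 0:
        raise ValueError("log_file_sync_ms must be positive")
    return log_file_parallel_write_ms / log_file_sync_ms


def io_looks_suspect(log_file_parallel_write_ms: float, threshold_ms: float = 20.0) -> bool:
    """Apply the rule of thumb from the note: an average above 20 ms suggests an I/O problem."""
    return log_file_parallel_write_ms > threshold_ms


if __name__ == "__main__":
    # Hypothetical averages, for illustration only.
    sync_ms, write_ms = 10.0, 8.0
    print(f"I/O share of log file sync: {lgwr_io_share(sync_ms, write_ms):.0%}")
    print(f"I/O subsystem suspect: {io_looks_suspect(write_ms)}")
```

With these made-up numbers most of the wait is I/O, yet the average write time is well under the 20 ms threshold, so the rule of thumb alone would not flag the storage.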

The MOS note also contains this sentence:

Although generally, Solid State Disks write performance is good on average, they may endure write peaks which will highly increase waits on 'log file sync'.

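Oracle's point is that SSD write performance is good on average, but write peaks may occur at certain moments and lead to longer 'log file sync' waits. Note that the word used is "may": it is a possibility, not a certainty.

Flash cards use three types of flash media: SLC, MLC, and TLC.

Consumer-grade SSDs use MLC and TLC, with TLC offering larger capacity. Because of consumer price constraints, their OP (over-provisioning) values are fairly low, generally under 10%. Once the drive has been written full, performance drops, and the write amplification factor is also higher than on enterprise SSDs. In that situation, the "may" that Oracle describes can indeed happen.

Enterprise PCIe flash cards, however, use MLC, and their OP value can reach 27%. A higher OP keeps the write amplification factor lower, and the larger reserve gives the flash card better performance. In that case the write peaks Oracle mentions do not occur.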
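To make the two quantities in this comparison concrete, here is a small sketch. It assumes the commonly used definitions, which the article itself does not spell out: OP is the spare capacity divided by the user-visible capacity, and the write amplification factor is the number of bytes written to NAND divided by the bytes written by the host. The function names and the capacities in the example are hypothetical.

```python
def over_provisioning(raw_capacity: float, user_capacity: float) -> float:
    """OP ratio under the assumed definition: (raw - user) / user."""
    if user_capacity <= 0 or raw_capacity < user_capacity:
        raise ValueError("expected 0 < user_capacity <= raw_capacity")
    return (raw_capacity - user_capacity) / user_capacity


def write_amplification(nand_bytes_written: float, host_bytes_written: float) -> float:
    """Write amplification factor under the assumed definition: NAND writes / host writes."""
    if host_bytes_written <= 0:
        raise ValueError("host_bytes_written must be positive")
    return nand_bytes_written / host_bytes_written


if __name__ == "__main__":
    # Hypothetical capacities in GB, chosen only to reproduce the percentages in the text.
    print(f"enterprise-style card: OP = {over_provisioning(1270, 1000):.0%}")
    print(f"consumer-style drive:  OP = {over_provisioning(1070, 1000):.0%}")
```

Under these assumptions, a card exposing 1000 GB out of 1270 GB of raw flash has the 27% OP the article attributes to enterprise PCIe cards, while 1070 GB of raw flash for the same user capacity gives 7%, in the sub-10% range described for consumer drives.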

2 4K Online Redo Log

In the MOS note Troubleshooting: 'Log file sync' Waits (Doc ID 1376916.1), Oracle mentions that on Exadata (XD) the redo logs are placed on SSD. There is also another MOS note:

Using 4k Redo Logs on Flash and SSD-based Storage (Doc ID 1681266.1)

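2.1 Sector Size

Current storage devices support 4k sectors, whereas the previous generation mostly used 512-byte sectors. The sector size is the smallest unit of a single I/O.

4k sectors have two operating modes: native mode and emulation mode.

1) Native mode, i.e. 4k mode: the physical and logical block sizes are the same, both 4096 bytes. The drawback of native mode is that it needs support from the operating system and from software such as the database. Oracle has supported 4k I/O since 11gR2; on the operating system side, Linux kernels from 2.6.32 onward support 4k I/O.

2) Emulation mode, also called 512e: the physical block is still 4k, but the logical block is 512 bytes. This mode exists mainly for backward compatibility. Because the underlying device still operates in 4k units, it leads to partial I/O and 4k alignment problems.

In emulation mode each I/O request is 512 bytes, but the underlying storage must operate in 4k units. Reading 512 bytes of data actually requires reading 4k, eight times as much; this is partial I/O. Likewise, a 512-byte write actually reads the 4k physical block first, updates the 512 bytes within it, and then writes the 4k back. The extra work in emulation mode therefore adds latency and reduces performance.

Among Oracle database files, the default datafile block size is 8KB and the control file block size is 16KB, so neither has a partial I/O problem. Only the online redo log, whose default block size is 512 bytes, is affected by partial I/O.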
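The partial I/O behaviour described above can be illustrated with a short sketch. It models only the arithmetic: given the offset and length of a logical write, it computes how many bytes the device must touch in 4096-byte physical sectors and whether a read-modify-write is needed. The function name and the example offsets are hypothetical, and real devices and drivers may cache or coalesce requests in ways this model ignores.

```python
PHYSICAL_SECTOR = 4096  # bytes; physical sector size in the 512e case described above


def physical_io(offset: int, length: int, physical: int = PHYSICAL_SECTOR):
    """Return (physical_bytes_touched, needs_read_modify_write) for one logical write."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    start = (offset // physical) * physical             # round start down to a sector boundary
    end = -(-(offset + length) // physical) * physical  # round end up to a sector boundary
    touched = end - start
    # A write that does not start and end on sector boundaries covers only part of
    # at least one physical sector, so that sector must be read, patched and rewritten.
    partial = (offset % physical != 0) or (length % physical != 0)
    return touched, partial


if __name__ == "__main__":
    cases = {
        "redo block, 512 bytes": (0, 512),
        "datafile block, 8KB": (0, 8192),
        "control file block, 16KB": (0, 16384),
        "4k write at a 512-byte offset": (512, 4096),
    }
    for name, (offset, length) in cases.items():
        touched, partial = physical_io(offset, length)
        print(f"{name}: touches {touched} bytes, read-modify-write={partial}")
```

The last case shows the alignment side of the problem: even a 4k-sized write needs a read-modify-write across two physical sectors if it does not start on a 4k boundary.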