Tuning Tips for Journal Performance

The journal records all messages from the Transaction Engines (TEs), and its configuration can affect performance. NuoDB provides a choice of journal sync methods, selected with the database option journal-sync-method. The option accepts three values: kernel, disk, and osync. Each is tuned for its supported operating system and file system.

The default sync method is disk because it is the safest. The other methods (kernel and osync) perform much better but can lose data if the operating system, file system, or hardware is not set up properly.

In the following table, the information in the Linux column assumes a file system that supports fallocate(), such as ext4. Journaling performance is degraded on a file system that does not support fallocate(). In that case, the kernel and disk sync modes behave identically and default to using fsync().
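As a rough illustration of the pre-allocation and fsync() fallback described above, the following Python sketch pre-allocates a file and flushes it after each write. It is not NuoDB's implementation: the names open_journal, commit, and JOURNAL_SIZE are hypothetical, and os.posix_fallocate() is used as a stand-in for fallocate(). The C library may emulate allocation on file systems without native support, so the fallback branch is only an approximation of the behavior described here.

```python
import os

# Hypothetical size; stands in for journal-max-file-size-bytes.
JOURNAL_SIZE = 1 << 20


def open_journal(path, size=JOURNAL_SIZE):
    """Open a journal file and try to pre-allocate `size` bytes.

    Returns (fd, preallocated). `preallocated` is False when the
    allocation call fails, for example on an unsupported file system.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.posix_fallocate(fd, 0, size)  # reserve the blocks up front
        return fd, True
    except OSError:
        return fd, False


def commit(fd, preallocated, offset, record):
    """Write `record` at `offset`, flush it, and return the next offset."""
    os.pwrite(fd, record, offset)
    if preallocated:
        # The file size does not change, so a data-only flush is enough.
        os.fdatasync(fd)
    else:
        # Assumed fallback: flush data and metadata with fsync().
        os.fsync(fd)
    return offset + len(record)
```

A caller would open the file once, keep the returned offset between commits, and close the descriptor with os.close() when done.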

Table: behavior of each journal-sync-method value on Linux and Windows

kernel

Linux Behavior: Uses buffered writes on a pre-allocated file (journal-max-file-size-bytes) and triggers a flush (sync_file_range()) of the kernel page buffer when a commit message is written to the journal file. The kernel sync method is not safe except when used with a battery-backed disk controller.

kernel mode cannot be used with ZFS because sync_file_range() does not work with copy-on-write file systems. osync should be used instead with ZFS.

Windows Behavior: Uses the FILE_FLAG_WRITE_THROUGH and FILE_FLAG_RANDOM_ACCESS flags when creating the journal file (CreateFile()). For performance reasons, the journal file is not pre-allocated when using the kernel option. Windows provides a device write cache policy option; for better performance, disk write caching should be enabled. The kernel option is safe only with a battery-backed disk controller.

disk

Linux Behavior: Uses buffered writes on a pre-allocated file (journal-max-file-size-bytes) and triggers a flush (fdatasync()) of the kernel pages and disk cache when a commit message is written to the journal file. This option uses fallocate() to pre-allocate the journal file.

Windows Behavior: Uses buffered writes on a pre-allocated file (journal-max-file-size-bytes) and triggers a flush (FlushFileBuffers()) of the kernel pages and disk cache when a commit message is written to the journal file. Pre-allocation is performed using SetFilePointerEx() followed by SetEndOfFile(). In addition, the FILE_FLAG_RANDOM_ACCESS flag is used in CreateFile() to hint to the kernel page cache that it should not expect sequential reads; that is, it should not keep these pages in cache. Windows provides a device write cache policy option; for better performance, disk write caching should be enabled.

osync

Linux Behavior: Uses the O_SYNC flag when creating the journal file. osync is not safe unless the write cache for the disk is disabled. The file is not pre-allocated. The osync method is meant to be used with copy-on-write file systems such as ZFS.

Windows Behavior: Uses the FILE_FLAG_WRITE_THROUGH and FILE_FLAG_RANDOM_ACCESS flags when creating the journal file (CreateFile()). The file is not pre-allocated. Windows provides a device write cache policy option; to guarantee durability when using osync, disk write caching must be disabled.
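The safety conditions in the table can be condensed into a rule of thumb. The Python sketch below is one possible reading of those conditions, not an official NuoDB recommendation: the StorageProfile type and the choose_sync_method function are hypothetical, and falling back to disk whenever no faster method is clearly safe is an assumption based on disk being the default and safest value.

```python
from dataclasses import dataclass


@dataclass
class StorageProfile:
    """Hypothetical description of the host's storage (not a NuoDB type)."""
    copy_on_write_fs: bool           # for example, ZFS
    battery_backed_controller: bool  # disk controller has battery backup
    disk_write_cache_enabled: bool   # device write cache policy


def choose_sync_method(profile: StorageProfile) -> str:
    """Return a journal-sync-method value that the table describes as safe."""
    # osync is meant for copy-on-write file systems, and is safe only
    # when the disk write cache is disabled.
    if profile.copy_on_write_fs and not profile.disk_write_cache_enabled:
        return "osync"
    # kernel is safe only with a battery-backed controller, and cannot
    # be used on copy-on-write file systems such as ZFS.
    if profile.battery_backed_controller and not profile.copy_on_write_fs:
        return "kernel"
    # Assumed fallback: the default and safest method.
    return "disk"
```

For example, a host with ext4 and a battery-backed controller maps to kernel, while a ZFS host with the disk write cache still enabled falls back to disk until the cache is disabled.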