Database Journal

NuoDB’s architecture consists of distributed processes that operate over three layers:

  • A management layer - Admin Processes (APs).

  • A transaction layer - Transaction Engines (TEs).

  • A storage layer - Storage Managers (SMs).

The storage layer performs many tasks, including maintaining a complete copy of the database. When the TEs modify data:

  • The corresponding atoms are updated immediately in the memory of the SM or SMs managing those atoms. NuoDB supports redundancy, so an atom is typically managed by more than one SM.

  • Atom updates are also logged immediately in the Journal.

  • The atom files in the archive are only updated later, after a delay.

Delaying archive writes in this way preserves correctness, improves performance, and still allows recovery if the SM process fails. The journal is crucial to this mechanism.

Journal Usage and Configuration

The Journal (also known as a Transaction Log or Write-Ahead Log) is used to save all messages that update the state of an atom.

The journal is a message buffer in the SM, storing all atom update messages as they are received.

  • When a transaction commits, a commit-record is also written into the journal and the journal is saved to disk to ensure committed data is not lost.

  • The journal may also be written to disk due to a periodic sync to minimise the amount of data waiting to be written.

  • Writing into the journal is faster than writing directly to the archive because the messages are smaller than most atoms.

Messages in the journal are subsequently applied to the database atoms by the SM and persisted to the archive.

  • Various threads in the SM periodically perform this task.

  • Messages for transactions that never committed are not applied to any atoms.

  • At any time there can be changed atoms in the memory of the SM that have not yet been written to disk but are recorded in the journal.

  • Outstanding changes to the same atom can be coalesced together into a single write into the archive, reducing the number of archive writes.
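The coalescing described above can be illustrated with a minimal sketch. It is not NuoDB code: the message layout (`txn`, `atom`, `changes`) and the function name `coalesce_archive_writes` are assumptions made for illustration only.

```python
def coalesce_archive_writes(journal_messages, committed_txns):
    """Fold committed journal messages into one pending write per atom.

    Hypothetical message shape: {"txn": int, "atom": str, "changes": dict}.
    """
    pending = {}  # atom id -> merged changes awaiting a single archive write
    for msg in journal_messages:
        if msg["txn"] not in committed_txns:
            continue  # messages from uncommitted transactions are skipped
        pending.setdefault(msg["atom"], {}).update(msg["changes"])
    return pending


messages = [
    {"txn": 1, "atom": "A", "changes": {"x": 1}},
    {"txn": 2, "atom": "A", "changes": {"x": 2, "y": 5}},
    {"txn": 3, "atom": "B", "changes": {"z": 9}},
    {"txn": 4, "atom": "A", "changes": {"x": 99}},  # transaction 4 never committed
]
writes = coalesce_archive_writes(messages, committed_txns={1, 2, 3})
print(len(messages), "messages ->", len(writes), "archive writes")
```

In this toy run, three committed messages touching two atoms result in two archive writes, one per atom.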

Once change messages from the journal have been written to the archive, they are periodically removed from the journal (a process known as "reaping").

The following database options are available to configure journaling:

  • --journal-dir

  • --journal-max-directory-entries

  • --journal-max-file-size-bytes

  • --journal-single-file

  • --journal-sync-method

For information on using these and other options, see Database Options.

For performance tips, refer to Journal Performance Tips.

How it Works

Figure 1. Journal during normal operation.

The typical sequence of operations in each SM that manages Atom A:

  1. Update Atom A

    1. Receive message: update Atom A

    2. Message copied into journal buffer

    3. Apply changes to A (in memory) and mark A as dirty

  2. … some time passes … (periodically the journal buffer may be synced to disk)

  3. Commit change

    1. Receive message: commit transaction

    2. Commit message copied into journal buffer

    3. Unsaved messages (including the commit message) appended into journal file

    4. SM acknowledges commit to TE

  4. … some time passes …

  5. A is written to the archive disk and atom in SM is marked clean

  6. … some time passes …

  7. Journal file(s) containing the processed messages deleted from disk
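The sequence above can be walked through with a small in-memory model. This is a sketch under simplifying assumptions, not the SM implementation: the class `StorageManagerSketch`, its method names, and the use of Python lists and dictionaries in place of the journal file and the archive are all hypothetical, and only a single transaction is modelled.

```python
class StorageManagerSketch:
    """Toy model of the journal lifecycle in one SM (illustration only)."""

    def __init__(self):
        self.journal_buffer = []  # messages received but not yet on disk
        self.journal_file = []    # stands in for the on-disk journal
        self.memory = {}          # atom id -> in-memory atom state
        self.dirty = set()        # atoms changed since their last archive write
        self.archive = {}         # stands in for the atom files on disk

    def on_update(self, txn, atom, changes):
        # Step 1: copy the message into the buffer, apply it in memory, mark dirty.
        self.journal_buffer.append(("update", txn, atom, changes))
        self.memory.setdefault(atom, {}).update(changes)
        self.dirty.add(atom)

    def on_commit(self, txn):
        # Step 3: buffer the commit record, append all unsaved messages
        # to the journal file, then acknowledge the commit.
        self.journal_buffer.append(("commit", txn))
        self.journal_file.extend(self.journal_buffer)
        self.journal_buffer.clear()
        return "ack"

    def write_dirty_atoms(self):
        # Step 5: write each dirty atom to the archive and mark it clean.
        for atom in sorted(self.dirty):
            self.archive[atom] = dict(self.memory[atom])
        self.dirty.clear()

    def reap(self):
        # Step 7 (simplified): once nothing is outstanding, drop processed messages.
        if not self.dirty and not self.journal_buffer:
            self.journal_file.clear()


sm = StorageManagerSketch()
sm.on_update(txn=1, atom="A", changes={"balance": 100})
ack = sm.on_commit(txn=1)
journal_len_after_commit = len(sm.journal_file)  # update + commit record
sm.write_dirty_atoms()
sm.reap()
```

The point of the sketch is the ordering: the commit is acknowledged as soon as the journal holds the commit record, while the archive write and the reaping happen later and independently.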

Database Recovery

Journaled messages are important because they are used to recover data lost by an SM that shut down unexpectedly. When the SM restarts, changes to atoms that the failed SM had not yet written to disk are applied by replaying the messages in the journal and writing the updates into the archive. Journal entries without a corresponding commit message do not represent a complete transaction and are not made persistent.
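A minimal sketch of this replay rule follows. The record layout (`("update", txn, atom, changes)` and `("commit", txn)`) and the function `replay_journal` are assumptions for illustration; the actual journal format is not described here.

```python
def replay_journal(journal, archive):
    """Rebuild atom state from the archive plus committed journal records."""
    # First pass: find the transactions that have a commit record.
    committed = {rec[1] for rec in journal if rec[0] == "commit"}
    # Start from a copy of what the archive already holds.
    recovered = {atom: dict(state) for atom, state in archive.items()}
    # Second pass: apply updates in journal order, committed transactions only.
    for rec in journal:
        if rec[0] == "update" and rec[1] in committed:
            _, _, atom, changes = rec
            recovered.setdefault(atom, {}).update(changes)
    return recovered


journal = [
    ("update", 1, "A", {"balance": 100}),
    ("commit", 1),
    ("update", 2, "A", {"balance": 250}),  # no commit record, so not replayed
    ("update", 3, "B", {"owner": "x"}),
    ("commit", 3),
]
archive = {"A": {"balance": 0}}
state = replay_journal(journal, archive)
```

Here the update from transaction 2 is ignored because the journal contains no commit record for it, so atom A ends up with the value written by transaction 1.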

By default, the journal is a subdirectory of the archive. Use the --journal-dir database option to configure a different location for the journal.

  • Typically the journal is located on its own volume for performance reasons (using the --journal-dir database option).

  • For best performance, the journal should be on the fastest media available since write throughput typically depends on how fast messages can be written to the journal.

  • When performing backups, make sure to back up both the archive and the journal. The backup cannot be restored if the journal is missing.

  • When restoring, restore both the archive and the journal or the SM will not start.

The performance of the journal can be monitored using the Journal Queue metric.

The journal also supports restoring the database to a specific point-in-time by restoring to a specific commit message in the journal. This requires enabling Journal Hot Copy with hot copy backup sets. For more information, see Using Journal Hotcopy.