Commit Protocol

Because NuoDB is distributed, and multiple SMs are typically asked to commit the same data in parallel, a form of two-phase commit is required.

The following diagram shows the data flow for a write operation.

Figure 1. Complete Write Data Flow
  1. App sends SQL to TE1.

  2. TE1 sends the modified atom diffs, as a pre-commit message, to the SMs and to any TE (here, TE2) that has the atom cached.

    • SMs copy the change into their journal.

  3. TE1 waits for acknowledgements from SMs.

  4. TE1 tells all engines (TE2, SM1, SM2) to commit.

    • TE2 updates its cached copy.

    • Each SM adds a commit message to its journal.

    • TE1 acknowledges App.
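The flow above can be sketched as a minimal single-process simulation. All class and method names here are illustrative stand-ins, not actual NuoDB internals:

```python
# Toy model of the write data flow: pre-commit to all engines,
# wait for SM acknowledgements, then commit everywhere.
# (Illustrative sketch only; not NuoDB's implementation.)

class StorageManager:
    def __init__(self, name):
        self.name = name
        self.journal = []          # durable journal entries

    def pre_commit(self, txn_id, diff):
        # Step 2: copy the change into the journal, then acknowledge.
        self.journal.append(("PRE-COMMIT", txn_id, diff))
        return True                # acknowledgement back to TE1

    def commit(self, txn_id):
        # Step 4: add a commit message to the journal.
        self.journal.append(("COMMIT", txn_id))

class PeerTE:
    """A TE (like TE2) that holds the atom in its cache."""
    def __init__(self):
        self.cache = {}

    def pre_commit(self, txn_id, diff):
        self.pending = (txn_id, diff)

    def commit(self, txn_id):
        # Step 4: update the cached copy.
        self.cache[txn_id] = self.pending[1]

def write(te2, sms, txn_id, diff):
    # Steps 2-3: send pre-commit and wait for every SM to acknowledge.
    te2.pre_commit(txn_id, diff)
    acks = [sm.pre_commit(txn_id, diff) for sm in sms]
    if all(acks):
        # Step 4: tell all engines to commit, then acknowledge the app.
        for engine in [te2, *sms]:
            engine.commit(txn_id)
        return "committed"
    return "pending"

sm1, sm2, te2 = StorageManager("SM1"), StorageManager("SM2"), PeerTE()
print(write(te2, [sm1, sm2], 1, "row diff"))   # committed
```

Note that each SM's journal now holds the change followed by its commit marker, which is what the replay rule below relies on.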

Because the journaled changes have an associated COMMIT message, they are used to update the archive. If, for any reason, the COMMIT is never received, the journaled changes are ignored (rolled back). This is referred to as the Safe Commit Protocol.
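The replay rule can be illustrated with a short sketch: a journaled change is applied to the archive only if a matching COMMIT entry exists for its transaction. The tuple format is invented for illustration, not NuoDB's on-disk layout:

```python
def replay(journal):
    """Apply only journaled changes whose transaction has a COMMIT entry.

    `journal` is a list of ("PRE-COMMIT", txn, diff) and ("COMMIT", txn)
    tuples; the format is illustrative only.
    """
    committed = {entry[1] for entry in journal if entry[0] == "COMMIT"}
    archive = []
    for entry in journal:
        if entry[0] == "PRE-COMMIT":
            kind, txn, diff = entry
            if txn in committed:
                archive.append(diff)     # apply the change
            # otherwise the change is ignored (rolled back)
    return archive

journal = [
    ("PRE-COMMIT", 1, "insert row A"),
    ("COMMIT", 1),
    ("PRE-COMMIT", 2, "insert row B"),   # COMMIT never arrived
]
print(replay(journal))   # ['insert row A'] -- txn 2 is rolled back
```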

Safe Commit Protocol

The safe commit protocol is the default and the only commit protocol available. It guarantees durability with fast performance when journaling is enabled. Durability guarantees that a committed transaction remains committed after any combination of transient system failures (up to and including all database processes exiting abnormally). NuoDB also extends durability to protect committed transactions from permanent storage failures: committed transactions are guaranteed to remain committed as long as no more than max-lost-archives archives (or their corresponding journals) are permanently lost at any given point in time.
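As the scenarios later in this section illustrate, this guarantee implies that a transaction modifying data can only commit while at least max-lost-archives + 1 SMs are running. A minimal sketch of that rule (an inference from the documented behavior, not NuoDB's internal check):

```python
def can_commit(running_sms, max_lost_archives):
    """A data-modifying commit needs durable pre-commit acknowledgements
    from at least max_lost_archives + 1 SMs, so the transaction survives
    the permanent loss of up to max_lost_archives archives.
    (Sketch inferred from the documented scenarios, not NuoDB code.)"""
    return running_sms >= max_lost_archives + 1

print(can_commit(running_sms=1, max_lost_archives=0))  # True
print(can_commit(running_sms=1, max_lost_archives=1))  # False
print(can_commit(running_sms=2, max_lost_archives=1))  # True
```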

Safe Commit Protocol and Recoverability

The safe commit protocol coordinates messages to the journal with transaction commits. If a Storage Manager (SM) process terminates unexpectedly, journaling messages are used to ensure that the archive is in a consistent state when it rejoins the database.

Replication files required for journaling are stored in the specified journal directory; if no directory is specified, the default is a directory named journal inside the archive directory. For information on using --journal-dir and other journaling-related options, see Database Options.

Journaling is always enabled to ensure data durability. For redundancy, at least two SMs should be used.
To optimize performance, the journal and the archive should be located on separate disks with different disk controllers. If either the journal or the archive is unavailable or corrupt, the SM cannot be started.

For commits that insert, delete, or update data, the safe commit protocol includes the following:

  1. The client requests a commit.

  2. Pre-commit is only sent to Storage Managers (SMs) that are leader candidates for storage groups modified by the transaction.

  3. Each available SM acknowledges the pre-commit after making it durable so it cannot be lost if the SM, or the host it is running on, crashes.

  4. The TE sends a commit message to all nodes.

  • If a storage group goes offline, an ongoing transaction that modifies it is not resolved until the storage group comes back online (or is deleted). If the commit fails after all modified storage groups come back online, the following error message is returned:

    Transaction NNNNN failed because storage groups X, Y, Z went offline during commit.
  • Even though a database uses the safe commit protocol, individual transactions may still fail. However, safe commit guarantees that committed transactions are durable.
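Step 2 above routes the pre-commit by storage group. A small sketch of that routing rule, where the storage-group names and the leader-candidate map are invented for illustration:

```python
# Hypothetical leader-candidate map: storage group -> candidate SMs.
# The names are invented; only the routing rule comes from the text.
leader_candidates = {
    "sg_orders":  {"SM1", "SM2"},
    "sg_archive": {"SM3"},
}

def pre_commit_targets(modified_groups):
    """Pre-commit goes only to SMs that are leader candidates for a
    storage group the transaction modified."""
    targets = set()
    for group in modified_groups:
        targets |= leader_candidates[group]
    return targets

print(sorted(pre_commit_targets({"sg_orders"})))
# ['SM1', 'SM2'] -- SM3 is skipped; the transaction did not touch sg_archive
```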

Durability under Safe Commit

Failure of a database process (TE or SM) is typically transient, and the process can be restarted. For example, if an SM fails due to a power failure, you can usually restore power and restart the SM. In rare cases, storage media suffers a permanent failure that prevents the SM from being restarted, resulting in the permanent loss of an archive or journal. A permanently failed database process cannot be restarted and therefore cannot resume serving the database. In that event, replace the failed process with a running database process; for example, replace a failed SM with a running SM.

An archive that permanently fails is no longer available. An unavailable archive may prevent the database from enforcing durability. This can happen if you need to perform a cold restart of the database and the missing, failed archive is the only archive that contains one or more updates.

Examples of Safe Commit Behavior

Consider the following database configuration:

  • The --commit option is set to safe.

  • The --max-lost-archives option remains set to the default value of 0.

  • One Transaction Engine (TE1) is running.

  • Two Storage Managers (SM1 and SM2) are running.

Scenario 1 - Permanent Storage Loss without Cold Restart

In this scenario, there is a permanent storage loss but a cold restart is not needed. The database continues to run and durability is not violated.

  1. TE1 commits Transaction 1 (T1).

  2. SM1 crashes but the disk is not lost. SM2 continues running.

  3. TE1 commits Transaction 2 (T2).
    This is allowed because, when max-lost-archives is set to 0, only one SM needs to be running for insert, delete, or update transactions to be committed. At this time, only SM2 has T2.

  4. SM1 is restarted by the DBA.

  5. SM1 synchronizes with SM2.
    At this time, both SM1 and SM2 have T2.

  6. SM2 crashes and cannot be restarted.

  7. SM3 is started by the DBA.

  8. SM3 synchronizes with SM1.
    At this time, both SM1 and SM3 have T2.

Scenario 2 - Permanent Storage Loss with Cold Restart

In this scenario, a sequence of failures (perhaps due to a power outage) has occurred. The permanent failure of SM2 has resulted in the loss of T2, so a cold restart is required. Durability is violated because T2 was committed on only one archive (SM2), and that archive was permanently lost before another archive could synchronize with it.

  1. TE1 commits T1.

  2. SM1 crashes but the disk is not lost. SM2 continues running.

  3. TE1 commits transaction T2, which modifies the database.
    At this time, only SM2 has T2.

  4. SM2 crashes and cannot be restarted.
    The only TE has also failed, so the database is down.

  5. SM1 is restarted by the DBA.
    SM1’s archive defines the database.

  6. A TE is started by the DBA.
    T2 is lost.

Scenario 3 - Permanent Storage Loss with Cold Restart (and max-lost-archives set to 1)

The details of this scenario are the same as for scenario 2 but the --max-lost-archives option has been set to 1 instead of 0:

  1. TE1 commits T1.

  2. SM1 crashes but the disk is not lost. SM2 continues running.

  3. TE1 attempts to commit T2, which would modify the database.
    NuoDB does not allow this commit because, when the --max-lost-archives option is set to 1, two SMs must be running in order to commit a transaction that inserts, updates, or deletes data.

  4. SM2 crashes and cannot be restarted.
    The only TE has also failed, so the database is down.

  5. SM1 is restarted by the DBA.
    This archive defines the database.

  6. A TE is started by the DBA.
    No transactions are lost.

In this scenario, durability is not violated. No transactions were committed when there was only one SM running. A cold restart was required and durability was maintained.
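Scenarios 2 and 3 can be replayed with a toy model. The rule applied here (a data-modifying commit needs max-lost-archives + 1 running SMs) is inferred from the scenarios above; the event sequence mirrors them exactly, and nothing here is NuoDB's actual implementation:

```python
# Toy replay of scenarios 2 and 3: same failure sequence, different
# --max-lost-archives setting. Returns (was T2 committed?, was T2 lost?).

def run_scenario(max_lost_archives):
    running_sms = {"SM1", "SM2"}
    committed_on = {}                 # txn -> SMs holding it at commit time

    def commit(txn):
        # Inferred rule: need max_lost_archives + 1 running SMs to commit.
        if len(running_sms) >= max_lost_archives + 1:
            committed_on[txn] = set(running_sms)
            return True
        return False                  # commit is blocked

    commit("T1")                      # step 1: both SMs running
    running_sms.discard("SM1")        # step 2: SM1 crashes (disk survives)
    t2_committed = commit("T2")       # step 3: only SM2 running
    running_sms.discard("SM2")        # step 4: SM2 permanently lost
    survivors = {"SM1"}               # steps 5-6: cold restart from SM1
    t2_lost = t2_committed and not (committed_on["T2"] & survivors)
    return t2_committed, t2_lost

print(run_scenario(0))   # (True, True)   -- scenario 2: T2 committed, then lost
print(run_scenario(1))   # (False, False) -- scenario 3: T2 never committed
```

With max-lost-archives set to 0, T2 commits on SM2 alone and is lost with it (durability violated); with max-lost-archives set to 1, the commit is refused while only one SM is running, so no committed transaction is ever lost.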