The safe, remote, and region commit protocols coordinate messages to the journal with transaction commits. When you start a Transaction Engine (TE), the safe commit protocol is enabled by default. NuoDB strongly recommends that you continue to use the safe commit protocol.
Note: The commit protocol is set using the --commit option. If you have changed the default commit protocol and want to revert to the safe commit protocol, restart the TE(s) using --commit safe. Alternatively, edit the database's capture file to change the --commit option to safe, then restart the database.
Note: The capture file omits configuration options whose value is the default. If you have never explicitly set a commit protocol, the capture file does not include the --commit option.
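As an illustrative sketch of the revert step above, assuming a deployment managed with the nuocmd tool (the database name, server id, and start id below are placeholders, and exact flags may differ between NuoDB versions):

```shell
# Shut down the running TE, then restart it with the safe commit
# protocol explicitly set. All names and ids here are placeholders.
nuocmd shutdown process --start-id 1
nuocmd start process --db-name test-db --engine-type TE \
    --server-id nuoadmin-0 --options commit safe
```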
For commits that insert, delete or update data, the safe commit protocol works as follows:
Note: If a storage group goes offline, an ongoing transaction that modifies it is not resolved until that storage group comes back online (or is deleted). If the commit fails after all modified storage groups come back online, the following error message is returned:
Transaction NNNNN failed because storage groups X, Y, Z went offline during commit.
Note: Even when a database uses the safe commit protocol, transactions may still fail. As with all commit protocols, acknowledgment of the pre-commit indicates that the transaction is visible to other clients. With safe commit, this acknowledgment also guarantees that the transaction commit is durable.
Failure of a database process (TE or SM) is typically transient, in that the process can be restarted. For example, if an SM fails (perhaps due to a power failure), you can usually resolve the failure (restore power) and restart the SM. In rare cases, storage media can suffer a permanent failure that prevents the SM from being restarted, resulting in the permanent loss of an archive or journal. A permanently failed database process cannot be restarted and so cannot resume serving the database. In that event, the solution is to replace the failed process with a running database process, for example replacing a failed SM with a running SM.
An archive that permanently fails is no longer available. An unavailable archive may prevent a database from enforcing durability. This can happen if you need to perform a cold restart of the database and the missing, failed archive is the only archive that contains one or more updates.
Note: When a database process fails and nodes in the domain can no longer communicate with each other, a network partition condition may result. For information on network partitions, see Two Machine Minimally Redundant and High Availability (HA).
Consider the following database configuration:
The --commit option is set to safe.
The --max-lost-archives option remains set to the default value of 0.
In this scenario, there is a permanent storage loss but a cold restart (for a given database, all processes are shut down before the database is restarted) is not needed. The database continues to run and durability is not violated.
When --max-lost-archives is set to 0, only one SM needs to be running for insert, delete or update transactions to be committed. At this time, only SM2 has T2.
In this scenario, a sequence of failures (perhaps due to a power outage) has occurred. The permanent failure of SM2 has resulted in the loss of T2, therefore a cold restart is required. Durability is violated because T2 was committed on only one archive (SM2) and that archive was permanently lost before another archive could synchronize with it.
The details of this scenario are the same as for scenario 2, but the --max-lost-archives option has been set to 1 instead of the default of 0. When the --max-lost-archives option is set to 1, two SMs must be running in order to commit a transaction that updates, inserts or deletes data.
In this scenario, durability is not violated. No transactions were committed when there was only one SM running. A cold restart was required and durability was maintained.
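The rule running through these scenarios can be modeled in a few lines. This is an illustrative sketch, not NuoDB code: it assumes only the behavior stated above, that a data-modifying transaction can commit when at least max-lost-archives + 1 SMs are running.

```shell
# Illustrative model of the safe commit quorum rule described above.
# can_commit <running_sms> <max_lost_archives> succeeds (exit 0) when
# a data-modifying transaction would be allowed to commit.
can_commit() {
  running_sms=$1
  max_lost_archives=$2
  [ "$running_sms" -ge $(( max_lost_archives + 1 )) ]
}

# Scenario 2: with max-lost-archives at 0, a single running SM suffices,
# so a transaction can become durable on only one archive.
can_commit 1 0 && echo "max-lost-archives=0: one running SM can commit"

# Scenario 3: with max-lost-archives at 1, a lone SM cannot commit,
# but two running SMs can.
can_commit 1 1 || echo "max-lost-archives=1: a lone SM cannot commit"
can_commit 2 1 && echo "max-lost-archives=1: two running SMs can commit"
```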
With the safe commit protocol, you can add or remove SMs without changing the database's configuration, and durability continues to be guaranteed. The safe commit protocol always requires a commit acknowledgment from each running SM.
With the remote:n commit protocol, you can replace n with the number of the database's storage managers, for example remote:2, and achieve the same durability guarantee as the safe commit protocol. To continue to enforce that guarantee after you add a storage manager, restart the TE(s) with the --commit option set to remote:n, where n is the new total number of storage managers.
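As a hypothetical example of keeping remote:n aligned with the SM count after adding a third SM (names, ids, and exact flags are placeholders and may differ between NuoDB versions):

```shell
# Restart the TE with n raised to the new number of SMs, so that
# remote:n continues to match the safe commit guarantee.
# All names and ids here are placeholders.
nuocmd shutdown process --start-id 1
nuocmd start process --db-name test-db --engine-type TE \
    --server-id nuoadmin-0 --options commit remote:3
```

With safe commit, no such restart is needed: it always waits for an acknowledgment from every running SM, however many there are.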
While the safe commit protocol helps avoid data inconsistencies, for example errors that manifest as the "null descriptor" failure, it does not correct inconsistencies already written to atoms in the archive. Therefore, if a database was running with a non-safe commit protocol and corruption was introduced into the archive, switching to safe commit does not resolve the issue. To resolve it, you must first run NuoDB Archive's nuoarchive check command, using the
--repair option to locate and repair the data inconsistency issues, such as missing descriptors. For more information on NuoDB Archive, see Command Line Tools.
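For example, a repair run might look like the following; the archive directory is a placeholder, and your version of nuoarchive may accept additional options:

```shell
# Check the archive for inconsistencies, such as missing descriptors,
# and repair those it finds. The archive path is a placeholder.
nuoarchive check --repair /var/opt/nuodb/archive
```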