Setting Up Failure Detection

A NuoDB database consists of a set of processes, Storage Managers (SMs) and Transaction Engines (TEs), that run on hosts in a NuoDB domain. Because a host may run processes for multiple NuoDB databases concurrently, database failure detection is handled at the process level (SM or TE) and not at the host level.

All processes in a database are peered to one another, and each keeps a record of all the other processes active in the database. Failure detection is triggered by network disconnections and network connection timeouts between peered processes. When network communication with a process is disconnected or times out, that process is evicted from the database by its peers. Each process monitors whether all other processes in the database are active and determines whether it, together with the processes with which it can still communicate, forms part of a majority of the processes in the database. A process that determines it is no longer part of the majority is treated as isolated and shuts down gracefully.

Processes that remain in the database evict non-communicative processes from the database. The isolated processes, in turn, shut down as they simultaneously determine that they no longer belong to the majority of processes in the database.
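
The majority rule described above can be illustrated with a small simulation. This is a minimal sketch, not NuoDB's implementation: the function names in_majority and resolve_partition, the process names, and the reading of "majority" as strictly more than half of the known processes are all assumptions made for illustration.

```python
def in_majority(reachable_peers, known_processes):
    """Return True if this process plus its reachable peers form a strict majority.

    Assumption: "majority" means strictly more than half of the known processes.
    """
    group_size = len(reachable_peers) + 1  # count this process itself
    return 2 * group_size > len(known_processes)


def resolve_partition(known_processes, partitions):
    """Map each process to "stay" or "shutdown".

    `partitions` is a list of disjoint sets; processes in the same set can
    still communicate with one another, but not with any other set.
    """
    outcome = {}
    for group in partitions:
        for proc in group:
            peers = group - {proc}
            if in_majority(peers, known_processes):
                outcome[proc] = "stay"      # evicts the unreachable processes
            else:
                outcome[proc] = "shutdown"  # isolated: shuts down gracefully
    return outcome


# Hypothetical five-process database split by a network failure.
processes = {"SM1", "SM2", "TE1", "TE2", "TE3"}
result = resolve_partition(processes, [{"SM1", "TE1", "TE2"}, {"SM2", "TE3"}])
```

In this hypothetical split, the group of three keeps running and the group of two shuts down. Under the strict-majority assumption used in the sketch, an even split would leave no group with a majority.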

Failure detection is not enabled by default.

See the following topics: