Setting Up Failure Detection

A NuoDB database consists of a set of processes (storage managers and transaction engines) that run on hosts in a NuoDB domain. Since a host may have multiple concurrent NuoDB databases running on it, database failure detection is handled at the process level (SM or TE) and not at the host level.

All processes in a database are peered to one another, and each keeps a record of all the other processes active in the database. Failure detection is triggered by ping timeouts between the peered processes. When network communication with a process times out, the process is evicted from the database by its peers. Each process monitors whether all other processes in the database are active and determines whether it, together with all the other processes with which it can communicate, forms a majority of the processes in the database. If a process determines that it is no longer part of the majority, failure is detected and each isolated process or group of isolated processes gracefully shuts down.

Processes that remain in the database evict the non-communicative processes. At the same time, the isolated processes determine that they no longer belong to the majority of processes in the database and shut down.
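The majority rule described above can be illustrated with a minimal sketch. This is not NuoDB code: the function names, the process identifiers, and the use of sets to model a network partition are all hypothetical. The sketch also assumes a strict majority (more than half of the known processes); this topic does not describe how an even split is resolved, so that case is an assumption of the example only.

```python
def has_majority(self_id, known, reachable):
    """Return True if self_id plus the peers it can still reach
    make up a strict majority of the known processes.

    Illustrative model only; names and tie handling are assumptions.
    """
    group = ({self_id} | set(reachable)) & set(known)
    return 2 * len(group) > len(known)


def surviving_processes(known, partitions):
    """Given the known processes and a split of them into groups that can
    only communicate internally, return the processes that stay running."""
    survivors = set()
    for group in partitions:
        for process in group:
            # Each process counts itself plus the peers in its own group.
            if has_majority(process, known, group - {process}):
                survivors.add(process)
    return survivors


# Hypothetical database with two storage managers and three transaction engines.
known = {"sm1", "sm2", "te1", "te2", "te3"}
# A network failure isolates sm2 and te3 from the other three processes.
partitions = [{"sm1", "te1", "te2"}, {"sm2", "te3"}]
remaining = surviving_processes(known, partitions)
```

In this model the group of three keeps running and evicts the other two, while sm2 and te3 each conclude that they are outside the majority and shut down.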

Failure detection is not enabled by default.

See the following topics:

Setting the ping-timeout Database Option

Examples of Failure Scenarios