Handling Failure

In normal operation and in most failure situations, NuoDB runs reliably and recovers from failure automatically. However, as with any distributed system, a resource in a NuoDB domain can fail in a way that NuoDB cannot fix automatically. As a domain administrator, you might need to recover from, for example, the loss of a broker host or the loss of a data center, perhaps because of a power outage.

NuoDB provides tools to help you identify whether a failure has occurred. To recover from a failure, you need to know which domain resources are running and which are not running, not connected, or not reachable. With that information, you can determine the tasks you need to perform to resolve the failure. The required tasks vary according to which resource has been lost and whether there is a broker quorum.

As described in About Broker Quorum, a majority of brokers must be running and available before you can perform certain domain tasks, such as adding a database process or adding a host to the domain. These tasks update the durable domain configuration, which is the domain configuration information stored consistently on each broker in the domain by means of a Raft log.
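The majority rule can be illustrated with a small calculation. The following Python sketch is not part of NuoDB or its tools. The function names `majority_needed` and `has_quorum` are hypothetical, and the sketch assumes that "majority" means more than half of the brokers recorded in the domain configuration.

```python
def majority_needed(total_brokers: int) -> int:
    """Return the smallest broker count that is more than half of the total."""
    if total_brokers < 1:
        raise ValueError("a domain needs at least one broker")
    return total_brokers // 2 + 1


def has_quorum(total_brokers: int, reachable_brokers: int) -> bool:
    """Return True when the reachable brokers form a majority.

    Assumption: total_brokers counts every broker in the domain
    configuration, including brokers that are currently unreachable.
    """
    if not 0 <= reachable_brokers <= total_brokers:
        raise ValueError("reachable_brokers must be between 0 and total_brokers")
    return reachable_brokers >= majority_needed(total_brokers)
```

Under this assumption, a domain with three brokers keeps quorum after losing one broker (`has_quorum(3, 2)` is true), whereas a domain with two brokers loses quorum as soon as either broker becomes unreachable (`has_quorum(2, 1)` is false).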

See the following topics:

Determining If Broker Quorum Exists

Handling Unreachable Processes

Removing an Unreachable Broker's Host from the Domain

Re-establishing Broker Quorum

See also: