Directing the Enforcer to an Alternate Host

The enforcer runs in a broker and compares the state of the domain against the requirements for the database, starting additional processes as needed. When a host problem prevents the enforcer from starting new database processes, the enforcer continues to retry starting the process, governed by the automation retry and back-off policy. This topic describes how to direct the enforcer to give up on a host and use an available alternate host.

Here is an example of when you might need to direct the enforcer to give up on a host. Suppose you have a three-host domain and a minimally redundant database with two transaction engines (TEs) and two storage managers (SMs). On one of the hosts currently running a storage manager, there is a disk failure of some sort: perhaps the archive is no longer available, or write permission to it has been lost. The enforcer attempts to start the SM over and over, even though the SM exits immediately, and it will not automatically start an SM on the third available host. The database remains in an UNMET state and is no longer redundant.

On the host where the SM is incurring errors and the retry loop is in effect, update the automation retry and back-off policy to set a maximum retry value. Set stopOnMaxRetry to false so that migration to a new host is still possible later.

nuodb [domain] > update database
Database Name: test
Database Options (optional): mem 500m backoff.reqMinUptime 30000 backoff.maxRetry 3 backoff.delay 10000 backoff.stopOnMaxRetry false
Database Options for SMs (optional): 
Tag Constraints for SMs (optional): 
Database Options for TEs (optional): 
Tag Constraints for TEs (optional): 
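Taken together, these options bound how long the retry loop can run before the enforcer gives up. The sketch below is illustrative arithmetic only, not NuoDB source; it assumes a fixed delay between attempts and that each failed attempt can run for just under reqMinUptime before exiting:

```python
# Illustrative model of the retry/back-off options used above.
# Assumption: an attempt counts as a failure when the process stays up
# for less than backoff.reqMinUptime, and attempts are spaced by a
# fixed backoff.delay (exponential growth is left out for simplicity).

def worst_case_giveup_ms(max_retry, delay_ms, req_min_uptime_ms):
    """Upper bound on time before the enforcer gives up on a host:
    each of the max_retry attempts may run just under req_min_uptime_ms
    and still count as a failure, followed by the back-off delay."""
    return max_retry * (req_min_uptime_ms + delay_ms)

# With the values from the transcript above:
total = worst_case_giveup_ms(max_retry=3, delay_ms=10000, req_min_uptime_ms=30000)
print(total / 1000, "seconds")  # 120.0 seconds
```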

At some point, the enforcer gives up on the SM on that host. Note that the SM maximum scale-out requirement counts the known archive locations (reachable or unreachable) in the durable domain configuration, not just running SM processes. This is why a third SM does not start while SM_MAX=2. Now increment the template variable to SM_MAX=3:

nuodb [domain] > update database
Database Name: test
Database Options (optional): mem 500m backoff.reqMinUptime 30000 backoff.maxRetry 3 backoff.delay 10000 backoff.stopOnMaxRetry false
Database Options for SMs (optional): 
Tag Constraints for SMs (optional): 
Database Options for TEs (optional): 
Tag Constraints for TEs (optional): 
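The capacity check described above can be pictured as a toy model. The function name and signature here are hypothetical; the real logic runs inside the broker:

```python
# Toy model: the enforcer counts known archive locations in the durable
# domain configuration, not just running SMs, against SM_MAX.

def should_start_new_sm(archive_locations, sm_max):
    """Return True only if there is headroom below SM_MAX.
    archive_locations includes unreachable/broken locations, which is
    why a failed host still occupies a slot."""
    return len(archive_locations) < sm_max

# Two known locations (one of them broken) and SM_MAX=2: no new SM.
locations = ["host-a", "host-b-broken"]
print(should_start_new_sm(locations, sm_max=2))  # False
# Raising SM_MAX to 3 opens a slot for an SM on the third host.
print(should_start_new_sm(locations, sm_max=3))  # True
```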

Check that the third SM gets started:

nuodb [domain] > show database config
Database: test
Database: test, (unmanaged), processes [1 TE, 1 SM], ACTIVE, Status=RUNNING, template [Minimally Redundant]
  Variables: {SM_MAX=3, REGION=us-west-2}
  Options: {backoff.reqMinUptime=30000, backoff.stopOnMaxRetry=false, mem=500m, backoff.maxRetry=3, backoff.delay=10000}
  Default Options: { "commit": "${COMMIT:remote:1}","backoff.reqMinUptime":"30000","hostLimit":"${HOST_LIMIT:false}"}
  Process group options:
  Process group tag constraints:
  Archive Locations:
    ip-172-31-14-151/52.10.63.144:48004, requirements: SMs, region: us-west-2:
      archive: /var/opt/nuodb/production-archives/test
      journal-dir: /var/opt/nuodb/production-archives/test
    ec2-54-200-117-181.us-west-2.compute.amazonaws.com/172.31.5.193:48004, requirements: SMs, region: us-west-2:
      archive: /var/opt/nuodb/production-archives/test
      journal-dir: /var/opt/nuodb/production-archives/test
    ip-172-31-2-230/54.148.240.227:48004, requirements: SMs, region: us-west-2:
      archive: /var/opt/nuodb/production-archives/test
      journal-dir: /var/opt/nuodb/production-archives/test
  Minimally Redundant MET 

Once the new SM is up, you can change SM_MAX back to 2; the enforcer does not shut down running processes.

nuodb [domain] > update database
Database Name: test
Template: Minimally Redundant
Database Options (optional): mem 500m backoff.reqMinUptime 30000 backoff.maxRetry 3 backoff.delay 10000 backoff.stopOnMaxRetry false
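The reason nothing is shut down can be pictured as a reconciliation loop with no stop branch. This is a hypothetical sketch, not NuoDB code:

```python
# Toy sketch: the enforcer only ever starts processes to meet the
# template requirement. Lowering SM_MAX never kills a running SM,
# because there is deliberately no 'stop' action in the loop.

def enforcer_action(running_sms, required_min):
    """Return ('start', n) when below the template minimum, else
    ('none', 0). There is no ('stop', n) branch: three SMs running
    with SM_MAX back at 2 are simply left alone."""
    if running_sms < required_min:
        return ('start', required_min - running_sms)
    return ('none', 0)

print(enforcer_action(running_sms=3, required_min=2))  # ('none', 0)
print(enforcer_action(running_sms=1, required_min=2))  # ('start', 1)
```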

Remove the broken archive location from the durable configuration:

nuodb [domain] > remove database archiveLocation 
Database Name: test
Host or AgentId: ip-172-31-14-151/52.10.63.144:48004
Archive Pathname: /var/opt/nuodb/production-archives/test
nuodb [domain] > show database config     
Database: test, (unmanaged), processes [1 TE, 1 SM], ACTIVE, Status=RUNNING
  Variables: {SM_MAX=2, REGION=us-west-2}
  Options: {backoff.reqMinUptime=30000, backoff.stopOnMaxRetry=false, mem=500m, backoff.maxRetry=3, backoff.delay=10000}
  Process group options:
  Process group tag constraints:
  Archive Locations:
    ec2-54-200-117-181.us-west-2.compute.amazonaws.com/172.31.5.193:48004, requirements: SMs, region: us-west-2:
      archive: /var/opt/nuodb/production-archives/test
      journal-dir: /var/opt/nuodb/production-archives/test
    ip-172-31-2-230/54.148.240.227:48004, requirements: SMs, region: us-west-2:
      archive: /var/opt/nuodb/production-archives/test
      journal-dir: /var/opt/nuodb/production-archives/test
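If you script this cleanup, you can confirm the broken location is gone by parsing the show database config output. This parser is only a sketch, and it assumes the line layout shown above stays stable:

```python
# Extract the host entries under "Archive Locations:" from the text
# printed by `show database config` (line format assumed stable).

def archive_hosts(text):
    hosts = []
    in_section = False
    for line in text.splitlines():
        if line.strip() == "Archive Locations:":
            in_section = True
            continue
        if in_section and ", requirements:" in line:
            hosts.append(line.strip().split(",")[0])
    return hosts

config = """\
  Archive Locations:
    ec2-54-200-117-181.us-west-2.compute.amazonaws.com/172.31.5.193:48004, requirements: SMs, region: us-west-2:
      archive: /var/opt/nuodb/production-archives/test
    ip-172-31-2-230/54.148.240.227:48004, requirements: SMs, region: us-west-2:
      archive: /var/opt/nuodb/production-archives/test
"""

# Two locations remain; the removed host no longer appears.
print(archive_hosts(config))
```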