Disaster Recovery Using Handoff - Example

This section demonstrates disaster recovery using the handoff procedure and the handoff database command.

The example uses the following configuration:

  • There is one active and one passive data center.

  • The active data center has one host (host1) running an Admin Process (AP), a Storage Manager (SM), and a Transaction Engine (TE).

  • The passive data center has one host (host2) running an AP and an Asynchronous Storage Manager (ASM).

  • The NuoDB RPM server package is installed on both host1 and host2. For NuoDB TAR server package installations, adjust the pathnames referenced in the command samples on this page accordingly.

  • The NuoDB distribution is in a directory named /opt/nuodb on each host, and archives are stored locally on each host in a directory named /data.

In this example, we simulate a disaster on the database and then perform the handoff procedure for database recovery.

A NuoDB Enterprise license is required to support database configurations with multiple SMs. For more information, see Obtaining and Installing an Enterprise Edition License.

Set Up the Test

  1. Set up two APs with the server IDs active and passive on host1 and host2, respectively.

    In a production database, do not use active and passive as AP server IDs, because a server ID cannot be changed when the passive data center becomes active after disaster recovery.
  2. Remove any previous state on host1 and host2:

    rm -f /var/opt/nuodb/raftlog
    rm -rf /data/archive1
    rm -rf /data/observer1
  3. Create configuration files for the active and passive APs.

    Make the following two changes to the nuoadmin.conf properties on the hosts:

    Host: host1

      Default code:

        1. "initialMembership": {
             "nuoadmin-0": { "transport": "$(hostname):48005", "version": "0:10000" }
           },

        2. "ThisServerId": "nuoadmin-0",

      Replace with:

        1. "initialMembership": {
             "active": { "transport": "host1:48005", "version": "0:10000" },
             "passive": { "transport": "host2:48005", "version": "0:10000" }
           },

        2. "ThisServerId": "active",

    Host: host2

      Default code:

        1. "initialMembership": {
             "nuoadmin-0": { "transport": "$(hostname):48005", "version": "0:10000" }
           },

        2. "ThisServerId": "nuoadmin-0",

      Replace with:

        1. "initialMembership": {
             "active": { "transport": "host1:48005", "version": "0:10000" },
             "passive": { "transport": "host2:48005", "version": "0:10000" }
           },

        2. "ThisServerId": "passive",

    For more information on initialMembership and ThisServerId properties, see Setting nuoadmin.conf Properties.

  4. Set environment variables NUODB_HOME, NUOCMD_CLIENT_KEY, and NUOCMD_VERIFY_SERVER appropriately on each host.

    See Enabling TLS Encryption for instructions.

  5. Run the nuoadmin start --conf command on host1 and host2, each with its own configuration file:

    /opt/nuodb/etc/nuoadmin start --conf /opt/nuodb/etc/nuoadmin.conf.active
     * Starting NuoDB Admin
    /opt/nuodb/etc/nuoadmin start --conf /opt/nuodb/etc/nuoadmin.conf.passive
     * Starting NuoDB Admin

    Run nuocmd show domain on either host to verify that both APs are running and connected to each other.

    nuocmd show domain
    server version: 5.0-2-a8628647c3, server license: Enterprise
    server time: 2021-01-11T16:08:23.899, client token: ...
    Servers:
      [active] host1:port [last_ack = 6.75] [member = ADDED] [raft_state = ACTIVE] (FOLLOWER, Leader=passive, log=0/6/6) Connected
      [passive] host2:port [last_ack = 6.75] [member = ADDED] [raft_state = ACTIVE] (LEADER, Leader=passive, log=0/6/7) Connected *
    Databases:
  6. Create an archive and a database on the active host:

    nuocmd create archive --db-name DB_NAME --server-id active --archive-path /data/archive1
    Archive(archive_path=/data/archive1, db_name=DB_NAME, id=0, server_id=active, state=PROVISIONED)
    nuocmd create database --db-name DB_NAME --dba-user cloud --dba-password user --te-server-ids active
    STARTING: StartProcessRequest(archive_id=0, db_name=DB_NAME, engine_type=SM, labels={}, options={}, server_id=active)
    STARTING: StartProcessRequest(db_name=DB_NAME, engine_type=TE, labels={}, options={}, server_id=active)
  7. Add an Asynchronous Storage Manager (ASM) on the passive host:

    nuocmd create archive --db-name DB_NAME --server-id passive --archive-path /data/observer1 --passive
    Archive(archive_path=/data/observer1, db_name=DB_NAME, id=1, observer_storage_groups=[*], server_id=passive, state=PROVISIONED)
    nuocmd start database --db DB_NAME --incremental
    STARTING: StartProcessRequest(archive_id=1, db_name=DB_NAME, engine_type=SM, expected_incarnation_major=1, expected_incarnation_minor=0, labels={}, options={}, server_id=passive)
  8. Run nuocmd show domain on either host to verify that two APs, two SMs, and one TE are running.

    nuocmd show domain
    server version: 5.0-2-a8628647c3, server license: Enterprise
    server time: 2021-01-11T16:09:27.460, client token: ...
    Servers:
      [active] host1:port [last_ack = 0.47] [member = ADDED] [raft_state = ACTIVE] (FOLLOWER, Leader=passive, log=0/30/30) Connected
      [passive] host2:port [last_ack = 0.47] [member = ADDED] [raft_state = ACTIVE] (LEADER, Leader=passive, log=0/30/30) Connected *
    Databases:
      DB_NAME [state = RUNNING]
        [SM] host1:port [start_id = 0] [server_id = active] [pid = 1493876] [node_id = 1] [last_ack = 10.38] MONITORED:RUNNING
        [TE] host1:port [start_id = 1] [server_id = active] [pid = 1493879] [node_id = 2] [last_ack =  4.43] MONITORED:RUNNING
        [SM] host2:port [start_id = 2] [server_id = passive] [pid = 2638312] [node_id = 3] [last_ack =  3.98] MONITORED:RUNNING
  9. Execute SQL statements on the database using nuosql:

    create table TABLE_NAME (n int, s string);
    insert into TABLE_NAME values(1, 'one');
    select * from TABLE_NAME;
     N   S
     -- ---
    
     1  one
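
The check in step 8 (two APs, two SMs, and one TE) can also be scripted. The following is a minimal sketch, not a NuoDB tool: the helper names are hypothetical, and the patterns assume that nuocmd show domain prints lines in exactly the layout shown in the samples above. The sample text is copied from the step 8 output so that the snippet runs without a NuoDB installation.

```shell
# Illustrative helpers, not NuoDB commands. They read `nuocmd show domain`
# output on stdin and assume the line layout shown in the samples above.

# count_connected_servers: number of AP lines reporting a Connected status.
count_connected_servers() {
    grep -c ') Connected'
}

# count_running_engines TYPE: number of running engines of TYPE (SM or TE).
count_running_engines() {
    grep -c "^ *\[$1] .*MONITORED:RUNNING"
}

# Sample lines copied from the step 8 output.
sample='  [active] host1:port [last_ack = 0.47] [member = ADDED] [raft_state = ACTIVE] (FOLLOWER, Leader=passive, log=0/30/30) Connected
  [passive] host2:port [last_ack = 0.47] [member = ADDED] [raft_state = ACTIVE] (LEADER, Leader=passive, log=0/30/30) Connected *
    [SM] host1:port [start_id = 0] [server_id = active] [pid = 1493876] [node_id = 1] [last_ack = 10.38] MONITORED:RUNNING
    [TE] host1:port [start_id = 1] [server_id = active] [pid = 1493879] [node_id = 2] [last_ack =  4.43] MONITORED:RUNNING
    [SM] host2:port [start_id = 2] [server_id = passive] [pid = 2638312] [node_id = 3] [last_ack =  3.98] MONITORED:RUNNING'

echo "APs=$(printf '%s\n' "$sample" | count_connected_servers)"    # APs=2
echo "SMs=$(printf '%s\n' "$sample" | count_running_engines SM)"   # SMs=2
echo "TEs=$(printf '%s\n' "$sample" | count_running_engines TE)"   # TEs=1
```

Against a live domain, the same helpers would be fed from a pipe, for example nuocmd show domain | count_running_engines SM.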

Simulate Failure

Simulate a disaster by using kill -9 to kill the AP, SM, and TE processes on the active host (host1).

For demonstration purposes, this example does not use --ping-timeout, so the SM on the passive host (host2) remains up, as does the AP on that host. However, administrative actions require a quorum of APs, and there is no quorum once the active AP is gone:

nuocmd shutdown database --db DB_NAME
'shutdown database' failed: Unable to request database shutdown for dbName=DB_NAME: Unable to get command response: Command request timed out

Because the database cannot be shut down through the admin layer, use kill -9 to kill the AP and SM processes on the passive host (host2) as well.
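
The quorum arithmetic behind the failed shutdown command above can be sketched as follows. This assumes a simple-majority rule (more than half of the configured APs must be reachable), which is the usual rule for Raft-based membership. The function names are illustrative and are not part of NuoDB.

```shell
# Illustrative helpers (not NuoDB commands), assuming a simple-majority quorum.

# quorum_size TOTAL: smallest number of APs that forms a majority of TOTAL.
quorum_size() {
    echo $(( $1 / 2 + 1 ))
}

# has_quorum TOTAL ALIVE: succeeds when ALIVE servers form a majority of TOTAL.
has_quorum() {
    [ "$2" -ge "$(quorum_size "$1")" ]
}

# Two APs are configured (active and passive); only one survives the failure.
if has_quorum 2 1; then
    echo "quorum available"
else
    echo "no quorum"    # this branch runs: a majority of 2 is 2
fi
```

Under this rule a two-server domain cannot tolerate the loss of either AP, which is why the surviving AP has to be restarted with the failed server evicted.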

Recover the Admin Process (AP)

Restart the AP by following the instructions in Re-establishing Admin Process (AP) Quorum.

The commands in this section are executed on host2.

/opt/nuodb/etc/nuoadmin restart --evicted-servers active --conf /opt/nuodb/etc/nuoadmin.conf.passive
NuoDB Admin already stopped
 * Starting NuoDB Admin

Make sure that the admin layer knows that all engines in the active data center are gone, and then check the domain state:

nuocmd shutdown server-processes --evict --server-id active --timeout 0
nuocmd show domain
server version: 5.0-2-a8628647c3, server license: Enterprise
server time: 2021-01-11T16:11:21.079, client token: ...
Servers:
  [active] host1:port [last_ack = NEVER] [member = ADDED] [raft_state = <NO VALUE>] (<NO VALUE>, Leader=<NO VALUE>, log=?/?/?) Evicted
  [passive] host2:port [last_ack = 2.76] [member = ADDED] [raft_state = ACTIVE] (LEADER, Leader=passive, log=1/35/35) Connected *
Databases:
  DB_NAME [state = NOT_RUNNING]

Recover the Database

There are two ways to recover the database:

  • Recovery using the handoff procedure

  • Recovery using the handoff database command

Recovery Using the Handoff Procedure

  1. Run the Confirmation step:

    nuocmd start process --db-name DB_NAME --server-id passive --engine-type SM --archive-id 1
    Process(archive_id=1, db_name=DB_NAME, engine_type=SM, ...)
    nuocmd handoff report-timestamp --db-name DB_NAME --archive-ids 1
    ReportTimestamp(commits=0,0,3, epoch=2, leaders=2 1, timestamp=2021-01-08T21:26:20)

    The timestamp is acceptable, so continue with the next step.

  2. Run the Deprovisioning step:

    nuocmd delete server --server-id active
  3. Run the Resolution step:

    nuocmd check database --db-name DB_NAME --check-syncing --num-processes 1 --timeout 60
    nuocmd handoff reset-state --db-name DB_NAME --commits 0 0 3 --leaders 2 1 --epoch 2
    State successfully reset
  4. Run the Promotion step:

    nuocmd set archive --archive-id 1 --active
  5. Run the Reprovisioning step:

    nuocmd start database --db-name DB_NAME --incremental --te-server-ids passive
    STARTING: StartProcessRequest(db_name=DB_NAME, engine_type=TE, ...)
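
In the Resolution step above, the --commits, --leaders, and --epoch values passed to nuocmd handoff reset-state match the fields printed by nuocmd handoff report-timestamp in the Confirmation step. The following is a minimal sketch of that mapping. It assumes that the ReportTimestamp line has exactly the layout shown in the sample output, and reset_state_args is a hypothetical helper, not a NuoDB command.

```shell
# reset_state_args: hypothetical helper, not part of NuoDB.
# Reads a ReportTimestamp(...) line on stdin and prints the matching
# reset-state options. Assumes the field order shown in the sample output.
reset_state_args() {
    sed -n 's/^ReportTimestamp(commits=\([0-9,]*\), epoch=\([0-9]*\), leaders=\([0-9 ]*\), timestamp=.*)$/--commits \1 --leaders \3 --epoch \2/p' |
        tr ',' ' '   # commits are comma-separated in the report, space-separated as options
}

# Sample line copied from the Confirmation step.
report='ReportTimestamp(commits=0,0,3, epoch=2, leaders=2 1, timestamp=2021-01-08T21:26:20)'
printf '%s\n' "$report" | reset_state_args
# prints: --commits 0 0 3 --leaders 2 1 --epoch 2
```

Review the reported timestamp before reusing the values. The helper only reformats them and does not judge whether the timestamp is acceptable.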

Recovery Using the Handoff Database Command

  1. Run nuocmd handoff database with an --oldest-acceptable timestamp earlier than the most recent consistent state:

    nuocmd handoff database --db-name DB_NAME --all-observer-archive-ids --oldest-acceptable 2000-01-01T01:00:00
    STARTING: SM process on archive 1
    Time of the most recent consistent state: 2021-03-16T14:28:37
    Reset state run successfully
    Successfully handed off database, you may proceed with the next handoff steps

    The nuocmd handoff database command automatically runs the Confirmation, Resolution, and Promotion steps of the handoff procedure.

    If the --oldest-acceptable timestamp is later than the most recent consistent state, the handoff is aborted:

    nuocmd handoff database --db-name DB_NAME --all-observer-archive-ids --oldest-acceptable 2100-01-01T01:00:00
    STARTING: SM process on archive 1
    'handoff database' failed: Time of the most recent consistent state 2021-03-16T14:28:37 is earlier than supplied '--oldest-acceptable' 2100-01-01T01:00:00. Aborting handoff

    After the successful handoff, shut down the database:

    nuocmd shutdown database --db-name DB_NAME
  2. Run the Deprovisioning step:

    nuocmd delete server --server-id active
  3. Run the Reprovisioning step:

    nuocmd start database --db-name DB_NAME --incremental --te-server-ids passive
    STARTING: StartProcessRequest(db_name=DB_NAME, engine_type=TE, ...)
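
The --oldest-acceptable check in step 1 can be pictured as a timestamp comparison. The sketch below is not how nuocmd implements the check. It assumes that both timestamps use the same YYYY-MM-DDTHH:MM:SS layout as in the samples, so that plain string comparison orders them chronologically. handoff_allowed is a hypothetical helper name.

```shell
# handoff_allowed CONSISTENT OLDEST: hypothetical helper, not part of NuoDB.
# Succeeds when the most recent consistent state is not older than the
# oldest acceptable timestamp. Both arguments are compared as strings.
handoff_allowed() {
    awk -v state="$1" -v oldest="$2" \
        'BEGIN { exit !((state "") >= (oldest "")) }'
}

# Values copied from the two sample runs in step 1.
consistent=2021-03-16T14:28:37
for oldest in 2000-01-01T01:00:00 2100-01-01T01:00:00; do
    if handoff_allowed "$consistent" "$oldest"; then
        echo "$oldest: proceed"
    else
        echo "$oldest: abort"
    fi
done
# prints:
#   2000-01-01T01:00:00: proceed
#   2100-01-01T01:00:00: abort
```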

Verify the Database Contents

Verify the database contents and run additional workload using nuosql:

select * from TABLE_NAME;
 N   S
 -- ---

 1  one
insert into TABLE_NAME values(2, 'two');
select * from TABLE_NAME;
 N   S
 -- ---

 1  one
 2  two

Clean Up

nuocmd shutdown database --db-name DB_NAME
nuocmd show domain
server version: 5.0-2-a8628647c3, server license: Enterprise
server time: 2021-01-11T16:12:59.874, client token: ...
Servers:
  [passive] host2:port [last_ack = 1.53] [member = ADDED] [raft_state = ACTIVE] (LEADER, Leader=passive, log=1/62/62) Connected *
Databases:
  DB_NAME [state = NOT_RUNNING]