Using the Handoff Database Command Example

This example mirrors the previous example, except that it uses the handoff database command to assist with handoff.

The first step is to set up two APs named active and passive on hosts named host1 and host2 (substitute your own host names). In a real deployment, you would not use active and passive as AP server IDs, because a server ID can’t be changed when the passive data center becomes active after disaster recovery.

Make sure to remove any previous state on both hosts so you start with clean machines:

rm -f dist/var/opt/raftlog
rm -rf /data/archive1
rm -rf /data/observer1

Create configuration files for the active and passive admin processes. See Extending the Database Across Multiple Hosts (Scaling Out) for instructions.

The nuoadmin.conf.active configuration file is nuoadmin.conf with these changes:

-  "ThisServerId": "nuoadmin-0",
+  "ThisServerId": "active",

   "initialMembership": {
-    "nuoadmin-0": { "transport": "$(hostname):48005", "version": "0:10000" }
+    "active": { "transport": "host1:48005", "version": "0:10000" },
+    "passive": { "transport": "host2:48005", "version": "0:10000" }
   },

The nuoadmin.conf.passive configuration file is nuoadmin.conf with these changes:

-  "ThisServerId": "nuoadmin-0",
+  "ThisServerId": "passive",

   "initialMembership": {
-    "nuoadmin-0": { "transport": "$(hostname):48005", "version": "0:10000" }
+    "active": { "transport": "host1:48005", "version": "0:10000" },
+    "passive": { "transport": "host2:48005", "version": "0:10000" }
   },
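The two configuration variants differ only in ThisServerId, so they can be derived from one base file. The following is a minimal Python sketch, assuming nuoadmin.conf parses as JSON and contains the two keys shown above. The helper name make_admin_conf and the in-memory base dictionary are hypothetical and only illustrate the edit; the port and version values are copied from the example.

```python
import copy
import json


def make_admin_conf(base, server_id, members):
    """Return a copy of `base` configured for `server_id`.

    `members` maps each server ID to its host name. This helper is
    hypothetical; it only mirrors the manual edits shown in the diffs.
    """
    conf = copy.deepcopy(base)  # leave the base configuration untouched
    conf["ThisServerId"] = server_id
    conf["initialMembership"] = {
        sid: {"transport": f"{host}:48005", "version": "0:10000"}
        for sid, host in members.items()
    }
    return conf


# Stand-in for the parsed contents of nuoadmin.conf (relevant keys only).
base = {
    "ThisServerId": "nuoadmin-0",
    "initialMembership": {
        "nuoadmin-0": {"transport": "$(hostname):48005", "version": "0:10000"}
    },
}
members = {"active": "host1", "passive": "host2"}

active_conf = make_admin_conf(base, "active", members)
passive_conf = make_admin_conf(base, "passive", members)

print(json.dumps(active_conf, indent=2))
```

Both variants carry the same initialMembership block, which is what lets the two APs find each other at startup.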

Set environment variables NUODB_HOME, NUOCMD_CLIENT_KEY, and NUOCMD_VERIFY_SERVER appropriately on each host. See Enabling TLS Encryption for instructions.

Execute this command on the active host:

dist/etc/nuoadmin start --conf dist/etc/nuoadmin.conf.active
 * Starting NuoDB Admin

and this command on the passive host:

dist/etc/nuoadmin start --conf dist/etc/nuoadmin.conf.passive
 * Starting NuoDB Admin

You might see spurious timeout errors. Run nuocmd show domain on either host to verify that both APs are running and connected to each other:

nuocmd show domain
server version: 4.2.dev-9999-e167a01c9a, server license: Enterprise
server time: 2021-01-11T16:08:23.899, client token: ...
Servers:
  [active] host1:port [last_ack = 6.75] [member = ADDED] [raft_state = ACTIVE] (FOLLOWER, Leader=passive, log=0/6/6) Connected
  [passive] host2:port [last_ack = 6.75] [member = ADDED] [raft_state = ACTIVE] (LEADER, Leader=passive, log=0/6/7) Connected *
Databases:
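The check described above can also be scripted. The following Python sketch parses the Servers section of captured nuocmd show domain text and reports whether every AP is Connected. It assumes the human-readable layout shown in this example; the function names and the regular expression are illustrative and not part of NuoDB.

```python
import re

# Matches lines such as:
#   [active] host1:port [...] (FOLLOWER, Leader=passive, log=0/6/6) Connected
SERVER_LINE = re.compile(
    r"^\s*\[(?P<id>[^\]]+)\].*\((?P<role>[^,]+),[^)]*\)\s+(?P<status>\w+)"
)


def parse_servers(text):
    """Return {server_id: {"role": ..., "status": ...}} from the Servers section."""
    servers = {}
    in_servers = False
    for line in text.splitlines():
        if line.startswith("Servers:"):
            in_servers = True
            continue
        if line.startswith("Databases:"):
            break
        if in_servers:
            match = SERVER_LINE.match(line)
            if match:
                servers[match["id"]] = {
                    "role": match["role"],
                    "status": match["status"],
                }
    return servers


def all_connected(servers, expected_ids):
    """True if every expected AP is present and reports Connected."""
    return all(
        sid in servers and servers[sid]["status"] == "Connected"
        for sid in expected_ids
    )


SAMPLE = """\
Servers:
  [active] host1:port [last_ack = 6.75] [member = ADDED] [raft_state = ACTIVE] (FOLLOWER, Leader=passive, log=0/6/6) Connected
  [passive] host2:port [last_ack = 6.75] [member = ADDED] [raft_state = ACTIVE] (LEADER, Leader=passive, log=0/6/7) Connected *
Databases:
"""

servers = parse_servers(SAMPLE)
print(servers)
print(all_connected(servers, ["active", "passive"]))
```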

The next step is to create a database on the active host:

nuocmd create archive --db-name test --server-id active --archive-path /data/archive1
Archive(archive_path=/data/archive1, db_name=test, id=0, server_id=active, state=PROVISIONED)
nuocmd create database --db-name test --dba-user cloud --dba-password user --te-server-ids active
STARTING: StartProcessRequest(archive_id=0, db_name=test, engine_type=SM, labels={}, options={}, server_id=active)
STARTING: StartProcessRequest(db_name=test, engine_type=TE, labels={}, options={}, server_id=active)

then add an asynchronous storage manager on the passive host:

nuocmd create archive --db-name test --server-id passive --archive-path /data/observer1 --passive
Archive(archive_path=/data/observer1, db_name=test, id=1, observer_storage_groups=[*], server_id=passive, state=PROVISIONED)
nuocmd start database --db-name test --incremental
STARTING: StartProcessRequest(archive_id=1, db_name=test, engine_type=SM, expected_incarnation_major=1, expected_incarnation_minor=0, labels={}, options={}, server_id=passive)

You can use nuocmd show domain on either host to verify that two APs, two SMs, and one TE are running.

nuocmd show domain
server version: 4.2.dev-9999-e167a01c9a, server license: Enterprise
server time: 2021-01-11T16:09:27.460, client token: ...
Servers:
  [active] host1:port [last_ack = 0.47] [member = ADDED] [raft_state = ACTIVE] (FOLLOWER, Leader=passive, log=0/30/30) Connected
  [passive] host2:port [last_ack = 0.47] [member = ADDED] [raft_state = ACTIVE] (LEADER, Leader=passive, log=0/30/30) Connected *
Databases:
  test [state = RUNNING]
    [SM] host1:port [start_id = 0] [server_id = active] [pid = 1493876] [node_id = 1] [last_ack = 10.38] MONITORED:RUNNING
    [TE] host1:port [start_id = 1] [server_id = active] [pid = 1493879] [node_id = 2] [last_ack =  4.43] MONITORED:RUNNING
    [SM] host2:port [start_id = 2] [server_id = passive] [pid = 2638312] [node_id = 3] [last_ack =  3.98] MONITORED:RUNNING
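To confirm the expected process counts without reading the listing by eye, the Databases section can be tallied. This Python sketch counts running SM and TE processes in captured nuocmd show domain text, assuming the line layout shown above; count_engines is a hypothetical helper name.

```python
import re
from collections import Counter

# Engine lines start with the engine type in brackets, for example "[SM] host1:port ...".
ENGINE_LINE = re.compile(r"^\s*\[(SM|TE)\]\s")


def count_engines(text):
    """Count engine lines per type whose state ends in ':RUNNING'."""
    counts = Counter()
    in_databases = False
    for line in text.splitlines():
        if line.startswith("Databases:"):
            in_databases = True
            continue
        if in_databases:
            match = ENGINE_LINE.match(line)
            if match and line.rstrip().endswith(":RUNNING"):
                counts[match.group(1)] += 1
    return counts


SAMPLE = """\
Databases:
  test [state = RUNNING]
    [SM] host1:port [start_id = 0] [server_id = active] [pid = 1493876] [node_id = 1] [last_ack = 10.38] MONITORED:RUNNING
    [TE] host1:port [start_id = 1] [server_id = active] [pid = 1493879] [node_id = 2] [last_ack =  4.43] MONITORED:RUNNING
    [SM] host2:port [start_id = 2] [server_id = passive] [pid = 2638312] [node_id = 3] [last_ack =  3.98] MONITORED:RUNNING
"""

counts = count_engines(SAMPLE)
print(dict(counts))
```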

Do some minimal work in the database using nuosql:

create table test (n int, s string);
insert into test values(1, 'one');
select * from test;
 N   S
 -- ---

 1  one

Now simulate failure by using kill -9 to kill the AP, SM, and TE processes on the active host.

For demonstration purposes, this example does not use --ping-timeout, so the SM on the passive host remains up. The AP on the passive host also remains up. However, no administrative actions can be performed without a quorum of APs, and with the active AP gone there is no quorum:

nuocmd shutdown database --db-name test
'shutdown database' failed: Unable to request database shutdown for dbName=test: Unable to get command response: Command request timed out

Because the shutdown request cannot be processed without a quorum, use kill -9 to kill the AP and SM processes on the passive host.
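The loss of quorum follows from simple arithmetic. The sketch below assumes a simple-majority rule, as used by Raft-style consensus: more than half of the configured APs must be reachable. The function names are illustrative only.

```python
def quorum_size(members: int) -> int:
    """Smallest number of APs that forms a strict majority of `members`."""
    return members // 2 + 1


def has_quorum(members: int, reachable: int) -> bool:
    """True if the reachable APs form a majority of the configured membership."""
    return reachable >= quorum_size(members)


# Two APs configured (active and passive), only the passive AP left.
print(quorum_size(2))    # majority of 2 is 2
print(has_quorum(2, 1))  # False: one surviving AP cannot form a quorum
# For comparison, a three-AP domain tolerates the loss of one AP.
print(has_quorum(3, 2))  # True
```

With only two APs, losing either one removes the majority, which is why the surviving AP must be restarted with the failed server evicted.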

Having simulated a disaster, start recovery. Restart the AP on the passive host, following the instructions in Re-establishing Admin Process Quorum:

dist/etc/nuoadmin restart --evicted-servers active --conf dist/etc/nuoadmin.conf.passive
NuoDB Admin already stopped
 * Starting NuoDB Admin

Make sure the admin knows that all engines in the active data center are gone:

nuocmd shutdown server-processes --evict --server-id active --timeout 0
nuocmd show domain
server version: 4.2.dev-9999-e167a01c9a, server license: Enterprise
server time: 2021-01-11T16:11:21.079, client token: ...
Servers:
  [active] host1:port [last_ack = NEVER] [member = ADDED] [raft_state = <NO VALUE>] (<NO VALUE>, Leader=<NO VALUE>, log=?/?/?) Evicted
  [passive] host2:port [last_ack = 2.76] [member = ADDED] [raft_state = ACTIVE] (LEADER, Leader=passive, log=1/35/35) Connected *
Databases:
  test [state = NOT_RUNNING]

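Attempt handoff with an --oldest-acceptable timestamp in the future. This attempt fails because the most recent consistent state is earlier than the supplied timestamp: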

nuocmd handoff database --db-name test --all-observer-archive-ids --oldest-acceptable 2100-01-01T01:00:00
STARTING: SM process on archive 1
'handoff database' failed: Time of the most recent consistent state 2021-03-16T14:28:37 is earlier than supplied '--oldest-acceptable' 2100-01-01T01:00:00. Aborting handoff

Shut down the database before retrying:

nuocmd shutdown database --db-name test

Re-attempt handoff with an --oldest-acceptable timestamp in the past:

nuocmd handoff database --db-name test --all-observer-archive-ids --oldest-acceptable 2000-01-01T01:00:00
STARTING: SM process on archive 1
Time of the most recent consistent state: 2021-03-16T14:28:37
Reset state run successfully
Successfully handed off database, you may proceed with the next handoff steps
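The two attempts differ only in how --oldest-acceptable compares with the time of the most recent consistent state. The Python sketch below reproduces that comparison as implied by the error message above; it is an illustration of the rule, not the actual nuocmd implementation, and handoff_allowed is a hypothetical name.

```python
from datetime import datetime


def handoff_allowed(most_recent_consistent: str, oldest_acceptable: str) -> bool:
    """Return False when the consistent state is older than the accepted limit.

    Both arguments are ISO 8601 timestamps in the form used in the
    example output, such as 2021-03-16T14:28:37.
    """
    recent = datetime.fromisoformat(most_recent_consistent)
    limit = datetime.fromisoformat(oldest_acceptable)
    return recent >= limit


consistent_state = "2021-03-16T14:28:37"

# First attempt: the limit is in the future, so the handoff is aborted.
print(handoff_allowed(consistent_state, "2100-01-01T01:00:00"))
# Second attempt: the limit is in the past, so the handoff proceeds.
print(handoff_allowed(consistent_state, "2000-01-01T01:00:00"))
```

In practice, the limit would be chosen to reflect how much recent data the application can afford to lose.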

Now run the deprovisioning step:

nuocmd delete server --server-id active

Now run the reprovisioning step:

nuocmd start database --db-name test --incremental --te-server-ids passive
STARTING: StartProcessRequest(db_name=test, engine_type=TE, ...)
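The recovery commands used in this example can be collected into one ordered plan for review before anything is run. The Python sketch below only builds and prints the command lines; it does not execute them. It uses only the commands and options that appear in this example, while the function recovery_plan and its parameter names are hypothetical.

```python
import shlex


def recovery_plan(db_name, failed_server, surviving_server, conf_path,
                  oldest_acceptable):
    """Return the recovery commands from this example, in order, as argv lists."""
    return [
        # Restart the surviving AP with the failed server evicted.
        ["dist/etc/nuoadmin", "restart", "--evicted-servers", failed_server,
         "--conf", conf_path],
        # Tell the admin layer that the failed server's engines are gone.
        ["nuocmd", "shutdown", "server-processes", "--evict",
         "--server-id", failed_server, "--timeout", "0"],
        # Hand off the database to the observer archives.
        ["nuocmd", "handoff", "database", "--db-name", db_name,
         "--all-observer-archive-ids", "--oldest-acceptable", oldest_acceptable],
        # Deprovision the failed server.
        ["nuocmd", "delete", "server", "--server-id", failed_server],
        # Reprovision: start a TE on the surviving server.
        ["nuocmd", "start", "database", "--db-name", db_name, "--incremental",
         "--te-server-ids", surviving_server],
    ]


plan = recovery_plan(
    db_name="test",
    failed_server="active",
    surviving_server="passive",
    conf_path="dist/etc/nuoadmin.conf.passive",
    oldest_acceptable="2000-01-01T01:00:00",
)
for argv in plan:
    print(shlex.join(argv))
```

Keeping the plan as data makes it straightforward to log each step or to pause for confirmation between steps.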

Finally, verify database contents and run more workload using nuosql:

select * from test;
 N   S
 -- ---

 1  one
insert into test values(2, 'two');
select * from test;
 N   S
 -- ---

 1  one
 2  two

Clean up:

nuocmd shutdown database --db-name test
nuocmd show domain
server version: 4.2.dev-9999-e167a01c9a, server license: Enterprise
server time: 2021-01-11T16:12:59.874, client token: ...
Servers:
  [passive] host2:port [last_ack = 1.53] [member = ADDED] [raft_state = ACTIVE] (LEADER, Leader=passive, log=1/62/62) Connected *
Databases:
  test [state = NOT_RUNNING]