Asynchronous Commit Example

This example uses a deliberately minimal deployment that is not realistic for production. There is one active and one passive data center. Each data center contains one host, one admin process (AP), and one storage manager (SM). The active data center also contains the only transaction engine (TE).

This example assumes that the NuoDB distribution is in a directory named dist on each host and that archives are stored locally on each host in a directory named /data.

The first step is to set up two APs named active and passive on hosts named host1 and host2 (substitute your own host names). In a real deployment you would not use active and passive as AP server IDs, because a server ID cannot be changed when the passive data center becomes active after disaster recovery.

Make sure to remove any previous state on both hosts so you start with clean machines:

rm -f dist/var/opt/raftlog
rm -rf /data/archive1
rm -rf /data/observer1

Create configuration files for the active and passive admin processes. See Extending the Database Across Multiple Hosts (Scaling Out) for instructions.

The nuoadmin.conf.active configuration file is nuoadmin.conf with these changes:

-  "ThisServerId": "nuoadmin-0",
+  "ThisServerId": "active",

   "initialMembership": {
-    "nuoadmin-0": { "transport": "$(hostname):48005", "version": "0:10000" }
+    "active": { "transport": "host1:48005", "version": "0:10000" },
+    "passive": { "transport": "host2:48005", "version": "0:10000" }
   },

The nuoadmin.conf.passive configuration file is nuoadmin.conf with these changes:

-  "ThisServerId": "nuoadmin-0",
+  "ThisServerId": "passive",

   "initialMembership": {
-    "nuoadmin-0": { "transport": "$(hostname):48005", "version": "0:10000" }
+    "active": { "transport": "host1:48005", "version": "0:10000" },
+    "passive": { "transport": "host2:48005", "version": "0:10000" }
   },
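
Both configuration files carry the same initialMembership block and differ only in ThisServerId. The following Python sketch illustrates that relationship by generating the two sets of overrides from one host map. The function names initial_membership and config_overrides are hypothetical helpers introduced for illustration, and the output is only the changed settings, not a complete nuoadmin.conf.

```python
import json

def initial_membership(servers, port=48005, version="0:10000"):
    """Build the initialMembership mapping shared by every AP.

    `servers` maps a server ID to its host name. The port and version
    defaults are the values shown in the example configuration.
    """
    return {
        server_id: {"transport": f"{host}:{port}", "version": version}
        for server_id, host in servers.items()
    }

def config_overrides(this_server_id, servers):
    """Return the two settings that differ from the stock nuoadmin.conf."""
    if this_server_id not in servers:
        raise ValueError(f"{this_server_id!r} is not in the membership")
    return {
        "ThisServerId": this_server_id,
        "initialMembership": initial_membership(servers),
    }

# Host names from this example; substitute your own.
servers = {"active": "host1", "passive": "host2"}
for server_id in servers:
    print(f"# nuoadmin.conf.{server_id}")
    print(json.dumps(config_overrides(server_id, servers), indent=2))
```

Generating both files from one map keeps the membership lists identical, which is the property the two hand-edited diffs above need to preserve.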

Set environment variables NUODB_HOME, NUOCMD_CLIENT_KEY, and NUOCMD_VERIFY_SERVER appropriately on each host. See Enabling TLS Encryption for instructions.

Execute this command on the active host:

dist/etc/nuoadmin start --conf dist/etc/nuoadmin.conf.active
 * Starting NuoDB Admin

and this command on the passive host:

dist/etc/nuoadmin start --conf dist/etc/nuoadmin.conf.passive
 * Starting NuoDB Admin

The start commands might report spurious timeout errors. Run nuocmd show domain on either host to verify that both APs are running and connected to each other:

nuocmd show domain
server version: 4.2.dev-9999-e167a01c9a, server license: Enterprise
server time: 2021-01-11T16:08:23.899, client token: ...
Servers:
  [active] host1:port [last_ack = 6.75] [member = ADDED] [raft_state = ACTIVE] (FOLLOWER, Leader=passive, log=0/6/6) Connected
  [passive] host2:port [last_ack = 6.75] [member = ADDED] [raft_state = ACTIVE] (LEADER, Leader=passive, log=0/6/7) Connected *
Databases:

The next step is to create a database on the active host:

nuocmd create archive --db-name test --server-id active --archive-path /data/archive1
Archive(archive_path=/data/archive1, db_name=test, id=0, server_id=active, state=PROVISIONED)
nuocmd create database --db-name test --dba-user cloud --dba-password user --te-server-ids active
STARTING: StartProcessRequest(archive_id=0, db_name=test, engine_type=SM, labels={}, options={}, server_id=active)
STARTING: StartProcessRequest(db_name=test, engine_type=TE, labels={}, options={}, server_id=active)

Then add an asynchronous storage manager on the passive host:

nuocmd create archive --db-name test --server-id passive --archive-path /data/observer1 --passive
Archive(archive_path=/data/observer1, db_name=test, id=1, observer_storage_groups=[*], server_id=passive, state=PROVISIONED)
nuocmd start database --db test --incremental
STARTING: StartProcessRequest(archive_id=1, db_name=test, engine_type=SM, expected_incarnation_major=1, expected_incarnation_minor=0, labels={}, options={}, server_id=passive)

Run nuocmd show domain on either host to verify that two APs, two SMs, and one TE are running:

nuocmd show domain
server version: 4.2.dev-9999-e167a01c9a, server license: Enterprise
server time: 2021-01-11T16:09:27.460, client token: ...
Servers:
  [active] host1:port [last_ack = 0.47] [member = ADDED] [raft_state = ACTIVE] (FOLLOWER, Leader=passive, log=0/30/30) Connected
  [passive] host2:port [last_ack = 0.47] [member = ADDED] [raft_state = ACTIVE] (LEADER, Leader=passive, log=0/30/30) Connected *
Databases:
  test [state = RUNNING]
    [SM] host1:port [start_id = 0] [server_id = active] [pid = 1493876] [node_id = 1] [last_ack = 10.38] MONITORED:RUNNING
    [TE] host1:port [start_id = 1] [server_id = active] [pid = 1493879] [node_id = 2] [last_ack =  4.43] MONITORED:RUNNING
    [SM] host2:port [start_id = 2] [server_id = passive] [pid = 2638312] [node_id = 3] [last_ack =  3.98] MONITORED:RUNNING
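
The check described above (two connected APs, two SMs, one TE) can be done by eye, but a rough Python sketch shows what is being counted. It parses the human-readable text exactly as it appears in the sample output above. The regular expressions and the summarize_domain helper are assumptions based only on that sample, so treat this as an illustration rather than a robust monitoring tool.

```python
import re
from collections import Counter

# Patterns inferred from the sample output only (an assumption).
SERVER_RE = re.compile(
    r"^\s*\[(?P<id>[^\]]+)\]\s.*\)\s+(?P<status>\w+)(?:\s+\*)?\s*$"
)
ENGINE_RE = re.compile(r"^\s*\[(?P<type>SM|TE)\]\s.*\s(?P<state>\S+)\s*$")

def summarize_domain(text):
    """Return ({server_id: status}, Counter of running engine types)."""
    servers, engines = {}, Counter()
    section = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped in ("Servers:", "Databases:"):
            section = stripped[:-1]
        elif section == "Servers":
            m = SERVER_RE.match(line)
            if m:
                servers[m["id"]] = m["status"]
        elif section == "Databases":
            m = ENGINE_RE.match(line)
            # Count only engines whose final state token is RUNNING.
            if m and m["state"].split(":")[-1] == "RUNNING":
                engines[m["type"]] += 1
    return servers, engines

SAMPLE = """\
Servers:
  [active] host1:port [last_ack = 0.47] [member = ADDED] [raft_state = ACTIVE] (FOLLOWER, Leader=passive, log=0/30/30) Connected
  [passive] host2:port [last_ack = 0.47] [member = ADDED] [raft_state = ACTIVE] (LEADER, Leader=passive, log=0/30/30) Connected *
Databases:
  test [state = RUNNING]
    [SM] host1:port [start_id = 0] [server_id = active] [pid = 1493876] [node_id = 1] [last_ack = 10.38] MONITORED:RUNNING
    [TE] host1:port [start_id = 1] [server_id = active] [pid = 1493879] [node_id = 2] [last_ack =  4.43] MONITORED:RUNNING
    [SM] host2:port [start_id = 2] [server_id = passive] [pid = 2638312] [node_id = 3] [last_ack =  3.98] MONITORED:RUNNING
"""

servers, engines = summarize_domain(SAMPLE)
print(servers)
print(dict(engines))
```

Parsing display text is fragile, so the patterns would need to be rechecked against the output of the NuoDB version actually in use.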

Do some minimal work in the database using nuosql:

create table test (n int, s string);
insert into test values(1, 'one');
select * from test;
 N   S
 -- ---

 1  one

Now simulate failure by using kill -9 to kill the AP, SM, and TE processes on the active host.

For demonstration purposes this example does not use --ping-timeout, so the SM on the passive host remains up. The AP on the passive host also remains up. However, no administrative action is possible without a quorum of APs, and with the active AP gone there is no quorum:

nuocmd shutdown database --db test
'shutdown database' failed: Unable to request database shutdown for dbName=test: Unable to get command response: Command request timed out

Because the database cannot be shut down through the admin layer, use kill -9 to kill the AP and SM processes on the passive host.
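
The loss of quorum in this two-AP domain follows from simple arithmetic. The Python sketch below assumes a strict-majority rule of the kind used by Raft, which the raft_state fields in the show domain output suggest. The function names are hypothetical, and the exact rule NuoDB Admin applies is not stated in this example.

```python
def majority(members):
    """Smallest number of members that forms a strict majority."""
    return members // 2 + 1

def has_quorum(members, reachable):
    # Assumption: quorum means a strict majority of the membership.
    return reachable >= majority(members)

# Two APs in the membership, only the passive one still running.
print(has_quorum(members=2, reachable=1))   # False

# If the failed AP no longer counts toward the membership, the
# single survivor is a majority of one.
print(has_quorum(members=1, reachable=1))   # True
```

Under this assumption a two-member domain tolerates no AP failures, which is why the surviving AP cannot act until the failed server is excluded from the membership.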

Having simulated a disaster, start recovery. On the passive host, restart the AP following the instructions in Re-establishing Admin Process Quorum:

dist/etc/nuoadmin restart --evicted-servers active --conf dist/etc/nuoadmin.conf.passive
NuoDB Admin already stopped
 * Starting NuoDB Admin

Make sure the admin knows that all engines in the active data center are gone:

nuocmd shutdown server-processes --evict --server-id active --timeout 0
nuocmd show domain
server version: 4.2.dev-9999-e167a01c9a, server license: Enterprise
server time: 2021-01-11T16:11:21.079, client token: ...
Servers:
  [active] host1:port [last_ack = NEVER] [member = ADDED] [raft_state = <NO VALUE>] (<NO VALUE>, Leader=<NO VALUE>, log=?/?/?) Evicted
  [passive] host2:port [last_ack = 2.76] [member = ADDED] [raft_state = ACTIVE] (LEADER, Leader=passive, log=1/35/35) Connected *
Databases:
  test [state = NOT_RUNNING]

Proceed with handoff:

nuocmd start process --db-name test --server-id passive --engine-type SM --archive-id 1
Process(archive_id=1, db_name=test, engine_type=SM, ...)
nuocmd handoff report-timestamp --db-name test --archive-ids 1
ReportTimestamp(commits=0,0,3, epoch=2, leaders=2 1, timestamp=2021-01-08T21:26:20)

That timestamp is acceptable, so deprovision the failed active data center:

nuocmd delete server --server-id active

Now run the resolution step:

nuocmd check database --db-name test --check-syncing --num-processes 1 --timeout 60
nuocmd handoff reset-state --db-name test --commits 0 0 3 --leaders 2 1 --epoch 2
State successfully reset
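
The commits, leaders, and epoch values passed to reset-state are the ones printed earlier by report-timestamp. Copying them by hand is error-prone, so the following Python sketch shows one way to derive the command line from the report line. The regular expression is inferred from the single sample line in this example, and reset_state_command is a hypothetical helper, not part of nuocmd.

```python
import re
import shlex

# Pattern inferred from the one ReportTimestamp line shown in this example.
REPORT_RE = re.compile(
    r"ReportTimestamp\(commits=(?P<commits>[\d,]+), epoch=(?P<epoch>\d+), "
    r"leaders=(?P<leaders>[\d ]+), timestamp=(?P<timestamp>[^)]+)\)"
)

def reset_state_command(report_line, db_name):
    """Build the matching `nuocmd handoff reset-state` command line."""
    m = REPORT_RE.search(report_line)
    if m is None:
        raise ValueError("not a ReportTimestamp line: " + report_line)
    commits = m["commits"].split(",")   # printed comma-separated
    leaders = m["leaders"].split()      # printed space-separated
    argv = [
        "nuocmd", "handoff", "reset-state",
        "--db-name", db_name,
        "--commits", *commits,
        "--leaders", *leaders,
        "--epoch", m["epoch"],
    ]
    return shlex.join(argv)

report = ("ReportTimestamp(commits=0,0,3, epoch=2, leaders=2 1, "
          "timestamp=2021-01-08T21:26:20)")
print(reset_state_command(report, "test"))
```

The sketch only formats the command. Deciding whether the reported timestamp is acceptable remains a manual step, as in the walkthrough above.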

Now run the promotion step:

nuocmd set archive --archive-id 1 --active

Now run the reprovisioning step:

nuocmd start database --db-name test --incremental --te-server-ids passive
STARTING: StartProcessRequest(db_name=test, engine_type=TE, ...)

Finally, verify database contents and run more workload using nuosql:

select * from test;
 N   S
 -- ---

 1  one
insert into test values(2, 'two');
select * from test;
 N   S
 -- ---

 1  one
 2  two

Clean up:

nuocmd shutdown database --db test
nuocmd show domain
server version: 4.2.dev-9999-e167a01c9a, server license: Enterprise
server time: 2021-01-11T16:12:59.874, client token: ...
Servers:
  [passive] host2:port [last_ack = 1.53] [member = ADDED] [raft_state = ACTIVE] (LEADER, Leader=passive, log=1/62/62) Connected *
Databases:
  test [state = NOT_RUNNING]