Using the Handoff Database Command Example
This example mirrors the previous example, except that it uses the handoff database command to assist with handoff.
The first step is to set up two APs named active and passive on hosts named host1 and host2 (substitute your own host names).
In a real deployment you would not use active and passive as AP server IDs, because a server ID cannot be changed when the passive data center becomes active after disaster recovery.
Make sure to remove any previous state on both hosts so you start with clean machines:
rm -f dist/var/opt/raftlog
rm -rf /data/archive1
rm -rf /data/observer1
Create configuration files for the active and passive admin processes. See Extending the Database Across Multiple Hosts (Scaling Out) for instructions.
The nuoadmin.conf.active configuration file is nuoadmin.conf with these changes:
- "ThisServerId": "nuoadmin-0",
+ "ThisServerId": "active",
  "initialMembership": {
- "nuoadmin-0": { "transport": "$(hostname):48005", "version": "0:10000" }
+ "active": { "transport": "host1:48005", "version": "0:10000" },
+ "passive": { "transport": "host2:48005", "version": "0:10000" }
  },
The nuoadmin.conf.passive configuration file is nuoadmin.conf with these changes:
- "ThisServerId": "nuoadmin-0",
+ "ThisServerId": "passive",
  "initialMembership": {
- "nuoadmin-0": { "transport": "$(hostname):48005", "version": "0:10000" }
+ "active": { "transport": "host1:48005", "version": "0:10000" },
+ "passive": { "transport": "host2:48005", "version": "0:10000" }
  },
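The two edits shown above (the server ID and the membership entries) can be produced with a small script instead of by hand. The following is a minimal sketch, not part of the product: set_server_id and membership_entry are hypothetical helper names, and the sketch assumes the configuration file uses exactly the key spelling shown in the diff. The port 48005 and version 0:10000 are copied from the example.

```shell
# Rewrite the ThisServerId value in a nuoadmin.conf read from stdin.
# $1 = new server ID, for example "active" or "passive".
# Assumption: the ID contains no "/" or "&", which sed would treat specially.
set_server_id() {
    sed "s/\"ThisServerId\": *\"[^\"]*\"/\"ThisServerId\": \"$1\"/"
}

# Print one initialMembership entry in the form used in the diff above.
# $1 = server ID, $2 = host name.
membership_entry() {
    printf '"%s": { "transport": "%s:48005", "version": "0:10000" }\n' "$1" "$2"
}
```

For example, set_server_id passive < nuoadmin.conf prints the file with the server ID replaced; the initialMembership block still has to be edited to contain both entries.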
Set the environment variables NUODB_HOME, NUOCMD_CLIENT_KEY, and NUOCMD_VERIFY_SERVER appropriately on each host.
See Enabling TLS Encryption for instructions.
Execute this command on the active host:
dist/etc/nuoadmin start --conf dist/etc/nuoadmin.conf.active
* Starting NuoDB Admin
Execute this command on the passive host:
dist/etc/nuoadmin start --conf dist/etc/nuoadmin.conf.passive
* Starting NuoDB Admin
The start commands might report timeout errors that are false alarms.
You can run nuocmd show domain on either host to verify that both APs are running and connected to each other.
nuocmd show domain
server version: 4.2.dev-9999-e167a01c9a, server license: Enterprise
server time: 2021-01-11T16:08:23.899, client token: ...
Servers:
[active] host1:port [last_ack = 6.75] [member = ADDED] [raft_state = ACTIVE] (FOLLOWER, Leader=passive, log=0/6/6) Connected
[passive] host2:port [last_ack = 6.75] [member = ADDED] [raft_state = ACTIVE] (LEADER, Leader=passive, log=0/6/7) Connected *
Databases:
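The check that both APs are connected can be scripted. The following is a minimal sketch that assumes the output format shown above, where each server line starts with the server ID in square brackets and ends with Connected, optionally followed by an asterisk. count_connected_servers is a hypothetical helper name, not a nuocmd feature.

```shell
# Count the servers reported as Connected in "nuocmd show domain" output.
# The output is read from stdin. Lines for engines (SM, TE) and for evicted
# servers do not end in ") Connected" and are therefore not counted.
count_connected_servers() {
    grep -cE '^[[:space:]]*\[[^]]+] .*\) Connected( \*)?[[:space:]]*$'
}
```

In this example, nuocmd show domain | count_connected_servers should print 2 once both APs have joined the domain.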
The next step is to create a database on the active host:
nuocmd create archive --db-name test --server-id active --archive-path /data/archive1
Archive(archive_path=/data/archive1, db_name=test, id=0, server_id=active, state=PROVISIONED)
nuocmd create database --db-name test --dba-user cloud --dba-password user --te-server-ids active
STARTING: StartProcessRequest(archive_id=0, db_name=test, engine_type=SM, labels={}, options={}, server_id=active)
STARTING: StartProcessRequest(db_name=test, engine_type=TE, labels={}, options={}, server_id=active)
Then add an asynchronous storage manager (SM) on the passive host:
nuocmd create archive --db-name test --server-id passive --archive-path /data/observer1 --passive
Archive(archive_path=/data/observer1, db_name=test, id=1, observer_storage_groups=[*], server_id=passive, state=PROVISIONED)
nuocmd start database --db test --incremental
STARTING: StartProcessRequest(archive_id=1, db_name=test, engine_type=SM, expected_incarnation_major=1, expected_incarnation_minor=0, labels={}, options={}, server_id=passive)
You can use nuocmd show domain on either host to verify that two APs, two SMs, and one TE are running.
nuocmd show domain
server version: 4.2.dev-9999-e167a01c9a, server license: Enterprise
server time: 2021-01-11T16:09:27.460, client token: ...
Servers:
[active] host1:port [last_ack = 0.47] [member = ADDED] [raft_state = ACTIVE] (FOLLOWER, Leader=passive, log=0/30/30) Connected
[passive] host2:port [last_ack = 0.47] [member = ADDED] [raft_state = ACTIVE] (LEADER, Leader=passive, log=0/30/30) Connected *
Databases:
test [state = RUNNING]
[SM] host1:port [start_id = 0] [server_id = active] [pid = 1493876] [node_id = 1] [last_ack = 10.38] MONITORED:RUNNING
[TE] host1:port [start_id = 1] [server_id = active] [pid = 1493879] [node_id = 2] [last_ack = 4.43] MONITORED:RUNNING
[SM] host2:port [start_id = 2] [server_id = passive] [pid = 2638312] [node_id = 3] [last_ack = 3.98] MONITORED:RUNNING
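The engine counts can be verified the same way. This is a sketch under the assumption that engine lines look like the ones above: the engine type in square brackets at the start and MONITORED:RUNNING at the end. count_running_engines is a hypothetical helper name.

```shell
# Count engines of one type that "nuocmd show domain" reports as running.
# $1 = engine type as printed in the output ("SM" or "TE").
# The output is read from stdin.
count_running_engines() {
    grep -cE "^[[:space:]]*\[$1] .* MONITORED:RUNNING[[:space:]]*\$"
}
```

For the domain shown above, piping the output into count_running_engines SM should print 2, and count_running_engines TE should print 1.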
Do some minimal work in the database using nuosql:
create table test (n int, s string);
insert into test values(1, 'one');
select * from test;
N S
-- ---
1 one
Now simulate failure by using kill -9 to kill the AP, SM, and TE processes on the active host.
For demo purposes this example does not use --ping-timeout, so the SM on the passive host remains up.
The AP on the passive host also remains up.
However, administrative actions require a quorum of APs, and there is no quorum with the active AP gone:
nuocmd shutdown database --db test
'shutdown database' failed: Unable to request database shutdown for dbName=test: Unable to get command response: Command request timed out
Because the database cannot be shut down through the admin layer, use kill -9 to kill the AP and SM processes on the passive host.
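The loss of quorum described above follows from majority arithmetic. The following sketch assumes that the admin layer needs a strict majority of its members to be reachable, which is how Raft-based consensus works; has_quorum is a hypothetical helper used only for illustration.

```shell
# Succeed if enough admin processes are reachable to form a strict majority.
# $1 = number of servers in the membership
# $2 = number of servers currently reachable
has_quorum() {
    majority=$(( $1 / 2 + 1 ))
    [ "$2" -ge "$majority" ]
}
```

With the two-member domain in this example the majority is 2, so has_quorum 2 1 fails: losing either AP is enough to lose quorum. A three-member domain would tolerate the loss of one AP.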
Having simulated a disaster, start recovery. Restart the AP on the passive host, following the instructions in Re-establishing Admin Process Quorum:
dist/etc/nuoadmin restart --evicted-servers active --conf dist/etc/nuoadmin.conf.passive
NuoDB Admin already stopped
* Starting NuoDB Admin
Make sure the admin knows that all engines in the active data center are gone:
nuocmd shutdown server-processes --evict --server-id active --timeout 0
nuocmd show domain
server version: 4.2.dev-9999-e167a01c9a, server license: Enterprise
server time: 2021-01-11T16:11:21.079, client token: ...
Servers:
[active] host1:port [last_ack = NEVER] [member = ADDED] [raft_state = <NO VALUE>] (<NO VALUE>, Leader=<NO VALUE>, log=?/?/?) Evicted
[passive] host2:port [last_ack = 2.76] [member = ADDED] [raft_state = ACTIVE] (LEADER, Leader=passive, log=1/35/35) Connected *
Databases:
test [state = NOT_RUNNING]
Attempt handoff with an --oldest-acceptable value in the future:
nuocmd handoff database --db-name test --all-observer-archive-ids --oldest-acceptable 2100-01-01T01:00:00
STARTING: SM process on archive 1
'handoff database' failed: Time of the most recent consistent state 2021-03-16T14:28:37 is earlier than supplied '--oldest-acceptable' 2100-01-01T01:00:00. Aborting handoff
Shut down the database:
nuocmd shutdown database --db-name test
Re-attempt handoff with an --oldest-acceptable value in the past:
nuocmd handoff database --db-name test --all-observer-archive-ids --oldest-acceptable 2000-01-01T01:00:00
STARTING: SM process on archive 1
Time of the most recent consistent state: 2021-03-16T14:28:37
Reset state run successfully
Successfully handed off database, you may proceed with the next handoff steps
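The two handoff attempts differ only in how the --oldest-acceptable value compares with the time of the most recent consistent state. The sketch below reproduces that comparison for illustration; it is not how nuocmd implements the check. It assumes both timestamps use the format shown in the output above and refer to the same time zone. ts_to_int and state_is_acceptable are hypothetical helper names.

```shell
# Convert a timestamp such as 2021-03-16T14:28:37 into the integer
# 20210316142837 by deleting the separators, so it can be compared numerically.
ts_to_int() {
    printf '%s' "$1" | tr -d 'T:-'
}

# Succeed if the most recent consistent state is not earlier than the cutoff.
# $1 = time of the most recent consistent state
# $2 = value passed as --oldest-acceptable
state_is_acceptable() {
    [ "$(ts_to_int "$1")" -ge "$(ts_to_int "$2")" ]
}
```

With the values from this example, state_is_acceptable 2021-03-16T14:28:37 2100-01-01T01:00:00 fails, matching the aborted handoff, and state_is_acceptable 2021-03-16T14:28:37 2000-01-01T01:00:00 succeeds.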
Now run the deprovisioning step:
nuocmd delete server --server-id active
Now run the reprovisioning step:
nuocmd start database --db-name test --incremental --te-server-ids passive
STARTING: StartProcessRequest(db_name=test, engine_type=TE, ...)
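The recovery commands run after the AP restart can be collected into one function that stops at the first failing step. This is a sketch that only strings together the nuocmd invocations used in this example; handoff_to_passive and its parameter order are hypothetical, and the AP restart itself is left out because it runs through nuoadmin rather than nuocmd.

```shell
# Run the post-restart recovery steps from this example in order.
# $1 = database name
# $2 = server ID of the failed (formerly active) AP
# $3 = server ID of the surviving AP that will host the new TE
# $4 = value for --oldest-acceptable
# Each step runs only if the previous one succeeded.
handoff_to_passive() {
    db=$1
    failed=$2
    survivor=$3
    cutoff=$4
    nuocmd shutdown server-processes --evict --server-id "$failed" --timeout 0 &&
    nuocmd handoff database --db-name "$db" --all-observer-archive-ids --oldest-acceptable "$cutoff" &&
    nuocmd delete server --server-id "$failed" &&
    nuocmd start database --db-name "$db" --incremental --te-server-ids "$survivor"
}
```

For this example the call would be handoff_to_passive test active passive 2000-01-01T01:00:00. If the handoff step fails, the sketch simply stops; as shown earlier, the database has to be shut down before handoff is attempted again.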
Finally, verify the database contents and run more workload using nuosql:
select * from test;
N S
-- ---
1 one
insert into test values(2, 'two');
select * from test;
N S
-- ---
1 one
2 two
Clean up:
nuocmd shutdown database --db test
nuocmd show domain
server version: 4.2.dev-9999-e167a01c9a, server license: Enterprise
server time: 2021-01-11T16:12:59.874, client token: ...
Servers:
[passive] host2:port [last_ack = 1.53] [member = ADDED] [raft_state = ACTIVE] (LEADER, Leader=passive, log=1/62/62) Connected *
Databases:
test [state = NOT_RUNNING]