Disaster Recovery Using Handoff - Example
This section demonstrates disaster recovery using the handoff procedure and the handoff database command.
The example uses the following configurations:
-
There is one active and one passive data center.
-
The active data center has one host (host1) running an Admin Process (AP), a Storage Manager (SM), and a Transaction Engine (TE).
-
The passive data center has one host (host2) running an AP and an Asynchronous Storage Manager (ASM).
-
The NuoDB RPM server package is installed on both host1 and host2. For NuoDB TAR server package installations, adjust the pathnames referenced in the command samples on this page accordingly.
-
The NuoDB distribution is in a directory named /opt/nuodb on each host, and archives are stored locally on each host in a directory named /data.
In this example, we simulate a disaster on the database and then perform the handoff procedure for database recovery.
A NuoDB Enterprise license is required to support database configurations with multiple SMs. For more information, see Obtain and Install a Product License.
Set Up the Test
-
Set up two APs named active and passive on hosts named host1 and host2. In a production database, do not use active and passive as AP server IDs, because a server ID cannot be changed when the passive data center becomes active after disaster recovery.
-
Remove any previous states on host1 and host2:
rm -f /var/opt/nuodb/raftlog
rm -rf /data/archive1
rm -rf /data/observer1
-
Create configuration files for the active and passive APs. For detailed instructions, see Extending the Database Across Multiple Hosts (Scaling Out).
Make the following two changes to the nuoadmin.conf properties on each host.
On host1, replace the default code:
"initialMembership": {
    "nuoadmin-0": { "transport": "$(hostname):48005", "version": "0:10000" }
},
"ThisServerId": "nuoadmin-0",
with:
"initialMembership": {
    "active": { "transport": "host1:48005", "version": "0:10000" },
    "passive": { "transport": "host2:48005", "version": "0:10000" }
},
"ThisServerId": "active",
On host2, replace the same default code with:
"initialMembership": {
    "active": { "transport": "host1:48005", "version": "0:10000" },
    "passive": { "transport": "host2:48005", "version": "0:10000" }
},
"ThisServerId": "passive",
Save the files as /opt/nuodb/etc/nuoadmin.conf.active on host1 and /opt/nuodb/etc/nuoadmin.conf.passive on host2; these are the names passed to nuoadmin start --conf later in this setup.
For more information on the initialMembership and ThisServerId properties, see Setting nuoadmin.conf Properties.
-
Set the environment variables NUODB_HOME, NUOCMD_CLIENT_KEY, and NUOCMD_VERIFY_SERVER appropriately on each host. See Enabling TLS Encryption for instructions.
-
Run the nuoadmin start command with the --conf option on host1 and host2.
On host1:
/opt/nuodb/etc/nuoadmin start --conf /opt/nuodb/etc/nuoadmin.conf.active
* Starting NuoDB Admin
On host2:
/opt/nuodb/etc/nuoadmin start --conf /opt/nuodb/etc/nuoadmin.conf.passive
* Starting NuoDB Admin
Run nuocmd show domain on either host to verify that both APs are running and connected to each other:
nuocmd show domain
server version: 6.0-1-fc6a857de9, server license: Enterprise
server time: 2023-01-11T16:08:23.899, client token: ...
Servers:
[active] host1:port [last_ack = 6.75] [member = ADDED] [raft_state = ACTIVE] (FOLLOWER, Leader=passive, log=0/6/6) Connected
[passive] host2:port [last_ack = 6.75] [member = ADDED] [raft_state = ACTIVE] (LEADER, Leader=passive, log=0/6/7) Connected *
Databases:
-
Create a database on the active host:
nuocmd create archive --db-name DB_NAME --server-id active --archive-path /data/archive1
Archive(archive_path=/data/archive1, db_name=DB_NAME, id=0, server_id=active, state=PROVISIONED)
nuocmd create database --db-name DB_NAME --dba-user cloud --dba-password user --te-server-ids active
STARTING: StartProcessRequest(archive_id=0, db_name=DB_NAME, engine_type=SM, labels={}, options={}, server_id=active)
STARTING: StartProcessRequest(db_name=DB_NAME, engine_type=TE, labels={}, options={}, server_id=active)
-
Add an Asynchronous Storage Manager (ASM) on the passive host:
nuocmd create archive --db-name DB_NAME --server-id passive --archive-path /data/observer1 --passive
Archive(archive_path=/data/observer1, db_name=DB_NAME, id=1, observer_storage_groups=[*], server_id=passive, state=PROVISIONED)
nuocmd start database --db-name DB_NAME --incremental
STARTING: StartProcessRequest(archive_id=1, db_name=DB_NAME, engine_type=SM, expected_incarnation_major=1, expected_incarnation_minor=0, labels={}, options={}, server_id=passive)
-
Run nuocmd show domain on either host to verify that two APs, two SMs, and one TE are running:
nuocmd show domain
server version: 6.0-1-fc6a857de9, server license: Enterprise
server time: 2023-01-11T16:09:27.460, client token: ...
Servers:
[active] host1:port [last_ack = 0.47] [member = ADDED] [raft_state = ACTIVE] (FOLLOWER, Leader=passive, log=0/30/30) Connected
[passive] host2:port [last_ack = 0.47] [member = ADDED] [raft_state = ACTIVE] (LEADER, Leader=passive, log=0/30/30) Connected *
Databases:
DB_NAME [state = RUNNING]
[SM] host1:port [start_id = 0] [server_id = active] [pid = 1493876] [node_id = 1] [last_ack = 10.38] MONITORED:RUNNING
[TE] host1:port [start_id = 1] [server_id = active] [pid = 1493879] [node_id = 2] [last_ack = 4.43] MONITORED:RUNNING
[SM] host2:port [start_id = 2] [server_id = passive] [pid = 2638312] [node_id = 3] [last_ack = 3.98] MONITORED:RUNNING
-
Execute SQL statements on the database using nuosql:
create table TABLE_NAME (n int, s string);
insert into TABLE_NAME values(1, 'one');
select * from TABLE_NAME;
N S
-- ---
1 one
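The checks in this setup read the nuocmd show domain output by eye. The following is a minimal Bash sketch that automates the same checks. The function names are hypothetical helpers, not part of NuoDB, and they assume that each server and each engine is printed on its own line, as in the samples above; that output format is not a documented, stable interface.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of NuoDB) that read `nuocmd show domain`
# output on stdin. Assumes one server or engine per line, as in the samples.

# Count AP lines that end in "Connected"; a trailing "*" marks the AP the
# client is talking to.
count_connected_aps() {
  grep -cE '^[[:space:]]*\[[^]]+] .*Connected( \*)?[[:space:]]*$' || true
}

# Count engines of one type (SM or TE) reported as MONITORED:RUNNING.
count_running_engines() {
  local engine_type="$1"
  grep -cE "^[[:space:]]*\[${engine_type}] .*MONITORED:RUNNING[[:space:]]*$" || true
}
```

For example, at the end of the setup, nuocmd show domain | count_connected_aps should print 2, and nuocmd show domain | count_running_engines SM should also print 2. The || true keeps a zero count from turning into a failing exit status when a script runs with set -e.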
Simulate Failure
Simulate a disaster by using kill -9 to kill the AP, SM, and TE processes on the active host (host1).
For demo purposes, this example does not use the --ping-timeout option, so the SM on the passive host (host2) remains up.
The AP on the passive host also remains up.
However, we cannot perform any administrative actions without a quorum of APs, and with the active AP gone there is no quorum:
nuocmd shutdown database --db-name DB_NAME
'shutdown database' failed: Unable to request database shutdown for dbName=DB_NAME: Unable to get command response: Command request timed out
Therefore, use kill -9 to kill the AP and SM processes on the passive host (host2) as well.
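The kill -9 steps need process IDs. The engine PIDs appear in the nuocmd show domain output, so a hypothetical helper like the Bash sketch below can extract them for one server ID. It assumes the output format shown in the samples on this page. The AP's own PID is not listed there and has to be found separately, for example with ps. PIDs are local to a host, so run kill on the host that owns the processes, and capture the PIDs while the domain is still healthy, before killing anything.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print, one per line, the PIDs of the SM and TE
# engines that `nuocmd show domain` (read on stdin) lists for a server ID.
engine_pids_for_server() {
  local server_id="$1"
  awk -v sid="$server_id" '
    # Engine lines start with [SM] or [TE] and contain "[server_id = <id>]".
    /^[ \t]*\[(SM|TE)]/ && index($0, "[server_id = " sid "]") > 0 {
      # Extract the digits from "[pid = <digits>]": skip the 7-character
      # prefix "[pid = " and drop the closing bracket.
      if (match($0, /\[pid = [0-9]+]/)) {
        print substr($0, RSTART + 7, RLENGTH - 8)
      }
    }'
}
```

For example, on host1: pids=$(nuocmd show domain | engine_pids_for_server active), then kill -9 $pids once you are ready to simulate the failure.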
Recover the Admin Process (AP)
Restart the AP on the passive host by following the instructions in Re-establishing Admin Process (AP) Quorum.
The commands in this section are executed on host2.
/opt/nuodb/etc/nuoadmin restart --evicted-servers active --conf /opt/nuodb/etc/nuoadmin.conf.passive
NuoDB Admin already stopped
* Starting NuoDB Admin
To ensure that the AP knows that all engines in the active data center are gone, shut down and evict the processes of the active server, then check the domain:
nuocmd shutdown server-processes --evict --server-id active --timeout 0
nuocmd show domain
server version: 5.0-2-a8628647c3, server license: Enterprise
server time: 2021-01-11T16:11:21.079, client token: ...
Servers:
[active] host1:port [last_ack = NEVER] [member = ADDED] [raft_state = <NO VALUE>] (<NO VALUE>, Leader=<NO VALUE>, log=?/?/?) Evicted
[passive] host2:port [last_ack = 2.76] [member = ADDED] [raft_state = ACTIVE] (LEADER, Leader=passive, log=1/35/35) Connected *
Databases:
DB_NAME [state = NOT_RUNNING]
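The expected end state of this step is that the active server is listed as Evicted and the database is NOT_RUNNING. A small Bash sketch can check both conditions from the nuocmd show domain output. The helper names are hypothetical, the parsing assumes the line format shown above, and a database name containing regular expression characters would need escaping, which this sketch does not do.

```shell
#!/usr/bin/env bash
# Hypothetical helpers that read `nuocmd show domain` output on stdin.

# Succeed if the given server ID is listed with status Evicted.
# Output is discarded rather than using -q, so grep reads all of stdin.
server_is_evicted() {
  local server_id="$1"
  grep -E "^[[:space:]]*\[${server_id}] .*Evicted[[:space:]]*$" > /dev/null
}

# Print the state of a database, for example NOT_RUNNING or RUNNING.
database_state() {
  local db_name="$1"
  sed -nE "s/^[[:space:]]*${db_name} \[state = ([A-Z_]+)].*/\1/p"
}
```

For example, domain=$(nuocmd show domain), followed by printf '%s\n' "$domain" | server_is_evicted active and printf '%s\n' "$domain" | database_state DB_NAME, reads the output once and checks both conditions.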
Recover the Database
There are two ways to recover the database:
-
Recovery using the handoff procedure
-
Recovery using the handoff database command
Recovery Using the Handoff Procedure
-
Run the Confirmation step:
nuocmd start process --db-name DB_NAME --server-id passive --engine-type SM --archive-id 1
Process(archive_id=1, db_name=DB_NAME, engine_type=SM, ...)
nuocmd handoff report-timestamp --db-name DB_NAME --archive-ids 1
ReportTimestamp(commits=0,0,3, epoch=2, leaders=2 1, timestamp=2021-01-08T21:26:20)
The timestamp is acceptable, so continue with the next step.
-
Run the Deprovisioning step:
nuocmd delete server --server-id active
-
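Run the Resolution step:
nuocmd check database --db-name DB_NAME --check-syncing --num-processes 1 --timeout 60
nuocmd handoff reset-state --db-name DB_NAME --commits 0 0 3 --leaders 2 1 --epoch 2
State successfully reset
The --commits, --leaders, and --epoch values are taken from the ReportTimestamp output of the Confirmation step.
-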
Run the Promotion step:
nuocmd set archive --archive-id 1 --active
-
Run the Reprovisioning step:
nuocmd start database --db-name DB_NAME --incremental --te-server-ids passive
STARTING: StartProcessRequest(db_name=DB_NAME, engine_type=TE, ...)
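In the Resolution step, the --commits, --leaders, and --epoch arguments are transcribed by hand from the ReportTimestamp line of the Confirmation step. The Bash sketch below shows one way to do that transcription mechanically. The function name is hypothetical, and it assumes a single-archive report formatted exactly like the sample above: commits as comma-separated numbers and leaders as space-separated numbers.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a single-archive ReportTimestamp line into the
# arguments of `nuocmd handoff reset-state`, for example
#   ReportTimestamp(commits=0,0,3, epoch=2, leaders=2 1, timestamp=...)
#   -> --commits 0 0 3 --leaders 2 1 --epoch 2
reset_state_args() {
  local line="$1" commits leaders epoch
  # commits: comma-separated numbers, emitted space-separated.
  commits=$(printf '%s\n' "$line" | sed -nE 's/.*commits=([0-9]+(,[0-9]+)*).*/\1/p' | tr ',' ' ')
  # leaders: space-separated numbers, kept as they are.
  leaders=$(printf '%s\n' "$line" | sed -nE 's/.*leaders=([0-9]+( [0-9]+)*).*/\1/p')
  # epoch: a single number.
  epoch=$(printf '%s\n' "$line" | sed -nE 's/.*epoch=([0-9]+).*/\1/p')
  if [ -z "$commits" ] || [ -z "$leaders" ] || [ -z "$epoch" ]; then
    echo "reset_state_args: cannot parse: $line" >&2
    return 1
  fi
  printf '%s\n' "--commits $commits --leaders $leaders --epoch $epoch"
}
```

A possible usage, after reviewing the reported timestamp:
report=$(nuocmd handoff report-timestamp --db-name DB_NAME --archive-ids 1)
nuocmd handoff reset-state --db-name DB_NAME $(reset_state_args "$report")
The second command substitution is left unquoted on purpose so that the arguments split into separate words.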
Recovery Using the Handoff Database Command
-
Run nuocmd handoff database with an --oldest-acceptable value earlier than the time of the most recent consistent state:
nuocmd handoff database --db-name DB_NAME --all-observer-archive-ids --oldest-acceptable 2000-01-01T01:00:00
STARTING: SM process on archive 1
Time of the most recent consistent state: 2021-03-16T14:28:37
Reset state run successfully
Successfully handed off database, you may proceed with the next handoff steps
The nuocmd handoff database command automatically runs the Confirmation, Resolution, and Promotion steps of the handoff procedure.
If the --oldest-acceptable value is later than the time of the most recent consistent state, the handoff is aborted:
nuocmd handoff database --db-name DB_NAME --all-observer-archive-ids --oldest-acceptable 2100-01-01T01:00:00
STARTING: SM process on archive 1
'handoff database' failed: Time of the most recent consistent state 2021-03-16T14:28:37 is earlier than supplied '--oldest-acceptable' 2100-01-01T01:00:00. Aborting handoff
Shut down the database:
nuocmd shutdown database --db-name DB_NAME
-
Run the Deprovisioning step:
nuocmd delete server --server-id active
-
Run the Reprovisioning step:
nuocmd start database --db-name DB_NAME --incremental --te-server-ids passive
STARTING: StartProcessRequest(db_name=DB_NAME, engine_type=TE, ...)
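With nuocmd handoff database, the --oldest-acceptable option replaces the manual judgment made in the Confirmation step: judging from the output above, the handoff is aborted when the time of the most recent consistent state is earlier than the supplied value. The Bash sketch below restates that rule so it can also be applied to the ReportTimestamp timestamp during the manual procedure. It illustrates the rule as it appears in the sample output and is not NuoDB's implementation; the function name is hypothetical, and it assumes both timestamps use the YYYY-MM-DDTHH:MM:SS form shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper restating the rule seen in the sample output: the
# handoff is aborted when the most recent consistent state is earlier than
# --oldest-acceptable. Both arguments use the YYYY-MM-DDTHH:MM:SS form.
# Exit status: 0 = acceptable, 1 = too old, 2 = malformed input.
consistent_state_acceptable() {
  # Drop the separators, leaving YYYYMMDDHHMMSS for a numeric comparison
  # that does not depend on the locale's collation order.
  local consistent="${1//[!0-9]/}" oldest="${2//[!0-9]/}"
  if [ "${#consistent}" -ne 14 ] || [ "${#oldest}" -ne 14 ]; then
    echo "consistent_state_acceptable: expected YYYY-MM-DDTHH:MM:SS" >&2
    return 2
  fi
  # 10# forces base 10, so a leading zero is never read as octal.
  (( 10#$consistent >= 10#$oldest ))
}
```

For example, consistent_state_acceptable 2021-03-16T14:28:37 2000-01-01T01:00:00 succeeds, while the same timestamp checked against 2100-01-01T01:00:00 fails, matching the two runs shown above. A timestamp equal to the limit is treated as acceptable, since it is not earlier than the limit.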
Verify the Database Contents
Verify the database contents and run more workload using nuosql.
select * from TABLE_NAME;
N S
-- ---
1 one
insert into TABLE_NAME values(2, 'two');
select * from TABLE_NAME;
N S
-- ---
1 one
2 two
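The verification above is visual. If you script it, a hypothetical helper like the following Bash sketch counts the data rows in nuosql tabular output. It assumes the output contains a single result set laid out as shown (a header line, a dashed separator line, then one line per row) with no prompt or status lines mixed in.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count the data rows in one nuosql result set read on
# stdin, assuming a header line, a dashed separator line, then one line per
# row, and no prompt or status lines.
count_result_rows() {
  awk '
    # Count non-empty lines that appear after the separator line.
    seen && NF { rows++ }
    # The separator consists only of runs of dashes separated by blanks.
    /^[ \t]*-+([ \t]+-+)*[ \t]*$/ { seen = 1 }
    END { print rows + 0 }'
}
```

Fed the output of the second select above, this helper would print 2.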
Clean Up
nuocmd shutdown database --db-name DB_NAME
nuocmd show domain
server version: 6.0-1-fc6a857de9, server license: Enterprise
server time: 2023-01-11T16:12:59.874, client token: ...
Servers:
[passive] host2:port [last_ack = 1.53] [member = ADDED] [raft_state = ACTIVE] (LEADER, Leader=passive, log=1/62/62) Connected *
Databases:
DB_NAME [state = NOT_RUNNING]
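A shutdown request may return before the domain reflects the final state, so a cleanup script can wait until nuocmd show domain reports the expected database state. The Bash sketch below polls once per second up to a timeout. The name wait_for_db_state is hypothetical, and the sketch assumes the "DB_NAME [state = ...]" line format shown in the samples on this page.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll `nuocmd show domain` once per second until the
# database reports the wanted state (for example NOT_RUNNING), or give up
# after TIMEOUT seconds (default 60). Assumes the "DB [state = X]" format.
wait_for_db_state() {
  local db_name="$1" wanted="$2" timeout="${3:-60}" waited=0 state
  while :; do
    state=$(nuocmd show domain | sed -nE "s/^[[:space:]]*${db_name} \[state = ([A-Z_]+)].*/\1/p")
    if [ "$state" = "$wanted" ]; then
      return 0
    fi
    if [ "$waited" -ge "$timeout" ]; then
      echo "wait_for_db_state: $db_name is '$state', wanted '$wanted'" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}
```

For example: nuocmd shutdown database --db-name DB_NAME && wait_for_db_state DB_NAME NOT_RUNNING 120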