Disaster Recovery Using Handoff - Example
This section demonstrates disaster recovery using the handoff procedure and the handoff database command.
The example uses the following configuration:
- There is one active and one passive data center.
- The active data center has one host (host1) running an Admin Process (AP), a Storage Manager (SM), and a Transaction Engine (TE).
- The passive data center has one host (host2) running an AP and an Asynchronous Storage Manager (ASM).
- The NuoDB RPM server package is installed on both host1 and host2. For NuoDB TAR server package installations, adjust the pathnames referenced in the command samples on this page accordingly.
- The NuoDB distribution is in a directory named /opt/nuodb on each host, and archives are stored locally on each host in a directory named /data.
In this example, we simulate a disaster on the database and then perform the handoff procedure for database recovery.
A NuoDB Enterprise license is required to support database configurations with multiple SMs. For more information, see Obtain and Install a Product License.
Set Up the Test
- Set up two APs named active and passive on hosts named host1 and host2.
  In a production database, do not use active and passive as AP server IDs, because a server ID cannot be changed when the passive data center becomes active after disaster recovery.
- Remove any previous state on host1 and host2:
  rm -f /var/opt/nuodb/raftlog
  rm -rf /data/archive1
  rm -rf /data/observer1
- Create configuration files for the active and passive APs.
  For detailed instructions, see Extending the Database Across Multiple Hosts (Scaling Out).
  Make the following two changes to the nuoadmin.conf properties on the hosts:

  Host: host1
  Default code:
    "initialMembership": {
      "nuoadmin-0": { "transport": "$(hostname):48005", "version": "0:10000" }
    },
    "ThisServerId": "nuoadmin-0",
  Replace with:
    "initialMembership": {
      "active": { "transport": "host1:48005", "version": "0:10000" },
      "passive": { "transport": "host2:48005", "version": "0:10000" }
    },
    "ThisServerId": "active",

  Host: host2
  Default code:
    "initialMembership": {
      "nuoadmin-0": { "transport": "$(hostname):48005", "version": "0:10000" }
    },
    "ThisServerId": "nuoadmin-0",
  Replace with:
    "initialMembership": {
      "active": { "transport": "host1:48005", "version": "0:10000" },
      "passive": { "transport": "host2:48005", "version": "0:10000" }
    },
    "ThisServerId": "passive",

  For more information on initialMembership and ThisServerId properties, see Setting nuoadmin.conf Properties.
- Set the environment variables NUODB_HOME, NUOCMD_CLIENT_KEY, and NUOCMD_VERIFY_SERVER appropriately on each host.
  See Enabling TLS Encryption for instructions.
- Execute the nuoadmin start --conf command on host1 and host2:
  /opt/nuodb/etc/nuoadmin start --conf /opt/nuodb/etc/nuoadmin.conf.active
  * Starting NuoDB Admin
  /opt/nuodb/etc/nuoadmin start --conf /opt/nuodb/etc/nuoadmin.conf.passive
  * Starting NuoDB Admin
  Run nuocmd show domain on either host to verify that both APs are running and connected to each other.
  nuocmd show domain
  server version: 6.0-1-fc6a857de9, server license: Enterprise
  server time: 2023-01-11T16:08:23.899, client token: ...
  Servers:
  [active] host1:port [last_ack = 6.75] [member = ADDED] [raft_state = ACTIVE] (FOLLOWER, Leader=passive, log=0/6/6) Connected
  [passive] host2:port [last_ack = 6.75] [member = ADDED] [raft_state = ACTIVE] (LEADER, Leader=passive, log=0/6/7) Connected *
  Databases:
- Create a database on the active host:
  nuocmd create archive --db-name DB_NAME --server-id active --archive-path /data/archive1
  Archive(archive_path=/data/archive1, db_name=DB_NAME, id=0, server_id=active, state=PROVISIONED)
  nuocmd create database --db-name DB_NAME --dba-user cloud --dba-password user --te-server-ids active
  STARTING: StartProcessRequest(archive_id=0, db_name=DB_NAME, engine_type=SM, labels={}, options={}, server_id=active)
  STARTING: StartProcessRequest(db_name=DB_NAME, engine_type=TE, labels={}, options={}, server_id=active)
- Add an Asynchronous Storage Manager (ASM) on the passive host:
  nuocmd create archive --db-name DB_NAME --server-id passive --archive-path /data/observer1 --passive
  Archive(archive_path=/data/observer1, db_name=DB_NAME, id=1, observer_storage_groups=[*], server_id=passive, state=PROVISIONED)
  nuocmd start database --db DB_NAME --incremental
  STARTING: StartProcessRequest(archive_id=1, db_name=DB_NAME, engine_type=SM, expected_incarnation_major=1, expected_incarnation_minor=0, labels={}, options={}, server_id=passive)
- Run nuocmd show domain on either host to verify that two APs, two SMs, and one TE are running.
  nuocmd show domain
  server version: 6.0-1-fc6a857de9, server license: Enterprise
  server time: 2023-01-11T16:09:27.460, client token: ...
  Servers:
  [active] host1:port [last_ack = 0.47] [member = ADDED] [raft_state = ACTIVE] (FOLLOWER, Leader=passive, log=0/30/30) Connected
  [passive] host2:port [last_ack = 0.47] [member = ADDED] [raft_state = ACTIVE] (LEADER, Leader=passive, log=0/30/30) Connected *
  Databases:
  DB_NAME [state = RUNNING]
  [SM] host1:port [start_id = 0] [server_id = active] [pid = 1493876] [node_id = 1] [last_ack = 10.38] MONITORED:RUNNING
  [TE] host1:port [start_id = 1] [server_id = active] [pid = 1493879] [node_id = 2] [last_ack = 4.43] MONITORED:RUNNING
  [SM] host2:port [start_id = 2] [server_id = passive] [pid = 2638312] [node_id = 3] [last_ack = 3.98] MONITORED:RUNNING
- Execute SQL statements on the database using nuosql:
  create table TABLE_NAME (n int, s string);
  insert into TABLE_NAME values(1, 'one');
  select * from TABLE_NAME;
  N S
  -- ---
  1 one
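The nuocmd show domain check in the setup steps above can also be scripted. The following is a minimal sketch, assuming the output keeps the shape shown in this example; the function name summarize_domain and both regular expressions are illustrative helpers, not part of NuoDB, and the real output format may differ between releases.

```python
import re
from collections import Counter

# Sample text shaped like the `nuocmd show domain` output in this example.
SAMPLE_OUTPUT = """\
Servers:
[active] host1:port [last_ack = 0.47] [member = ADDED] [raft_state = ACTIVE] (FOLLOWER, Leader=passive, log=0/30/30) Connected
[passive] host2:port [last_ack = 0.47] [member = ADDED] [raft_state = ACTIVE] (LEADER, Leader=passive, log=0/30/30) Connected *
Databases:
DB_NAME [state = RUNNING]
[SM] host1:port [start_id = 0] [server_id = active] [pid = 1493876] [node_id = 1] [last_ack = 10.38] MONITORED:RUNNING
[TE] host1:port [start_id = 1] [server_id = active] [pid = 1493879] [node_id = 2] [last_ack = 4.43] MONITORED:RUNNING
[SM] host2:port [start_id = 2] [server_id = passive] [pid = 2638312] [node_id = 3] [last_ack = 3.98] MONITORED:RUNNING
"""

# Assumed shape of a server line: [server-id] host:port [last_ack = ...] ... (...) Status
SERVER_RE = re.compile(r"\[([\w-]+)\] \S+ \[last_ack = [^\]]*\][^\n]*?\) (\w+)")
# Assumed shape of an engine line: [SM|TE] host:port ... MONITORED:STATE
ENGINE_RE = re.compile(r"\[(SM|TE)\][^\n]*?MONITORED:(\w+)")


def summarize_domain(text):
    """Return the status of each AP and a count of running engines by type."""
    servers = dict(SERVER_RE.findall(text))
    running = Counter(
        kind for kind, state in ENGINE_RE.findall(text) if state == "RUNNING"
    )
    return {"servers": servers, "running_engines": dict(running)}


summary = summarize_domain(SAMPLE_OUTPUT)
print(summary)
```

For this example, the setup is healthy when both servers report Connected and the running engines are two SMs and one TE.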
Simulate Failure
Simulate a disaster by using kill -9 to kill the AP, SM, and TE processes on the active host (host1).
For demo purposes, this example does not use --ping-timeout, so the SM on the passive host (host2) remains up.
The AP on the passive host also remains up.
However, no administrative actions are possible without a quorum of APs, and there is no quorum with the active AP gone:
nuocmd shutdown database --db DB_NAME
'shutdown database' failed: Unable to request database shutdown for dbName=DB_NAME: Unable to get command response: Command request timed out
Therefore, use kill -9 to kill the AP and SM processes on the passive host.
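The loss of quorum described above follows from simple arithmetic. The sketch below assumes a strict-majority rule, as is usual for Raft-style consensus (the raft_state fields in the show domain output suggest Raft, but the exact rule used by the NuoDB Admin is not stated on this page). The function name has_quorum is hypothetical.

```python
def has_quorum(total_members: int, reachable_members: int) -> bool:
    """Strict-majority rule (assumption): more than half of the members must be reachable."""
    return reachable_members >= total_members // 2 + 1


# Two APs (active, passive) with only the passive AP reachable: no majority.
print(has_quorum(total_members=2, reachable_members=1))

# If the membership consists of the passive AP alone, one AP is a majority.
print(has_quorum(total_members=1, reachable_members=1))
```

Under this assumption, a two-member domain cannot tolerate the loss of either AP, which is why the surviving AP times out on administrative requests until the membership is changed.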
Recover the Admin Process (AP)
Restart the AP by following the instructions in Re-establishing Admin Process (AP) Quorum.
The commands in this section are executed on host2.
/opt/nuodb/etc/nuoadmin restart --evicted-servers active --conf /opt/nuodb/etc/nuoadmin.conf.passive
NuoDB Admin already stopped
* Starting NuoDB Admin
To ensure that the admin knows that all engines in the active data center are gone, run:
nuocmd shutdown server-processes --evict --server-id active --timeout 0
nuocmd show domain
server version: 5.0-2-a8628647c3, server license: Enterprise
server time: 2021-01-11T16:11:21.079, client token: ...
Servers:
[active] host1:port [last_ack = NEVER] [member = ADDED] [raft_state = <NO VALUE>] (<NO VALUE>, Leader=<NO VALUE>, log=?/?/?) Evicted
[passive] host2:port [last_ack = 2.76] [member = ADDED] [raft_state = ACTIVE] (LEADER, Leader=passive, log=1/35/35) Connected *
Databases:
DB_NAME [state = NOT_RUNNING]
Recover the Database
There are two ways to recover the database:
- Recovery using the handoff procedure
- Recovery using the handoff database command
Recovery Using the Handoff Procedure
- Run the Confirmation step:
  nuocmd start process --db-name DB_NAME --server-id passive --engine-type SM --archive-id 1
  Process(archive_id=1, db_name=DB_NAME, engine_type=SM, ...)
  nuocmd handoff report-timestamp --db-name DB_NAME --archive-ids 1
  ReportTimestamp(commits=0,0,3, epoch=2, leaders=2 1, timestamp=2021-01-08T21:26:20)
  The timestamp is acceptable, so continue with the next step.
- Run the Deprovisioning step:
  nuocmd delete server --server-id active
- Run the Resolution step:
  nuocmd check database --db-name DB_NAME --check-syncing --num-processes 1 --timeout 60
  nuocmd handoff reset-state --db-name DB_NAME --commits 0 0 3 --leaders 2 1 --epoch 2
  State successfully reset
- Run the Promotion step:
  nuocmd set archive --archive-id 1 --active
- Run the Reprovisioning step:
  nuocmd start database --db-name DB_NAME --incremental --te-server-ids passive
  STARTING: StartProcessRequest(db_name=DB_NAME, engine_type=TE, ...)
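In the steps above, the commits, leaders, and epoch values reported in the Confirmation step are passed to nuocmd handoff reset-state in the Resolution step. The following sketch shows one way to derive those arguments from the report line. It assumes the ReportTimestamp line always has the layout of the single sample on this page; the helper name reset_state_command and the regular expression are illustrative and not part of NuoDB.

```python
import re
from typing import List

# Layout assumed from the sample line:
# ReportTimestamp(commits=0,0,3, epoch=2, leaders=2 1, timestamp=2021-01-08T21:26:20)
REPORT_RE = re.compile(
    r"ReportTimestamp\(commits=(?P<commits>[\d,]+), epoch=(?P<epoch>\d+), "
    r"leaders=(?P<leaders>[\d ]+), timestamp=(?P<timestamp>[^)]+)\)"
)


def reset_state_command(report_line: str, db_name: str) -> List[str]:
    """Build the reset-state argument list from a report-timestamp output line."""
    match = REPORT_RE.search(report_line)
    if match is None:
        raise ValueError("unrecognized report-timestamp output: %r" % report_line)
    commits = match["commits"].split(",")  # "0,0,3" -> ["0", "0", "3"]
    leaders = match["leaders"].split()     # "2 1"   -> ["2", "1"]
    return [
        "nuocmd", "handoff", "reset-state",
        "--db-name", db_name,
        "--commits", *commits,
        "--leaders", *leaders,
        "--epoch", match["epoch"],
    ]


line = "ReportTimestamp(commits=0,0,3, epoch=2, leaders=2 1, timestamp=2021-01-08T21:26:20)"
print(" ".join(reset_state_command(line, "DB_NAME")))
```

Returning a list rather than a single string keeps each argument separate, so the result can be handed to subprocess.run without shell quoting. Review the reported timestamp before running the generated command, as the Confirmation step requires.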
Recovery Using the Handoff Database Command
- Run handoff with an --oldest-acceptable value earlier than the consistent state:
  nuocmd handoff database --db-name DB_NAME --all-observer-archive-ids --oldest-acceptable 2000-01-01T01:00:00
  STARTING: SM process on archive 1
  Time of the most recent consistent state: 2021-03-16T14:28:37
  Reset state run successfully
  Successfully handed off database, you may proceed with the next handoff steps
  The nuocmd handoff database command automatically runs the Confirmation, Resolution, and Promotion steps of the handoff procedure.
  If the --oldest-acceptable value is later than the consistent state, the handoff is aborted:
  nuocmd handoff database --db-name DB_NAME --all-observer-archive-ids --oldest-acceptable 2100-01-01T01:00:00
  STARTING: SM process on archive 1
  'handoff database' failed: Time of the most recent consistent state 2021-03-16T14:28:37 is earlier than supplied '--oldest-acceptable' 2100-01-01T01:00:00. Aborting handoff
  Shut down the database:
  nuocmd shutdown database --db-name DB_NAME
- Run the Deprovisioning step:
  nuocmd delete server --server-id active
- Run the Reprovisioning step:
  nuocmd start database --db-name DB_NAME --incremental --te-server-ids passive
  STARTING: StartProcessRequest(db_name=DB_NAME, engine_type=TE, ...)
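The --oldest-acceptable behavior shown above amounts to a timestamp comparison. The sketch below mirrors that rule using the two sample values from this page. It is an illustration only, not NuoDB's implementation: the function name handoff_allowed is hypothetical, time zones are ignored, and treating equal timestamps as acceptable is an assumption based on the "is earlier than" wording of the error message.

```python
from datetime import datetime


def handoff_allowed(consistent_state: str, oldest_acceptable: str) -> bool:
    """Return True when the consistent state is not older than the accepted limit."""
    consistent = datetime.fromisoformat(consistent_state)
    limit = datetime.fromisoformat(oldest_acceptable)
    # The handoff is aborted when the consistent state is earlier than the limit.
    return consistent >= limit


CONSISTENT_STATE = "2021-03-16T14:28:37"  # value reported in the sample output

print(handoff_allowed(CONSISTENT_STATE, "2000-01-01T01:00:00"))
print(handoff_allowed(CONSISTENT_STATE, "2100-01-01T01:00:00"))
```

In practice, the limit expresses how much recent data loss is tolerable: the further back --oldest-acceptable is set, the older the state that the handoff is allowed to promote.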
Verify the Database Contents
Verify the database contents and run more workload using nuosql.
select * from TABLE_NAME;
N S
-- ---
1 one
insert into TABLE_NAME values(2, 'two');
select * from TABLE_NAME;
N S
-- ---
1 one
2 two
Clean Up
nuocmd shutdown database --db-name DB_NAME
nuocmd show domain
server version: 6.0-1-fc6a857de9, server license: Enterprise
server time: 2023-01-11T16:12:59.874, client token: ...
Servers:
[passive] host2:port [last_ack = 1.53] [member = ADDED] [raft_state = ACTIVE] (LEADER, Leader=passive, log=1/62/62) Connected *
Databases:
DB_NAME [state = NOT_RUNNING]