Database Management and Failure Event Handling - A Working Example

This section provides a working example of how NuoDB Admin can be used to manage databases and handle failure events.

Introduction

The primary functions of NuoDB Admin are to allow the user to:

A NuoDB database is a fully-connected network of Storage Managers (SMs) and Transaction Engines (TEs). Each SM has an archive associated with it, which is the location on disk where data is made durable. The minimal configuration of a NuoDB database comprises one SM and one TE. The steps required to create a minimal database are as follows:

  1. Create an archive object.
  2. Create a database object.
  3. Start an SM on the archive object created in step 1.
  4. Start a TE.

Note: The steps outlined above can be performed on a two-server NuoDB cluster using NuoDB Command (nuocmd) commands. For more information on NuoDB Command and other command line tools, see Command Line Tools. Sample commands used to perform the above steps are as follows:

1. Create an archive object.

nuocmd create archive --db-name test --server-id server0 --archive-path /var/opt/nuodb/archive/test
Archive(archive_path=/var/opt/nuodb/archive/test, db_name=test, id=0, server_id=server0, state=PROVISIONED)

2. Create a database object.

nuocmd create database --db-name test --dba-user test --dba-password test --capture-file startplan.json

3. Start an SM on the archive object created in step 1.

nuocmd start process --db-name test --engine-type SM --archive-id 0 --server-id server0
Process(archive_id=0, db_name=test, durable_state=REQUESTED, engine_state=UNKNOWN, engine_type=SM, labels={}, options={...}, region_name=Default, server_id=server0, start_id=0)

4. Start a TE.

nuocmd start process --db-name test --engine-type TE --server-id server1
Process(db_name=test, durable_state=REQUESTED, engine_state=UNKNOWN, engine_type=TE, labels={}, options={...}, region_name=Default, server_id=server1, start_id=1)

In step 1, an archive object is created with a server ID of server0 and an archive path of /var/opt/nuodb/archive/test. The server ID is the unique identifier for a NuoDB admin server. The create archive command generates an archive ID (id=0), which is the unique identifier for archive objects in the Raft domain state. In step 2, a database named test is created, using credentials for the database administrator user. Because part of this example is to start an SM explicitly (step 3), the --capture-file startplan.json argument is specified to suppress the default behavior of the command, which is to automatically request SMs on all archive objects for the database. In step 3, the start process command is used to request an SM on the archive object created in step 1 by specifying --archive-id 0 (which is bound to server0). In step 4, the start process command is used to request a TE on server1.
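The four steps can also be scripted. The following Python sketch only assembles the nuocmd argument lists shown above and does not execute them. The function name and its parameters are illustrative, and it assumes that the caller passes in the archive ID printed by create archive (0 in this example).

```python
def minimal_database_commands(db_name, sm_server, te_server, archive_path,
                              archive_id, dba_user, dba_password):
    """Return the nuocmd invocations for a one-SM, one-TE database.

    Only flags that appear in the walkthrough above are used. archive_id is
    assumed to be the id printed by 'create archive' (0 in the example).
    """
    return [
        # Step 1: create the archive object.
        ["nuocmd", "create", "archive", "--db-name", db_name,
         "--server-id", sm_server, "--archive-path", archive_path],
        # Step 2: create the database object; --capture-file suppresses
        # the automatic SM start requests.
        ["nuocmd", "create", "database", "--db-name", db_name,
         "--dba-user", dba_user, "--dba-password", dba_password,
         "--capture-file", "startplan.json"],
        # Step 3: request an SM on the archive.
        ["nuocmd", "start", "process", "--db-name", db_name,
         "--engine-type", "SM", "--archive-id", str(archive_id),
         "--server-id", sm_server],
        # Step 4: request a TE.
        ["nuocmd", "start", "process", "--db-name", db_name,
         "--engine-type", "TE", "--server-id", te_server],
    ]


commands = minimal_database_commands(
    "test", "server0", "server1", "/var/opt/nuodb/archive/test",
    archive_id=0, dba_user="test", dba_password="test")
for argv in commands:
    print(" ".join(argv))
```

Each list could be handed to subprocess.run on a host where nuocmd is installed; in practice the archive ID would be read from the output of the first command rather than hard-coded.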

Database Process State Lifecycle

In the example above, although the start process command was used, database processes were actually being requested to start.

Note: Although the nuocmd subcommand is named start process, processes are in fact requested. This is because the command actually creates a process object in the Raft domain state with initial state REQUESTED. The actual database process is started asynchronously by the server that the process was requested on (server0 in step 3, server1 in step 4) when that server applies the command to its own Raft state machine to cause the process object to become REQUESTED.

The process object is transitioned to the STARTED state by the admin server the database process was requested on, before the actual database process is forked by the admin process. The database process then connects to the same admin server, which manages its lifecycle. This includes receiving configuration parameters from the admin server, being introduced to its database peers, and receiving management requests from the admin server for actions such as creating storage-groups and issuing the hotcopy backup command.

After executing the commands above, you may run the show domain command, which collates information about NuoDB Admin servers and database processes, to review the state of processes moments after the commands.

nuocmd show domain
server version: 3.4.2-1-35a573d0d3, server license: Enterprise
server time: 2019-05-15T19:09:17.123, client token: 07e10b90d4d5c47719116366d4a86ff95767867e
Servers:
  [server0] server0:48005 (LEADER, Leader=server0) ACTIVE:Connected *
  [server1] server1:48005 (FOLLOWER, Leader=server0) ACTIVE:Connected
Databases: 
  test [RUNNING]
    [SM] server0:48006 (Default) [start_id = 0] [server_id = server0] [pid = 144] [node_id = 1] MONITORED:RUNNING
    [TE] server1:48006 (Default) [start_id = 1] [server_id = server1] [pid = 103] [node_id = 2] MONITORED:RUNNING

The show domain output in the example above shows that the processes have transitioned to MONITORED. The MONITORED state and other database process states are summarized as follows:

State

Description

REQUESTED

The database process has been requested by the user.

The start process command actually creates a process with initial state of REQUESTED (see durable_state in the command output for steps 3 and 4 above). The actual database process is started asynchronously by the server that the process was requested on (server0 in step 3, server1 in step 4).

STARTED

The database process has been started by the admin server the process was requested on.

The database process is transitioned from a REQUESTED state to a STARTED state by the NuoDB Admin server that the database process was requested on, before the actual database process is forked. The database process then connects to the same NuoDB Admin server, which manages its lifecycle.

CONFIGURED

The database process has connected to the admin server that started it and has been sent its configuration parameters.

TRACKED

The database process has requested to join its database peers.

MONITORED

The database process has joined its database peers and is ready to service management requests from the NuoDB Admin server, such as creating storage-groups, initiating hotcopy, and shutting down gracefully. Also, once a database process is in this state, it can start notifying NuoDB Admin of state changes it undergoes due to database operation such as SYNCING and RUNNING (see the RUNNING engine_state attribute in the show domain output above).

REQUESTED_SHUTDOWN

The database process has been requested to shut down.
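The start-up portion of this lifecycle can be modeled as a simple ordered sequence. The Python sketch below is an illustration only, not NuoDB's implementation; the helper names are made up, and REQUESTED_SHUTDOWN is left out because the summary does not say from which states it can be entered.

```python
# Durable states in the order described in the summary above.
STARTUP_ORDER = ["REQUESTED", "STARTED", "CONFIGURED", "TRACKED", "MONITORED"]


def next_startup_state(state):
    """Return the state that follows `state` during start-up, or None."""
    index = STARTUP_ORDER.index(state)  # raises ValueError for unknown states
    if index + 1 < len(STARTUP_ORDER):
        return STARTUP_ORDER[index + 1]
    return None


def accepts_management_requests(state):
    """Per the summary, a process services management requests once MONITORED."""
    return state == "MONITORED"


# Walk a process object from REQUESTED to the end of the start-up sequence.
state = "REQUESTED"
path = [state]
while next_startup_state(state) is not None:
    state = next_startup_state(state)
    path.append(state)
print(" -> ".join(path))
```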

Process Tombstones

Process start IDs are globally unique: a new one is generated every time a process is requested, and they are never reused. They therefore allow for historical reference to processes that have exited, which are known as Tombstones. When a process exits, it makes an effort to communicate the reason why it is exiting to the admin server that is managing it (though this is not possible if the process is killed with kill -9, since SIGKILL cannot be caught and the operating system terminates the process without giving it a chance to run any exit handling). This message, as well as the exit code, is stored in the Raft domain state. We will see when we discuss database incarnations that process tombstones are a useful diagnostic tool.

Archive Lifecycle

So far in this working example, a NuoDB database consisting of one SM and one TE has been created. Now let's add redundancy to the storage and transaction layers by adding an SM and a TE. This configuration of two TEs and two SMs is known as a minimally redundant database configuration.

Create an archive object on server1.

nuocmd create archive --db-name test --server-id server1 --archive-path /var/opt/nuodb/archive/test
Archive(archive_path=/var/opt/nuodb/archive/test, db_name=test, id=1, server_id=server1, state=PROVISIONED)

You can confirm the PROVISIONED state of the additional archive object using the show archives command.

nuocmd show archives
Archive: [0] server0 : /var/opt/nuodb/archive/test [db = test] RUNNING
  [SM] server0:48006 (Default) [start_id = 0] [server_id = server0] [pid = 144] [node_id = 1] MONITORED:RUNNING
Archive: [1] server1 : /var/opt/nuodb/archive/test [db = test] PROVISIONED

Start an SM on server1.

nuocmd start process --db-name test --engine-type SM --archive-id 1 --server-id server1
Process(archive_id=1, db_name=test, durable_state=REQUESTED, engine_state=UNKNOWN, engine_type=SM, labels={}, options={...}, region_name=Default, server_id=server1, start_id=2)

Start a TE on server0.

nuocmd start process --db-name test --engine-type TE --server-id server0
Process(db_name=test, durable_state=REQUESTED, engine_state=UNKNOWN, engine_type=TE, labels={}, options={...}, region_name=Default, server_id=server0, start_id=3)

You may then review SMs and TEs, running on multiple hosts, using the show domain command.

nuocmd show domain
server version: 3.4.2-1-35a573d0d3, server license: Enterprise
server time: 2019-05-15T19:59:41.398, client token: 2e960884cb5b14813df163f84b2f1690eae06516
Servers: 
  [server0] server0:48005 (LEADER, Leader=server0) ACTIVE:Connected *
  [server1] server1:48005 (FOLLOWER, Leader=server0) ACTIVE:Connected
Databases:
  test [RUNNING]
    [SM] server0:48006 (Default) [start_id = 0] [server_id = server0] [pid = 144] [node_id = 1] MONITORED:RUNNING
    [TE] server1:48006 (Default) [start_id = 1] [server_id = server1] [pid = 103] [node_id = 2] MONITORED:RUNNING   
    [SM] server1:48007 (Default) [start_id = 2] [server_id = server1] [pid = 159] [node_id = 3] MONITORED:RUNNING
    [TE] server0:48007 (Default) [start_id = 3] [server_id = server0] [pid = 281] [node_id = 4] MONITORED:RUNNING

After starting the SM, show archive objects in domain state.

nuocmd show archives
Archive: [0] server0 : /var/opt/nuodb/archive/test [db = test] RUNNING
  [SM] server0:48006 (Default) [start_id = 0] [server_id = server0] [pid = 144] [node_id = 1] MONITORED:RUNNING
Archive: [1] server1 : /var/opt/nuodb/archive/test [db = test] RUNNING
  [SM] server1:48007 (Default) [start_id = 2] [server_id = server1] [pid = 159] [node_id = 3] MONITORED:RUNNING

The steps above are similar to the ones performed earlier, except that the server IDs are reversed so that we end up with an SM and a TE on each server. The show archives command is used to show the state of the archive object before an SM is started on it. This shows that the archive is in the PROVISIONED state before an SM is started and in the RUNNING state after an SM is started. The PROVISIONED state is the initial state of an archive and indicates that the archive has not been initialized and has no database data.

An archive object is transitioned from PROVISIONED to RUNNING when the SM process object associated with the archive is transitioned from STARTED to CONFIGURED (this is the point at which the admin server sends the SM its configuration parameters, which include the archive directory). An archive object is transitioned from RUNNING to NOT_RUNNING when the associated SM exits, that is, when a tombstone is generated for the process object. To demonstrate this, let's shut down one of the SMs, then show the archive objects and the domain state.

nuocmd shutdown process --start-id 0
nuocmd show archives
Archive: [0] server0 : /var/opt/nuodb/archive/test [db = test] NOT_RUNNING
  [SM] server0:48006 [start_id = 0] [server_id = server0] [pid = 144] [node_id = 1] EXITED(REQUESTED_SHUTDOWN:SHUTTING_DOWN):(2019-05-16T02:24:26.401+0000) Gracefully shutdown engine (0)
Archive: [1] server1 : /var/opt/nuodb/archive/test [db = test] RUNNING
  [SM] server1:48007 (Default) [start_id = 2] [server_id = server1] [pid = 159] [node_id = 3] MONITORED:RUNNING
nuocmd show domain
server version: 3.4.2-1-35a573d0d3, server license: Enterprise
server time: 2019-05-16T02:24:29.936, client token: 11134db95a3f260c2dd8e4f442913fd24d8a9532
Servers:
  [server0] server0:48005 (LEADER, Leader=server0) ACTIVE:Connected *
  [server1] server1:48005 (FOLLOWER, Leader=server0) ACTIVE:Connected
Databases:
  test [AWAITING_ARCHIVE_HISTORIES_INC]
    [TE] server1:48006 (Default) [start_id = 1] [server_id = server1] [pid = 103] [node_id = 2] MONITORED:RUNNING
    [SM] server1:48007 (Default) [start_id = 2] [server_id = server1] [pid = 159] [node_id = 3] MONITORED:RUNNING
    [TE] server0:48007 (Default) [start_id = 3] [server_id = server0] [pid = 281] [node_id = 4] MONITORED:RUNNING

First we shut down the SM with start ID 0, which is associated with archive ID 0. The output for nuocmd show archives then shows the archive object in NOT_RUNNING state and also shows the tombstone of the SM. Finally, the output for nuocmd show domain shows that there are now three running database processes, as expected, and also shows that the database object is in AWAITING_ARCHIVE_HISTORIES_INC state, which we discuss in the next section.
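The archive state transitions seen so far can be written down as a small lookup table. This Python sketch is illustrative only: the event names are made up, and the transition from NOT_RUNNING back to RUNNING is an assumption made by analogy with the PROVISIONED case rather than something stated explicitly above.

```python
# (current archive state, event) -> new archive state.
# Event names are hypothetical labels for this sketch.
ARCHIVE_TRANSITIONS = {
    # SM process object goes from STARTED to CONFIGURED.
    ("PROVISIONED", "sm_configured"): "RUNNING",
    # SM exits and a tombstone is generated for it.
    ("RUNNING", "sm_exited"): "NOT_RUNNING",
    # Assumption: restarting an SM on the archive makes it RUNNING again.
    ("NOT_RUNNING", "sm_configured"): "RUNNING",
}


def apply_archive_event(state, event):
    """Return the new archive state, or the old one if nothing applies."""
    return ARCHIVE_TRANSITIONS.get((state, event), state)


# Replay what happened to archive 0 in this walkthrough.
history = ["PROVISIONED"]
for event in ["sm_configured", "sm_exited"]:
    history.append(apply_archive_event(history[-1], event))
print(" -> ".join(history))
```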

Database State Lifecycle

To correctly start a database containing NOT_RUNNING archives (ones that have been initialized and have database data), the admin tier has to determine which NOT_RUNNING archives have the most recent data for each storage-group. This allows it to start the SM that has the most recent data for the unpartitioned storage-group first and, for each SM, to supply configuration parameters that indicate which storage-groups it can put into service. This process is called storage-group leader assignment.

Storage-group leader assignment is performed as part of the process lifecycle and requires the user to request SMs on all archive objects for a database in parallel. When an SM is requested on a NOT_RUNNING archive object, the admin server that is managing it will request an archive history for the archive, which consists of a logical timestamp and the set of storage-groups that were in RUNNING state for the archive at that time. This occurs before the transition from STARTED to CONFIGURED. Once archive histories have been collected for all SMs and storage-group leader assignment has been performed, they will all be transitioned to CONFIGURED.

Additionally, the SM that is chosen as leader for the unpartitioned storage-group must be started first and must go through its full start-up lifecycle, up to durable_state=MONITORED and engine_state=RUNNING (i.e., MONITORED:RUNNING), before any other database processes (SMs or TEs) can join it (which occurs after the transition from CONFIGURED to TRACKED).

There are two database states that signal that the admin layer is awaiting archive histories for NOT_RUNNING archives. AWAITING_ARCHIVE_HISTORIES_MSG is the state the database enters when storage-group leader assignment is being performed for all archives because there are no running SMs, described in the previous paragraph. AWAITING_ARCHIVE_HISTORIES_INC is the state the database enters whenever there are RUNNING and NOT_RUNNING archive objects for the same database. In this case, the admin layer will request archive histories for SMs in STARTED state, but will only wait for all archive histories before transitioning to CONFIGURED state if a storage-group is encountered in the archive history that is not currently RUNNING.
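The idea behind storage-group leader assignment can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not NuoDB's actual algorithm: an archive history is represented as a (logical timestamp, set of storage-groups) pair, the storage-group names are placeholders, and ties are broken in favor of the lowest archive ID purely for determinism.

```python
def assign_storage_group_leaders(histories):
    """Pick, for each storage-group, the archive with the newest history.

    histories maps archive_id -> (logical_timestamp, set of storage-groups).
    Ties go to the lowest archive id; that tie-break is an assumption.
    """
    leaders = {}
    for archive_id in sorted(histories):
        timestamp, groups = histories[archive_id]
        for group in groups:
            best = leaders.get(group)
            # Replace the current leader only if this archive is strictly newer.
            if best is None or timestamp > histories[best][0]:
                leaders[group] = archive_id
    return leaders


# Hypothetical archive histories for two NOT_RUNNING archives.
histories = {
    0: (41, {"UNPARTITIONED"}),
    1: (57, {"UNPARTITIONED", "SG1"}),
}
leaders = assign_storage_group_leaders(histories)
print(sorted(leaders.items()))
```

In this model, archive 1 has the more recent history, so its SM would be the one started first for the unpartitioned storage-group.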

NuoDB database states are summarized as follows:

State

Description

NOT_RUNNING

There are no process objects for the database (excluding Tombstones).

AWAITING_ARCHIVE_HISTORIES_MSG

There are NOT_RUNNING archives that NuoDB Admin is waiting to collect archive states from, so that storage-group leader assignment can be performed (and there are no running SMs). All SMs are blocked in a STARTED state until storage-group leader assignment is performed, and all TEs will be blocked in a TRACKED state.

REQUESTED / STARTED

Storage group leader assignment has been performed and all SM processes have been transitioned to CONFIGURED. The first SM will be started as the initial member of the database incarnation (node_id=1), while all other database processes (SMs and TEs) will be blocked in a TRACKED state.

RUNNING

The first SM (leader of the unpartitioned storage group) has reached a MONITORED:RUNNING state, and now all database processes can join it.

AWAITING_ARCHIVE_HISTORIES_INC

An SM has exited, leaving the archive in a NOT_RUNNING state, but other SMs are still running. When an SM is started on an archive with a NOT_RUNNING state, its archive state will be requested, and if the archive contains storage groups that are not currently running, the SM will be blocked in a STARTED state until all archives with a NOT_RUNNING state are collected and storage group leader assignment can be performed for the non-running storage groups.

REQUESTED_TE_SHUTDOWN

Database shutdown has been requested by the user, or NuoDB Admin is shutting down TEs because the database has become non-durable (that is, all SMs have exited). When a database transitions to this state, all NuoDB Admin processes that are managing TEs for the database will transition their TEs to a REQUESTED_SHUTDOWN state.

REQUESTED_SM_SHUTDOWN

All TEs were shut down as part of a shutdown-database request, and now all SMs must be transitioned to a REQUESTED_SHUTDOWN state.

TOMBSTONE

The database has been marked as deleted due to a user request (for example, an invocation of delete database). The database object is not actually deleted from the Raft domain state, but it can be replaced by a subsequent invocation of create database.

Database Incarnations and Diagnosing Failure Events

To simulate a failure event, kill the only running SM (start ID 2), which causes the database to become non-durable. As a result, the admin tier shuts down all running TEs:

Kill the only running SM.

nuocmd shutdown process --start-id 2 --kill

Show domain state.

nuocmd show domain
server version: 3.4.2-1-35a573d0d3, server license: Enterprise
server time: 2019-05-16T16:55:37.956, client token: c9c230ae1116929620a813ee05a7e154ce8a2248
Servers:
  [server0] server0:48005 (FOLLOWER, Leader=server1) ACTIVE:Connected *
  [server1] server1:48005 (LEADER, Leader=server1) ACTIVE:Connected
Databases:
  test [NOT_RUNNING]

Show all database incarnations.

nuocmd show database --db-name test --all-incarnations
Database(default_options={}, default_region_id=0, incarnation=(1, 4), name=test, server_assignments={}, state=NOT_RUNNING)
  test [NOT_RUNNING]
    [TE] server1:48006 [start_id = 1] [server_id = server1] [pid = 103] [node_id = 2] EXITED(REQUESTED_SHUTDOWN:SHUTTING_DOWN):(2019-05-16T16:55:34.246+0000) Gracefully shutdown engine (0)
    [TE] server0:48007 [start_id = 3] [server_id = server0] [pid = 281] [node_id = 4] EXITED(REQUESTED_SHUTDOWN:SHUTTING_DOWN):(2019-05-16T16:55:34.236+0000) Gracefully shutdown engine (0)
    [SM] server1:48007 [start_id = 2] [server_id = server1] [pid = 159] [node_id = 3] EXITED(MONITORED:RUNNING):Removing process from domain state due to service request: (2019-05-16T16:55:33.945+0000) Engine connection to admin process closed (137)
    [SM] server0:48006 [start_id = 0] [server_id = server0] [pid = 144] [node_id = 1] EXITED(REQUESTED_SHUTDOWN:SHUTTING_DOWN):(2019-05-16T02:24:26.401+0000) Gracefully shutdown engine (0)

Note: This shows the behavior described above for REQUESTED_TE_SHUTDOWN. It also shows how we can inspect past database incarnations after a failure event. A database incarnation consists of two components (major, minor). The major number indicates the number of times the database has been cold-restarted, and the minor number indicates the number of processes that have exited within the current incarnation.
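The (major, minor) bookkeeping can be illustrated with a small Python model. This is a sketch only: the class and method names are made up, and it assumes that the first incarnation is (1, 0) and that the minor number starts again from 0 after a cold restart.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Incarnation:
    major: int  # number of times the database has been cold-restarted
    minor: int  # processes that have exited within the current incarnation

    def process_exited(self):
        return Incarnation(self.major, self.minor + 1)

    def cold_restarted(self):
        # Assumption: the minor number resets to 0 on a cold restart.
        return Incarnation(self.major + 1, 0)


# Two SMs and two TEs have exited since the database was first started.
inc = Incarnation(1, 0)
for _ in range(4):
    inc = inc.process_exited()
print((inc.major, inc.minor))
```

Under these assumptions the model arrives at (1, 4), which agrees with the incarnation reported by show database above.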

To demonstrate that SMs must be started on all archives in the NOT_RUNNING state in order to restart a database, let's start an SM on only one of the archive objects.

nuocmd start process --db-name test --engine-type SM --archive-id 0 --server-id server0
Process(archive_id=0, db_name=test, durable_state=REQUESTED, engine_state=UNKNOWN, engine_type=SM, labels={}, options={...}, region_name=Default, server_id=server0, start_id=4)

Show the domain state immediately after restarting the single archive (and show the domain state again about ten seconds later).

nuocmd show domain
server version: 3.4.2-1-35a573d0d3, server license: Enterprise
server time: 2019-05-16T17:22:19.443, client token: e432f083ca44658e58d9d4e04c18681a75ab7bd3
Servers:
  [server0] server0:48005 (FOLLOWER, Leader=server1) ACTIVE:Connected *
  [server1] server1:48005 (LEADER, Leader=server1) ACTIVE:Connected
Databases:
  test [AWAITING_ARCHIVE_HISTORIES_MSG]
    [SM] <UNKNOWN ADDRESS> (Default) [start_id = 4] [server_id = server0] [pid = ] [node_id = ] STARTED:UNREACHABLE(UNKNOWN)
nuocmd show domain
server version: 3.4.2-1-35a573d0d3, server license: Enterprise
server time: 2019-05-16T17:22:28.061, client token: bcb367b0a5975cdeb93c79c50eb527cfe75fecd1
Servers:
  [server0] server0:48005 (FOLLOWER, Leader=server1) ACTIVE:Connected *
  [server1] server1:48005 (LEADER, Leader=server1) ACTIVE:Connected
Databases:
  test [NOT_RUNNING]

Show the last database incarnation.

nuocmd show database --db-name test
Database(default_options={}, default_region_id=0, incarnation=(2, 1), name=test, server_assignments={}, state=NOT_RUNNING)
  test [NOT_RUNNING]
    [SM] <UNKNOWN ADDRESS> [start_id = 4] [server_id = server0] [pid = ] [node_id = ] EXITED(STARTED:UNKNOWN):local agent connection was rejected: ArchiveHistory failed: Timed out while awaiting leader assignment (10000 ms): (2019-05-16T17:22:25.465+0000) Engine connection to admin process closed (2)

In the example above, an SM was started on archive ID 0 but not on archive ID 1. Running show domain immediately afterwards shows the asynchronous nature of requesting a process: start process does not block until the process completes its start-up lifecycle. The SM is blocked in the STARTED state while NuoDB Admin waits for all archive states (see AWAITING_ARCHIVE_HISTORIES_MSG in the database state summary above). Running show domain again about ten seconds later shows no process objects at all and the database back in the NOT_RUNNING state, because the SM timed out while waiting for all archive states.
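Because start process returns before the process has started, a script that needs the database to be RUNNING has to poll for it. The following Python sketch shows one way to do that. The fetch_state callable is hypothetical and supplied by the caller (it could, for instance, run nuocmd show domain and extract the database state); here it is replaced by a simulated sequence of states.

```python
import time


def wait_for_database_state(fetch_state, wanted, timeout_s=30.0, interval_s=1.0):
    """Poll fetch_state() until it returns `wanted` or the timeout expires.

    fetch_state is a caller-supplied callable (hypothetical in this sketch).
    Returns the last state that was observed.
    """
    deadline = time.monotonic() + timeout_s
    state = fetch_state()
    while state != wanted and time.monotonic() < deadline:
        time.sleep(interval_s)
        state = fetch_state()
    return state


# Simulated states matching the failed single-SM restart above: the database
# is first awaiting archive histories and then falls back to NOT_RUNNING.
observed = iter(["AWAITING_ARCHIVE_HISTORIES_MSG"])
last = wait_for_database_state(lambda: next(observed, "NOT_RUNNING"),
                               "RUNNING", timeout_s=0.05, interval_s=0.01)
print(last)
```

A caller would treat any final state other than the wanted one as a failed start and then inspect the process tombstones for the reason.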

To demonstrate another negative scenario, let's request a TE without requesting any SMs:

1. Start a TE without having started any SMs.

nuocmd start process --db-name test --engine-type TE --server-id server0
Process(db_name=test, durable_state=REQUESTED, engine_state=UNKNOWN, engine_type=TE, labels={}, options={...}, region_name=Default, server_id=server0, start_id=5)

2. Show the domain state immediately after starting the TE (and show the domain state again about ten seconds later).

nuocmd show domain
server version: 3.4.2-1-35a573d0d3, server license: Enterprise
server time: 2019-05-16T17:59:17.663, client token: 846593d993d82353ec3ac163467b0092fbaf3938
Servers:
  [server0] server0:48005 (FOLLOWER, Leader=server1) ACTIVE:Connected *
  [server1] server1:48005 (LEADER, Leader=server1) ACTIVE:Connected
Databases:
  test [AWAITING_ARCHIVE_HISTORIES_MSG]
    [TE] server0:48006 (Default) [start_id = 5] [server_id = server0] [pid = 1309] [node_id = ] TRACKED:UNREACHABLE(UNKNOWN)
nuocmd show domain
server version: 3.4.2-1-35a573d0d3, server license: Enterprise
server time: 2019-05-16T17:59:26.670, client token: 981f5ecf7632b83b54db205b249fe760ad64380c
Servers:
  [server0] server0:48005 (FOLLOWER, Leader=server1) ACTIVE:Connected *
  [server1] server1:48005 (LEADER, Leader=server1) ACTIVE:Connected
Databases:
  test [NOT_RUNNING]

3. Show the last database incarnation.

nuocmd show database --db-name test
Database(default_options={}, default_region_id=0, incarnation=(3, 1), name=test, server_assignments={}, state=NOT_RUNNING)
  test [NOT_RUNNING]
    [TE] server0:48006 [start_id = 5] [server_id = server0] [pid = 1309] [node_id = ] EXITED(TRACKED:UNKNOWN):database entry request was rejected: EntryRequest failed: Timed out while awaiting entry node (10000 ms): (2019-05-16T17:59:24.560+0000) Engine connection to admin process closed (127)

As before, the database transitions to the AWAITING_ARCHIVE_HISTORIES_MSG state. Since the TE has no archive, its configuration parameters are known without performing storage-group leader assignment and it can be transitioned to the TRACKED state, where it is blocked until the database becomes RUNNING, which never happens. The process tombstone shows that it timed out waiting for the entry node (the leaderAssignmentTimeout property controls this timeout as well).
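When diagnosing failures from a script, it can help to pull the state at exit, the message, and the exit code out of a tombstone line. The Python sketch below does this with a regular expression written against the tombstone lines shown in this example; the layout of nuocmd output is not guaranteed to stay the same across versions. Interpreting an exit code above 128 as 128 plus a signal number is the common shell convention and is an assumption here.

```python
import re

# Matches the EXITED(...) portion of a tombstone line as printed above.
TOMBSTONE = re.compile(
    r"EXITED\((?P<durable_state>\w+):(?P<engine_state>\w+)\):"
    r"(?P<detail>.*)\((?P<exit_code>\d+)\)\s*$")


def parse_tombstone(line):
    """Return a dict describing the tombstone, or None if the line has none."""
    match = TOMBSTONE.search(line)
    if match is None:
        return None
    code = int(match.group("exit_code"))
    return {
        "durable_state": match.group("durable_state"),
        "engine_state": match.group("engine_state"),
        "detail": match.group("detail").strip(),
        "exit_code": code,
        # Assumption: codes above 128 encode 128 + signal number.
        "signal": code - 128 if code > 128 else None,
    }


te_line = ("[TE] server0:48006 [start_id = 5] [server_id = server0] "
           "[pid = 1309] [node_id = ] EXITED(TRACKED:UNKNOWN):database entry "
           "request was rejected: EntryRequest failed: Timed out while "
           "awaiting entry node (10000 ms): (2019-05-16T17:59:24.560+0000) "
           "Engine connection to admin process closed (127)")
sm_line = ("[SM] server1:48007 [start_id = 2] [server_id = server1] "
           "[pid = 159] [node_id = 3] EXITED(MONITORED:RUNNING):Removing "
           "process from domain state due to service request: "
           "(2019-05-16T16:55:33.945+0000) Engine connection to admin "
           "process closed (137)")
te = parse_tombstone(te_line)
sm = parse_tombstone(sm_line)
print(te["durable_state"], te["exit_code"])
print(sm["durable_state"], sm["exit_code"], sm["signal"])
```

Under that convention, the exit code 137 of the killed SM corresponds to signal 9 (SIGKILL), which is consistent with the --kill option used earlier.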

Restarting a Database after a Failure Event

Restart the database using the start process command:

Start SMs on both archive objects.

nuocmd start process --db-name test --engine-type SM --archive-id 0 --server-id server0
Process(archive_id=0, db_name=test, durable_state=REQUESTED, engine_state=UNKNOWN, engine_type=SM, labels={}, options={...}, region_name=Default, server_id=server0, start_id=6)
nuocmd start process --db-name test --engine-type SM --archive-id 1 --server-id server1
Process(archive_id=1, db_name=test, durable_state=REQUESTED, engine_state=UNKNOWN, engine_type=SM, labels={}, options={...}, region_name=Default, server_id=server1, start_id=7)

Start two TEs.

nuocmd start process --db-name test --engine-type TE --server-id server0
Process(db_name=test, durable_state=REQUESTED, engine_state=UNKNOWN, engine_type=TE, labels={}, options={...}, region_name=Default, server_id=server0, start_id=8)
nuocmd start process --db-name test --engine-type TE --server-id server1
Process(db_name=test, durable_state=REQUESTED, engine_state=UNKNOWN, engine_type=TE, labels={}, options={...}, region_name=Default, server_id=server1, start_id=9)

Show the domain state.

nuocmd show domain
server version: 3.4.2-1-35a573d0d3, server license: Enterprise
server time: 2019-05-16T18:25:05.982, client token: ff4dc770f5540d0651804d1118f5867f678026df
Servers:
  [server0] server0:48005 (FOLLOWER, Leader=server1) ACTIVE:Connected *
  [server1] server1:48005 (LEADER, Leader=server1) ACTIVE:Connected
Databases:
  test [RUNNING]
    [SM] server0:48006 (Default) [start_id = 6] [server_id = server0] [pid = 1392] [node_id = 2] MONITORED:RUNNING
    [SM] server1:48006 (Default) [start_id = 7] [server_id = server1] [pid = 806] [node_id = 1] MONITORED:RUNNING
    [TE] server0:48007 (Default) [start_id = 8] [server_id = server0] [pid = 1425] [node_id = 3] MONITORED:RUNNING
    [TE] server1:48007 (Default) [start_id = 9] [server_id = server1] [pid = 827] [node_id = 4] MONITORED:RUNNING

This time we request SMs on all archive objects in the NOT_RUNNING state. Two TEs were requested to restore the database configuration we had previously, but the number of TEs is not important and we could start any number of them. Since archive objects are stored in the Raft domain state, the user does not have to keep track of archive IDs in order to specify all of them, but can request that information from the admin tier. NuoDB Admin has the notion of a start plan, which contains the set of start-process requests that must be issued in order to restart a database, along with requests for TEs on the specified server IDs. There is no durable notion of TEs, so they may be specified either when requesting a start plan or after actual database start-up. Below is an example of shutting down the database and requesting a start plan.

nuocmd shutdown database --db-name test
nuocmd get startplan --db-name test --te-server-ids server0 server1 --output-file startplan.json
cat startplan.json
{
  "incremental": false,
  "processes": [
    {
      "archiveId": 0,
      "dbName": "test",
      "engineType": "SM",
      "host": "server0",
      "labels": {},
      "overrideOptions": {}
    },
    {
      "archiveId": 1,
      "dbName": "test",
      "engineType": "SM",
      "host": "server1",
      "labels": {},
      "overrideOptions": {}
    },
    {
      "dbName": "test",
      "engineType": "TE",
      "host": "server0",
      "labels": {},
      "overrideOptions": {}
    },
    {
      "dbName": "test",
      "engineType": "TE",
      "host": "server1",
      "labels": {},
      "overrideOptions": {}
    }
  ]
}

This contains the set of request payloads to send to the REST service that the admin tier exposes. For example, the following would start an SM process on host server0:

curl -X POST -H "Content-type: application/json" http://localhost:8888/api/1/processes -d '{  
      "archiveId": 0,
      "dbName": "test",
      "engineType": "SM",
      "host": "server0",
      "labels": {},
      "overrideOptions": {}
     }'
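The same idea extends to every entry in a start plan. The Python sketch below builds one POST request per process specification using only the standard library and does not send anything. The URL is copied from the curl example above, the start plan is abbreviated to two entries, and authentication or TLS settings that a real domain may require are not covered.

```python
import json
import urllib.request

# URL taken from the curl example above; adjust host and port for your domain.
PROCESSES_URL = "http://localhost:8888/api/1/processes"


def start_requests(start_plan, url=PROCESSES_URL):
    """Build one POST request per process specification in a start plan."""
    requests = []
    for spec in start_plan["processes"]:
        requests.append(urllib.request.Request(
            url,
            data=json.dumps(spec).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST"))
    return requests


# Abbreviated start plan in the format of startplan.json above.
plan = json.loads("""
{
  "incremental": false,
  "processes": [
    {"archiveId": 0, "dbName": "test", "engineType": "SM",
     "host": "server0", "labels": {}, "overrideOptions": {}},
    {"dbName": "test", "engineType": "TE",
     "host": "server0", "labels": {}, "overrideOptions": {}}
  ]
}
""")
requests = start_requests(plan)
for request in requests:
    print(request.get_method(), request.full_url, request.data.decode("utf-8"))
```

Passing each request to urllib.request.urlopen would submit it to the admin server.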

NuoDB Command (nuocmd) generates REST requests, and the nuocmd start database subcommand works by requesting a start plan and generating start-process requests for all process specifications in the start plan. So a more concise way of restarting a database is shown below.

Start a database, with TEs on both servers, then show the domain state.

nuocmd start database --db-name test --te-server-ids server0 server1
STARTING: StartProcessRequest(archive_id=0, db_name=test, engine_type=SM, expected_incarnation_major=5, expected_incarnation_minor=0, labels={}, options={}, server_id=server0)
STARTING: StartProcessRequest(archive_id=1, db_name=test, engine_type=SM, expected_incarnation_major=5, expected_incarnation_minor=0, labels={}, options={}, server_id=server1)
STARTING: StartProcessRequest(db_name=test, engine_type=TE, expected_incarnation_major=5, expected_incarnation_minor=0, labels={}, options={}, server_id=server0)
STARTING: StartProcessRequest(db_name=test, engine_type=TE, expected_incarnation_major=5, expected_incarnation_minor=0, labels={}, options={}, server_id=server1)
nuocmd show domain
server version: 3.4.2-1-35a573d0d3, server license: Enterprise
server time: 2019-05-16T19:06:09.497, client token: 63d2999ec79923718ee890b74e873ce33341f5d1
Servers:
  [server0] server0:48005 (FOLLOWER, Leader=server1) ACTIVE:Connected *
  [server1] server1:48005 (LEADER, Leader=server1) ACTIVE:Connected
Databases:
  test [RUNNING]
    [SM] server0:48007 (Default) [start_id = 10] [server_id = server0] MONITORED:RUNNING
    [SM] server1:48007 (Default) [start_id = 11] [server_id = server1] MONITORED:RUNNING
    [TE] server0:48006 (Default) [start_id = 12] [server_id = server0] MONITORED:RUNNING
    [TE] server1:48006 (Default) [start_id = 13] [server_id = server1] MONITORED:RUNNING
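To confirm from a script that the restart succeeded, the process lines of the show domain output can be parsed and checked. The Python sketch below is written against the output format shown in this example, which may differ between NuoDB versions; the function names are illustrative.

```python
import re

# Matches process lines such as:
#   [SM] server0:48007 (Default) [start_id = 10] [...] MONITORED:RUNNING
PROCESS_LINE = re.compile(
    r"^\s*\[(?P<engine_type>SM|TE)\]\s+(?P<address>\S+).*?"
    r"\[start_id = (?P<start_id>\d+)\].*\s"
    r"(?P<durable_state>[A-Z_]+):(?P<engine_state>[A-Z_]+)\s*$")


def parse_processes(show_domain_output):
    """Return one dict per live process line found in the output."""
    processes = []
    for line in show_domain_output.splitlines():
        match = PROCESS_LINE.match(line)
        if match:
            info = match.groupdict()
            info["start_id"] = int(info["start_id"])
            processes.append(info)
    return processes


def all_running(processes):
    """True if there is at least one process and all are MONITORED:RUNNING."""
    return bool(processes) and all(
        p["durable_state"] == "MONITORED" and p["engine_state"] == "RUNNING"
        for p in processes)


sample = """\
Databases:
  test [RUNNING]
    [SM] server0:48007 (Default) [start_id = 10] [server_id = server0] MONITORED:RUNNING
    [SM] server1:48007 (Default) [start_id = 11] [server_id = server1] MONITORED:RUNNING
    [TE] server0:48006 (Default) [start_id = 12] [server_id = server0] MONITORED:RUNNING
    [TE] server1:48006 (Default) [start_id = 13] [server_id = server1] MONITORED:RUNNING
"""
processes = parse_processes(sample)
print(len(processes), all_running(processes))
```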