Troubleshooting

If you require assistance from NuoDB Support, you may need to provide diagnostic information. To collect diagnostic information about the NuoDB database in your OpenShift deployment and generate a support package, you can execute the commands described in this section.

Collecting Diagnostics

Using the nuocmd command-line interface, you can run the commands described below to extract information from an OpenShift cluster.

You may run these commands from the Terminal tab of a NuoDB process pod in the OpenShift UI. To navigate to the Terminal tab, click the Applications tab, click Pods, and then select the pod on which the commands are to be executed.

Note: Before executing commands, type bash in the terminal command window and press Enter; this enables auto-completion of commands, helping you confirm that commands are correct before executing them.
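Alternatively, if you prefer to work outside the OpenShift UI, you can open a shell in a pod directly with the oc client. A minimal sketch, assuming an Admin pod named admin-0 (substitute a pod name reported by oc get pods):

oc rsh admin-0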

get diagnose-info

This command collects database process core files and saves a zip file in the specified location.

Arguments

Example

nuocmd get diagnose-info --output-dir /tmp/nuodiag --include-cores

Note: The --output-dir parameter is optional.

get server-logs

This command creates a zip file containing the Admin Service log files and saves it in the specified location.

Arguments

Example

nuocmd get server-logs --output /tmp

get log-messages

This command displays log messages matching the specified logging options.

Arguments

Example

nuocmd get log-messages --log-options msgs 

Note: The --log-options parameter is required. For a complete list of log parameters, see Description of Logging Categories.

get core-file

This command downloads the core file for a running database process.

Arguments

Example

nuocmd get core-file --start-id 1 --output-dir /tmp/

Note: To get the value for the --start-id parameter, run the nuocmd show domain command and note the SID (start ID) value displayed for the SM or TE process for which you want to download a core file. If the --output-dir parameter is not specified, you must have permission to write to the directory from which you executed the command.
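For example, to list the running processes before downloading a core file:

nuocmd show domain

The start ID is shown in the entry for each SM and TE process; pass that value to --start-id.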

get database-connectivity

For the specified database, this command provides connectivity information in JSON format, which can be read and processed for further analysis.

Arguments

Example

nuocmd get database-connectivity --db-name test
{
  "0": {
    "1": {
      "lastAckDeltaInMilliSeconds": 746,
      "lastMsgDeltaInMilliSeconds": 755
    },
    "2": {
      "lastAckDeltaInMilliSeconds": 460,
      "lastMsgDeltaInMilliSeconds": 755
    },
    "3": {
      "lastAckDeltaInMilliSeconds": 860,
      "lastMsgDeltaInMilliSeconds": 755
    }
  },
  "1": {
    "0": {
	"lastAckDeltaInMilliSeconds": 802,
	"lastMsgDeltaInMilliSeconds": 794
    },
    "2": {
      "lastAckDeltaInMilliSeconds": 508,
      "lastMsgDeltaInMilliSeconds": 794
    },
    "3": {
      "lastAckDeltaInMilliSeconds": 907,
      "lastMsgDeltaInMilliSeconds": 794
    }
  },
... (and so on for the remaining nodes)
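Because the output is plain JSON, it can be piped to standard tools for further processing. For example, a minimal sketch, assuming the jq utility is available in the pod, that reports the largest acknowledgment delay observed by node 0:

nuocmd get database-connectivity --db-name test | jq '."0" | [.[].lastAckDeltaInMilliSeconds] | max'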

show database-connectivity

For the specified database, this command provides connectivity information in tabular format.

Arguments

Example

nuocmd show database-connectivity --db-name test --with-node-ids
     1    2    3    4 
1             27s      
2             27s      
3    ?    ?    ?    ? 
4             32s 
           
Legend:
X: node at this row does not consider node at this column a peer
?: node at this row could not be queried for connectivity information
!: node at this row does not have expected metadata for node at this column
[0-9]+[hms]: time since node at this row last heard from node at this column

Mounting Failure

If a mount failure occurs, the following message is displayed:

Unable to mount volumes for pod "demo-east-0_guitar(<uuid>)": timeout expired waiting for volumes to attach/mount for pod "<project>"/"demo-east-0". list of unattached/unmounted volumes=[raftlog default-token-gjjgd]

Resolution

Delete the associated Storage Manager (SM) pod or Admin pod; it will restart automatically.
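For example, a minimal sketch using the pod and project names from the message above (substitute your own):

oc delete pod demo-east-0 -n <project>

Because the NuoDB Admin and SM pods are managed by a controller (typically a StatefulSet), a replacement pod is scheduled automatically after the deletion.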

Cluster Server Node Failure

If a server node in a cluster dies for any reason, for example due to a hardware failure, any Admin Service or Storage Manager (SM) process on that node is not restarted and continues to hold resources such as Persistent Volume Claims (PVCs). However, any Transaction Engine (TE) on the failed node is restarted on a surviving compute node that satisfies the TE's node affinity rules; for example, the node must meet certain CPU and memory requirements to host the new TE.

Behavior

The status of a failed node, as reported by the oc get nodes command, is:

NotReady
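For example, oc get nodes might report something like the following (node names, ages, and versions are illustrative):

NAME      STATUS     ROLES    AGE   VERSION
node-1    Ready      worker   42d   v1.14.6
node-2    NotReady   worker   42d   v1.14.6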

In the OpenShift UI, the status of an Admin or SM pod that was running on the failed node is:

Terminating

Resolution

The server node and its pods remain in these states, and the PVCs remain held, until the node either comes back online or is manually deleted from the cluster. The terminating pods and held PVCs can be removed manually by executing the oc delete command on the appropriate resource.
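For example, a minimal sketch (resource names are placeholders; substitute the names reported by oc get nodes, oc get pods, and oc get pvc):

# Remove the failed node from the cluster
oc delete node <node-name>

# Force removal of a pod stuck in the Terminating state
oc delete pod <pod-name> --grace-period=0 --force

# Release a PVC still held by the deleted pod
oc delete pvc <pvc-name>

Note that --grace-period=0 --force only removes the pod object from the API server; confirm that the node is really down before forcing deletion.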