Scaling Down the Admin Layer without Kubernetes-Aware Admin

In section Admin Scale-down, the process for scaling down the Admin layer using Kubernetes-Aware Admin (KAA) is described. To summarize that section, KAA enables the Admin StatefulSet to be scaled down easily, without the risk of losing Admin quorum.

If KAA is not available, either because the version of NuoDB being used does not support it, or because the role and role-binding needed by KAA and automatically installed by the NuoDB Helm Charts cannot be used, then scaling down the Admin layer may require manually recovering from lost Admin quorum, as described in Re-establishing Admin Process (AP) Quorum. The Admin scale-down functionality of KAA is also not available in multi-cluster Kubernetes deployments of the NuoDB Helm Charts.
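
To check whether the KAA role and role-binding are present before following this procedure, you can list them in the release namespace; the exact resource names depend on the chart release, so this is only a starting point:

kubectl get role,rolebinding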

To demonstrate the situation, the NuoDB Admin Helm Chart is installed in the example below with nuodb.addRoleBinding=false to disable KAA. The loss of Admin quorum is then simulated by scaling the domain down from three Admin replicas to one.

It is not recommended to disable KAA unless your organization’s security policy disallows read access to Kubernetes state from within Kubernetes Pods.

helm install demo nuodb/admin --set nuodb.addRoleBinding=false --set admin.replicas=3

After the NuoDB Admin Helm Chart is installed, nuocmd show domain will eventually show three Connected Admin servers:

$ kubectl exec demo-nuodb-cluster0-admin-0 -- nuocmd show domain
server version: 4.0.7-2-6526a2db74, server license: Community
server time: 2020-12-09T20:53:18.855, client token: b831a5b4f5f8de5e303dceeb2704a7ea541f4e3d
Servers:
  [demo-nuodb-cluster0-admin-0] demo-nuodb-cluster0-admin-0.nuodb.default.svc.cluster.local:48005 [last_ack = 0.89] [member = ADDED] [raft_state = ACTIVE] (LEADER, Leader=demo-nuodb-cluster0-admin-0, log=0/6/6) Connected *
  [demo-nuodb-cluster0-admin-1] demo-nuodb-cluster0-admin-1.nuodb.default.svc.cluster.local:48005 [last_ack = 0.89] [member = ADDED] [raft_state = ACTIVE] (FOLLOWER, Leader=demo-nuodb-cluster0-admin-0, log=0/6/6) Connected
  [demo-nuodb-cluster0-admin-2] demo-nuodb-cluster0-admin-2.nuodb.default.svc.cluster.local:48005 [last_ack = 0.89] [member = ADDED] [raft_state = ACTIVE] (FOLLOWER, Leader=demo-nuodb-cluster0-admin-0, log=0/6/6) Connected
Databases:
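
Rather than polling nuocmd show domain manually, you can block until the expected number of Admin servers is connected. For example, assuming the nuocmd check servers subcommand is available in the NuoDB version being used:

kubectl exec demo-nuodb-cluster0-admin-0 -- nuocmd check servers --check-connected --num-servers 3 --timeout 300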

Now suppose that the Admins with server IDs demo-nuodb-cluster0-admin-1 and demo-nuodb-cluster0-admin-2 are lost permanently, or that we simply wish to scale down the domain to a single Admin. In that case, we would scale the Admin StatefulSet down to 1 so that Kubernetes stops attempting to schedule the Pods with ordinals 1 and 2. Eventually nuocmd show domain will show the Admins with ordinals 1 and 2 as Disconnected:

$ kubectl scale statefulset demo-nuodb-cluster0-admin --replicas=1
statefulset.apps/demo-nuodb-cluster0-admin scaled

...
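
While waiting, kubectl can confirm that only the Pod with ordinal 0 remains; the Pod names here follow the StatefulSet name, so adjust the filter if your release is named differently:

kubectl get pods | grep demo-nuodb-cluster0-admin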

$ kubectl exec demo-nuodb-cluster0-admin-0 -- nuocmd show domain
server version: 4.0.7-2-6526a2db74, server license: Community
server time: 2020-12-09T20:55:09.085, client token: 3a4f9240093c6105f61daebe7d1b3cb3bfc94b54
Servers:
  [demo-nuodb-cluster0-admin-0] demo-nuodb-cluster0-admin-0.nuodb.default.svc.cluster.local:48005 [last_ack = 1.03] [member = ADDED] [raft_state = ACTIVE] (LEADER, Leader=demo-nuodb-cluster0-admin-0, log=0/6/6) Connected *
  [demo-nuodb-cluster0-admin-1] demo-nuodb-cluster0-admin-1.nuodb.default.svc.cluster.local:48005 [last_ack = 5.04] [member = ADDED] [raft_state = ACTIVE] (FOLLOWER, Leader=demo-nuodb-cluster0-admin-0, log=0/6/6) Disconnected
  [demo-nuodb-cluster0-admin-2] demo-nuodb-cluster0-admin-2.nuodb.default.svc.cluster.local:48005 [last_ack = 5.04] [member = ADDED] [raft_state = ACTIVE] (FOLLOWER, Leader=demo-nuodb-cluster0-admin-0, log=0/6/6) Disconnected
Databases:

At this point, we do not have Admin quorum, because only 1 out of the 3 servers in the domain membership is available, and Raft requires a majority (2 of 3 in this case) to commit changes. Any command that changes the domain state requires quorum and would fail in this state. For example, the command to remove demo-nuodb-cluster0-admin-1 from the domain membership fails with a timeout due to lack of quorum:

$ kubectl exec demo-nuodb-cluster0-admin-0 -- nuocmd delete server --server-id demo-nuodb-cluster0-admin-1
'delete server' failed: Unable to remove server from membership: Unable to get command response: Command request timed out

In order to allow changes to the domain state and domain membership, the Admin StatefulSet needs to be edited to supply the list of Admin server IDs that must be explicitly excluded from quorum. This can be done by executing the following command:

kubectl edit statefulsets.apps demo-nuodb-cluster0-admin

In the editor, locate the args entry for the container named admin under spec.template.spec.containers:

...
spec:
  podManagementPolicy: Parallel
  replicas: 1
  ...
  template:
    ...
    spec:
      containers:
      - args:
        - nuoadmin
        - --
        - pendingReconnectTimeout=60000
        - processLivenessCheckSec=30
        env:
          ...
        name: admin
        ...

To exclude demo-nuodb-cluster0-admin-1 and demo-nuodb-cluster0-admin-2 from quorum, add the arguments --evicted-servers and demo-nuodb-cluster0-admin-1,demo-nuodb-cluster0-admin-2 to the list of command-line arguments after nuoadmin and before the -- argument, to obtain the following:

...
spec:
  podManagementPolicy: Parallel
  replicas: 1
  ...
  template:
    ...
    spec:
      containers:
      - args:
        - nuoadmin
        - --evicted-servers
        - demo-nuodb-cluster0-admin-1,demo-nuodb-cluster0-admin-2
        - --
        - pendingReconnectTimeout=60000
        - processLivenessCheckSec=30
        env:
          ...
        name: admin
        ...
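
As an alternative to editing the StatefulSet interactively, the same change can be applied non-interactively with a JSON patch. This is a sketch that assumes admin is the first container in the Pod template (adjust the index otherwise); the first operation inserts --evicted-servers at position 1, and the second inserts the server list immediately after it:

kubectl patch statefulset demo-nuodb-cluster0-admin --type=json -p '[
  {"op": "add", "path": "/spec/template/spec/containers/0/args/1", "value": "--evicted-servers"},
  {"op": "add", "path": "/spec/template/spec/containers/0/args/2", "value": "demo-nuodb-cluster0-admin-1,demo-nuodb-cluster0-admin-2"}
]'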

Once demo-nuodb-cluster0-admin-0 has been restarted with --evicted-servers, demo-nuodb-cluster0-admin-1 and demo-nuodb-cluster0-admin-2 can be removed from the domain membership.

--evicted-servers does not cause the specified Admin servers to be removed from the domain membership; it only causes them to be excluded from quorum.

$ kubectl exec demo-nuodb-cluster0-admin-0 -- nuocmd delete server --server-id demo-nuodb-cluster0-admin-1
$ kubectl exec demo-nuodb-cluster0-admin-0 -- nuocmd delete server --server-id demo-nuodb-cluster0-admin-2
$ kubectl exec demo-nuodb-cluster0-admin-0 -- nuocmd show domain
server version: 4.0.7-2-6526a2db74, server license: Community
server time: 2020-12-09T20:58:10.555, client token: ccddf60330d38c5947811e7cadb9487ac2451e41
Servers:
  [demo-nuodb-cluster0-admin-0] demo-nuodb-cluster0-admin-0.nuodb.default.svc.cluster.local:48005 [last_ack = 0.56] [member = ADDED] [raft_state = ACTIVE] (LEADER, Leader=demo-nuodb-cluster0-admin-0, log=1/14/14) Connected *
Databases:

Finally, be sure to return the Admin StatefulSet to its original state by removing the arguments --evicted-servers and demo-nuodb-cluster0-admin-1,demo-nuodb-cluster0-admin-2 (a non-interactive alternative is sketched after the YAML below). It is no longer necessary to exclude these servers from quorum, since they are no longer in the domain membership at all.

...
spec:
  podManagementPolicy: Parallel
  replicas: 1
  ...
  template:
    ...
    spec:
      containers:
      - args:
        - nuoadmin
        - --
        - pendingReconnectTimeout=60000
        - processLivenessCheckSec=30
        env:
          ...
        name: admin
        ...
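
If the JSON-patch approach was used, the edit can be reverted non-interactively as well; removing index 1 twice deletes both of the inserted arguments (again assuming admin is the first container):

kubectl patch statefulset demo-nuodb-cluster0-admin --type=json -p '[
  {"op": "remove", "path": "/spec/template/spec/containers/0/args/1"},
  {"op": "remove", "path": "/spec/template/spec/containers/0/args/1"}
]'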