NuoDB System Architecture

NuoDB is a distributed, relational database management system, a category sometimes referred to as NewSQL databases.

A microservice-style architecture divides processing into three layers:

  • Administration

  • SQL Processing

  • Storage Management

System Overview
Figure 1. System Overview

Administration

The administration layer consists of Admin Processes or APs. The AP is a Java application called nuoadmin. One a bare-metal or VM based implementation, typically every node has an AP running on it. In a containerized implementation, such as under Kubernetes, far fewer APs are typically deployed.

The APs also provide a connection load-balancing service to clients - see below

SQL Processing

Transaction Engines or TEs run in this layer and perform the parsing, optimization and execution of queries. They fetch data (atoms) as required, either from another TE or from an SM. All changes are passed to the SMs to be committed to disk. Once an atom has been loaded by an TE it is stored in its cache for faster access next time. Cache management is discussed in a dedicated section.

This layer is where atomicity, consistency and isolation are implemented. NuoDB offers both SQL compliance and ACID transactions even though the database is distributed. This is in marked contrast to NoSQL databases which typically have their own, custom "SQL-like" query languages and usually do not offer ACID transactions.

Multiple TEs are recommended both for resilience (redundancy) and for throughput (to handle the SQL load on the database).

Storage management

Storage Managers or SMs are responsible for persistence (durability). Data is stored to files (ideally on a local disk) and reread as necessary. The directory on disk and all its contents is termed an archive. Only TEs communicate with SMs either to request data or to have data stored. A redundant set of SMs (typically 3 or 5) is recommended.

By default, every SM manages a complete copy of the entire database, each is effectively a backup for all the others. When a TE needs changes to be committed, it sends the data to all the running SMs and waits for them all to acknowledge the commit before acknowledging the commit to its client. In terms of the CAP Theorem, NuoDB prioritizes correctness over availability. Therefore a client may have to wait whilst a data item (atom) is being updated, however all running SMs are always up to date. For performance and resilience, like most database systems, the SM first writes all changes to a Journal, periodically reading the journal and writing the changes to its disk.

For a form of Eventual Consistency (data is always available, but may be slightly out of date) use Asynchronous SMs.

Engines

Because both TEs and SMs share common-code, in particular atom caching, they are both implemented by the same C++ application called nuodb. When nuodb starts, a run-time option tells it to run as either a TE or an SM. TE and SM processes are collectively termed engines.

Domain

The set of all APs is called a Domain. A domain exists provided that a majority (over 50%) of APs are running, even if no databases have been defined or are running. One or more databases run in the context of any given domain. Databases are unique to their domain and the TEs and SMs for a database are unique to that database. TEs and SMs are never shared by more than one database.

NuoDB provides a command-line utility, nuocmd, for domain and database management. The utility communicates with APs via a REST-style interface and the APs in turn carry out the requests. nuocmd can be used to create archives and databases, start and stop database processes, fetch runtime and debugging information, and much, much more. The REST API is documented and can be used by your own scripts or applications.

Client Connections

Processes may come and go in a distributed environment, for example as a result of scaling in or out, so to avoid clients having to keep track of the processes in the database, clients always connect via an AP (any AP, it does not matter which). All the APs have complete knowledge of all processes in the Domain and can return the address of a suitable TE for the client to talk to. Clients only ever communicate with TEs. They never talk to SMs.