Data Caching

Whenever a Transaction Engine (TE) or a Storage Manager (SM) loads an atom, it caches it for future use. The typical scenario is as follows.

The first time data is requested (typically by a SELECT against a table), a TE requests the necessary rows from one of the SMs. The atoms corresponding to those rows are fetched from disk, sent across the network from the SM to the TE, and cached in the TE's memory. Subsequent queries against the same table can reuse the cached data, avoiding both network and disk overhead.
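This read-through behavior can be sketched as follows. This is a simplified model for illustration only; the class and method names (`TECache`, `read_atom`, and so on) are invented and are not NuoDB APIs:

```python
class StorageManager:
    """Stand-in SM that 'reads' atoms from disk (illustrative)."""

    def __init__(self, disk):
        self.disk = disk
        self.reads = 0  # counts how often the SM was contacted

    def read_atom(self, atom_id):
        self.reads += 1
        return self.disk[atom_id]


class TECache:
    """Simplified model of a TE's in-memory atom cache."""

    def __init__(self, storage_manager):
        self.sm = storage_manager   # fallback source that may hit disk
        self.cache = {}             # atom_id -> atom data

    def get_atom(self, atom_id):
        # First request: fetch from an SM over the network and cache it.
        if atom_id not in self.cache:
            self.cache[atom_id] = self.sm.read_atom(atom_id)
        # Later requests: served from memory, no network or disk I/O.
        return self.cache[atom_id]


sm = StorageManager({"atom-1": ["row-1", "row-2"]})
te = TECache(sm)
te.get_atom("atom-1")   # first SELECT: goes to the SM
te.get_atom("atom-1")   # repeat SELECT: served from the TE cache
print(sm.reads)         # -> 1: the SM was only contacted once
```

The second `get_atom` call never reaches the SM, which is the overhead the cache exists to avoid.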

If another TE requires the same data, it can fetch it from either the first TE or an SM. Fetching from a TE is preferred because the SM may have to read the data from disk. Internal metadata (in the Catalog atoms) tracks which TEs and SMs hold a copy of any given piece of data. Each TE also tracks how quickly the other processes (both TEs and SMs) in the database respond (see the ping-timeout property), so a TE attempts to fetch data from the fastest source.
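Source selection can be modeled as picking the peer with the lowest measured response time. This is a hedged sketch: the peer names, latency figures, and the `pick_fastest_source` helper are invented for illustration, not drawn from NuoDB internals:

```python
def pick_fastest_source(holders, latencies_ms):
    """Pick the peer with the lowest measured response time.

    holders: processes (TEs and SMs) known to hold a copy of the atom,
             per Catalog-style metadata.
    latencies_ms: rolling response-time estimates this TE keeps
                  for each of its peers.
    """
    return min(holders, key=lambda peer: latencies_ms[peer])


# Which processes hold a copy of the atom (illustrative Catalog data).
holders = ["te-1", "sm-1"]
# Response-time estimates for each peer, in milliseconds (invented).
latencies_ms = {"te-1": 0.4, "sm-1": 2.5}

print(pick_fastest_source(holders, latencies_ms))  # -> te-1
```

Here the other TE wins over the SM because its in-memory copy answers faster than an SM that may need a disk read.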

When a cached data item is changed by another TE (one that also has it cached), the same metadata ensures that every TE holding a copy is sent the change. Copies of the data in other TEs may be locked to prevent concurrent updates (in practice it is more subtle, because of NuoDB's isolation modes). The SMs commit the change (see Commit Protocol) and every TE with the data cached then updates its copy as well.
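The propagation step can be sketched as a broadcast driven by the holder metadata. Again this is a simplified model under stated assumptions: the `Catalog` class, its methods, and the process names are illustrative, and commit, locking, and isolation details are deliberately omitted:

```python
class Catalog:
    """Tracks which processes hold a copy of each atom (simplified)."""

    def __init__(self):
        self.holders = {}  # atom_id -> set of process names

    def register(self, atom_id, process):
        self.holders.setdefault(atom_id, set()).add(process)

    def broadcast_change(self, atom_id, new_value, caches, origin):
        # After the SMs commit the change, every other process that
        # caches the atom receives the change and updates its copy.
        for process in self.holders.get(atom_id, ()):
            if process != origin:
                caches[process][atom_id] = new_value


catalog = Catalog()
catalog.register("atom-1", "te-1")
catalog.register("atom-1", "te-2")

caches = {"te-1": {"atom-1": "v1"}, "te-2": {"atom-1": "v1"}}
caches["te-1"]["atom-1"] = "v2"  # te-1 changes its cached copy
catalog.broadcast_change("atom-1", "v2", caches, origin="te-1")
print(caches["te-2"]["atom-1"])  # -> v2: te-2's copy was updated
```

The point of the metadata is that the originating TE never has to guess who caches the atom; the holder set tells it exactly which peers need the update.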

When the cache starts to fill, a Least Recently Used (LRU) algorithm controls which atoms are kept. A garbage collector removes the least recently used atoms to create space. Atom fetching and removal are quantified by the Objects… metrics and on Insights dashboards.
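LRU eviction itself is a standard technique; a minimal sketch using Python's `collections.OrderedDict` (the class name and capacity are illustrative, not NuoDB's actual implementation) looks like this:

```python
from collections import OrderedDict


class LRUAtomCache:
    """LRU sketch: evict the least recently used atom when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.atoms = OrderedDict()  # least recently used first

    def get(self, atom_id):
        self.atoms.move_to_end(atom_id)  # mark as recently used
        return self.atoms[atom_id]

    def put(self, atom_id, data):
        if atom_id in self.atoms:
            self.atoms.move_to_end(atom_id)
        self.atoms[atom_id] = data
        if len(self.atoms) > self.capacity:
            # "Garbage collect" the least recently used atom.
            self.atoms.popitem(last=False)


cache = LRUAtomCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")            # "a" is now the most recently used
cache.put("c", 3)         # evicts "b", the least recently used
print(list(cache.atoms))  # -> ['a', 'c']
```

Note that eviction is driven by recency of use, not age of insertion: "a" survives despite being inserted first, because it was touched more recently than "b".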