Transaction Engine (TE) or Storage Manager (SM) Quits Unexpectedly

Symptom

A TE or an SM quits unexpectedly with no core file.

Cause

This behavior is most often seen on Linux and is caused by the Linux out-of-memory (OOM) killer. When memory runs low, the OOM killer begins killing processes and often chooses the one that is using the most memory. NuoDB is frequently the largest memory consumer on the server and is therefore the process that gets killed. Because the OOM killer terminates its target with SIGKILL, no core file is written.

If this issue occurs, it is typically because additional memory-intensive applications are running on the same host as the TE or SM.
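
To confirm that the OOM killer was responsible, the kernel log can be searched for its messages. The following is a minimal sketch, not an official NuoDB tool: find_oom_kills is a hypothetical helper name, and the search patterns assume the usual kernel wording, which varies between kernel versions.

```shell
#!/bin/sh
# Filter kernel log text (read from stdin) for signs of OOM-killer activity.
# find_oom_kills is a hypothetical helper; the patterns are assumptions based
# on typical kernel messages and may need adjusting for a given kernel.
find_oom_kills() {
    grep -i -E 'out of memory|oom-killer|killed process'
}

# Typical use (may require root):
#   dmesg | find_oom_kills
#   journalctl -k | find_oom_kills
```

A matching line that names the nuodb process around the time the TE or SM disappeared is a strong indication that the OOM killer was the cause.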

Solution

A process can be immunized against the OOM killer by setting the value of its /proc/$PID/oom_adj to the constant OOM_DISABLE (currently defined as -17). The setting applies only to the running process and is lost when the process restarts, so it is recommended that it be enforced automatically via a cron job. For example:

# /etc/cron.d/oom_disable
# A cron entry must fit on a single line; this one runs every minute as root.
* * * * * root pgrep -f "/opt/nuodb/bin/nuodb" | while read PID; do echo -17 > /proc/$PID/oom_adj; done

The command above immunizes both TEs and SMs.
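
On Linux 2.6.36 and later, oom_adj is deprecated in favor of /proc/$PID/oom_score_adj, where a value of -1000 disables OOM killing for the process. The following sketch shows one way to prefer the newer file and fall back to oom_adj. The function name oom_protect_pids and its optional proc-root argument are hypothetical and exist only for illustration and dry runs; this is not a NuoDB-supplied script.

```shell
#!/bin/sh
# Reads PIDs on stdin and writes the "never kill" value for each one.
# $1 is the proc mount point (defaults to /proc); it is a parameter only so
# that the function can be dry-run against a scratch directory.
oom_protect_pids() {
    proc_root=${1:-/proc}
    while read -r pid; do
        if [ -e "$proc_root/$pid/oom_score_adj" ]; then
            echo -1000 > "$proc_root/$pid/oom_score_adj"   # current interface
        elif [ -e "$proc_root/$pid/oom_adj" ]; then
            echo -17 > "$proc_root/$pid/oom_adj"           # legacy interface
        fi
    done
}

# Typical use as root:
#   pgrep -f "/opt/nuodb/bin/nuodb" | oom_protect_pids
```

Lowering either value requires root privileges, so the function would need to run as root, for example from the same cron entry.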