Spill to Disk

When a Transaction Engine (TE) is under memory pressure, NuoDB supports the ability to offload intermediate data to disk when performing memory-intensive SQL operations or when creating indexes on temporary tables. This enables NuoDB to improve performance when processing data that exceeds the memory limits of the TE. In addition, the Storage Manager (SM) may store intermediate data when creating indexes for large non-temporary tables. This is called Spill To Disk (also referred to as Disk Spill).

Note: Currently only some SQL operations support this feature (see Supported Spill to Disk Operations). Future releases will add additional support.

Enabling Spill to Disk

Note: The scratch-dir parameter is mandatory on the TE if requiring Spill to Disk functionality. For the SM, the scratch-dir parameter is optional. If not specified, the archive directory is used. For more information, see Configuring the Spill to Disk Directory.

Spill to Disk for Index Creation

Index creation is handled differently for temporary and non-temporary tables. Index creation for temporary tables is performed on the TE. If the scratch-dir parameter is set, the TE will use the Spill to Disk feature for creating or rebuilding indexes for temporary tables. Index creation for non-temporary tables is performed on the SM. The Spill to Disk feature is always enabled for index creation on the SM. For the purpose of performance and of manageability, NuoDB recommends that the scratch-dir parameter on an SM be set to a disk volume that is different from that of the archive. Since the data stored in this directory is temporary and usually much smaller than the entire archive, local ephemeral disks can be used. If the scratch-dir parameter is not set for the SM, the archive directory will be used, which may impact performance. In both cases, the intermediate results will be stored in temporary, invisible files.

Note: If writing to the temporary file fails (due to a file system error or a violation of disk space usage due to the value set for scratch-dir-max-size), index creation or the rebuild operation is aborted.

Note: For more information on using the scratch-dir and scratch-dir-max-size parameters, see Database Options.

Enabling Spill to Disk for SQL Queries

To ensure Spill to Disk is enabled for SQL queries across all TEs, ensure that appropriate values are set for the following system properties:

For more information on SQL_DISKSPILL and DISKSPILL_MEMORY_THRESHOLD_MB, see SQL System Properties.

The QUERYBUFFERSTATS table can assist with setting a value for DISKSPILL_MEMORY_THRESHOLD_MB. For a given workload, this table lists the queries with the highest buffered memory requirements. The amount of memory that is required to process the query entirely in memory is listed in the DATA_BUFFERING_MB column. Only queries that may benefit from SQL Spill to Disk are listed.

If the value set for DISKSPILL_MEMORY_THRESHOLD_MB is greater than that of the largest DATA_BUFFERING_MB query, Spill to Disk is not used unless the TE is under memory pressure.

Note: Increasing the value set for DISKSPILL_MEMORY_THRESHOLD_MB may reduce the use of SQL Spill to Disk by a TE and is recommended on condition sufficient system memory is provided for the database to comfortably fit in memory. If a TE comes under memory pressure, Spill to Disk may be triggered despite increasing the threshold. If a TE is under memory pressure, consider reducing the value set for DISKSPILL_MEMORY_THRESHOLD_MB (below the default value). This could be beneficial when the workload consists of concurrent sorting operations.

Configuring the Spill to Disk Directory

The directory used by Spill to Disk is specified using scratch-dir as a parameter when starting a TE.

NuoDB requires full read and write access to the directory specified. NuoDB recommends that the directory is empty and that it is not used for any other purpose (as its contents may be erased by a NuoDB database). If more than one instance of a NuoDB database is used on the same host, a separate scratch-dir directory must be used for each instance. For information on using scratch-dir, see Database Options.

On a TE, scratch-dir-max-size may be used to specify a limit on the amount of data that can be written to the directory specified. This may be used if there are concerns about filling the disk. If this parameter is not provided, no limit is set. Operations fail if they cause the limit to be exceeded, or if the operating system reports that the disk is full. For information on using scratch-dir-max-size, see Database Options.

Reviewing Disk Space Required

For index creation or rebuilding, the disk space required is proportional to the size of the column(s) on which the index is being constructed, that is key size times the number of rows. For example, where you have a table with 1 million rows and an index for that table which is composed of three concatenated BIGINT columns, then you need (1 million x three x the size of (BIGINT) bytes), which is roughly 24 megabytes. If there are concurrent indexes being built, then all disk space requirements must be added together.

Spill to Disk usage for SQL queries can be evaluated as follows:

Usage Evaluation Actions
Determine which queries used Spill to Disk
  • Review the DISKSPILL column in the LASTSTATEMENT table. For more information, see LASTSTATEMENT System Table Description.
  • Review the setting for DiskSpill in the sql-statement-metrics log.
Determine the amount of disk usage for a query

Supported Spill to Disk Operations

Sorting

Sorting a large table can require a significant amount of memory. When an ORDER BY clause is used in a query, NuoDB attempts to sort the data in memory. If the sorting memory usage exceeds the value set for DISKSPILL_MEMORY_THRESHOLD_MB, Spill to Disk is then used to limit the amount of memory used to complete the sort operation. For information on using DISKSPILL_MEMORY_THRESHOLD_MB, see SQL System Properties.

Indexes

The following index-related SQL operations may trigger Spill to Disk usage for indexes: