Open topic with navigation
When a Transaction Engine (TE) is under memory pressure, NuoDB supports the ability to offload intermediate data to disk when performing memory-intensive SQL operations or when creating indexes on temporary tables. This enables NuoDB to improve performance when processing data that exceeds the memory limits of the TE. In addition, the Storage Manager (SM) may store intermediate data when creating indexes for large non-temporary tables. This is called Spill To Disk (also referred to as Disk Spill).
Note: Currently only some SQL operations support this feature (see Supported Spill to Disk Operations). Future releases will add additional support.
scratch-dir parameter is mandatory on the TE if requiring Spill to Disk functionality. For the SM, the
scratch-dir parameter is optional. If not specified, the archive directory is used. For more information, see Configuring the Spill to Disk Directory.
Index creation is handled differently for temporary and non-temporary tables. Index creation for temporary tables is performed on the TE. If the
scratch-dir parameter is set, the TE will use the Spill to Disk feature for creating or rebuilding indexes for temporary tables. Index creation for non-temporary tables is performed on the SM. The Spill to Disk feature is always enabled for index creation on the SM. For the purpose of performance and of manageability, NuoDB recommends that the
scratch-dir parameter on an SM be set to a disk volume that is different from that of the archive. Since the data stored in this directory is temporary and usually much smaller than the entire archive, local ephemeral disks can be used. If the
scratch-dir parameter is not set for the SM, the archive directory will be used, which may impact performance. In both cases, the intermediate results will be stored in temporary, invisible files.
Note: If writing to the temporary file fails (due to a file system error or a violation of disk space usage due to the value set for
scratch-dir-max-size), index creation or the rebuild operation is aborted.
Note: For more information on using the
scratch-dir-max-size parameters, see Database Options.
To ensure Spill to Disk is enabled for SQL queries across all TEs, ensure that appropriate values are set for the following system properties:
SQL_DISKSPILL- set to
true; when set to
false, Spill to Disk is disabled on all TEs (even after restarting database processes).
DISKSPILL_MEMORY_THRESHOLD_MB- check that the value set for this property is adequate (64MB by default).
For more information on
DISKSPILL_MEMORY_THRESHOLD_MB, see SQL System Properties.
The QUERYBUFFERSTATS table can assist with setting a value for
DISKSPILL_MEMORY_THRESHOLD_MB. For a given workload, this table lists the queries with the highest buffered memory requirements. The amount of memory that is required to process the query entirely in memory is listed in the DATA_BUFFERING_MB column. Only queries that may benefit from SQL Spill to Disk are listed.
If the value set for
DISKSPILL_MEMORY_THRESHOLD_MB is greater than that of the largest DATA_BUFFERING_MB query, Spill to Disk is not used unless the TE is under memory pressure.
Note: Increasing the value set for
DISKSPILL_MEMORY_THRESHOLD_MB may reduce the use of SQL Spill to Disk by a TE and is recommended on condition sufficient system memory is provided for the database to comfortably fit in memory. If a TE comes under memory pressure, Spill to Disk may be triggered despite increasing the threshold. If a TE is under memory pressure, consider reducing the value set for
DISKSPILL_MEMORY_THRESHOLD_MB (below the default value). This could be beneficial when the workload consists of concurrent sorting operations.
The directory used by Spill to Disk is specified using
scratch-dir as a parameter when starting a TE.
NuoDB requires full read and write access to the directory specified. NuoDB recommends that the directory is empty and that it is not used for any other purpose (as its contents may be erased by a NuoDB database). If more than one instance of a NuoDB database is used on the same host, a separate
scratch-dir directory must be used for each instance. For information on using
scratch-dir, see Database Options.
On a TE,
scratch-dir-max-size may be used to specify a limit on the amount of data that can be written to the directory specified. This may be used if there are concerns about filling the disk. If this parameter is not provided, no limit is set. Operations fail if they cause the limit to be exceeded, or if the operating system reports that the disk is full. For information on using
scratch-dir-max-size, see Database Options.
For index creation or rebuilding, the disk space required is proportional to the size of the column(s) on which the index is being constructed, that is key size times the number of rows. For example, where you have a table with 1 million rows and an index for that table which is composed of three concatenated BIGINT columns, then you need (1 million x three x the size of (BIGINT) bytes), which is roughly 24 megabytes. If there are concurrent indexes being built, then all disk space requirements must be added together.
Spill to Disk usage for SQL queries can be evaluated as follows:
|Determine which queries used Spill to Disk||
|Determine the amount of disk usage for a query||
Sorting a large table can require a significant amount of memory. When an
ORDER BY clause is used in a query, NuoDB attempts to sort the data in memory. If the sorting memory usage exceeds the value set for
DISKSPILL_MEMORY_THRESHOLD_MB, Spill to Disk is then used to limit the amount of memory used to complete the sort operation. For information on using
DISKSPILL_MEMORY_THRESHOLD_MB, see SQL System Properties.
The following index-related SQL operations may trigger Spill to Disk usage for indexes:
ALTER TABLE ADD PRIMARY KEY
ALTER TABLE ADD UNIQUE CONSTRAINT
ALTER TABLE REBUILD INDEX
ALTER TABLE REBUILD INDEXES