CREATE STATISTICS

CREATE STATISTICS — compute statistics for specific table column(s)

Syntax

CREATE STATISTICS statistics_name
    ON [schema.]table_name ( column_name [, ...] )
    WITH ( statistics_option [, ...] )

Where statistics_option can be one or more of the following:

FULLSCAN|SAMPLE n {PERCENT|ROWS}
RESOLUTION n
AUTODROP {ON|OFF}

n must be an integer value.

Description

CREATE STATISTICS instructs the system to compute statistics for one or more columns on the specified table_name, without creating an index. The statistics will be computed the next time that ANALYZE is performed on table_name. Statistics can be computed for any table column with the exception of LOB type columns.

CREATE STATISTICS requests the following additional table statistics:

Histograms for any column or column group

Histograms are used to estimate queries that filter by a known literal value. For example, a histogram computed on columns f1, f2, and f3 could be used to estimate this predicate:
```
WHERE f1 = 1 AND f2 = 2 AND f3 > 3 AND f3 < 10
```
Maximum key size for Histograms is 2048 bytes.
Scalar statistics for a column group (Scalar statistics are already calculated for individual columns)

Scalar statistics are used to estimate parameterized queries, join conditions, and grouping. For example, scalar statistics on (f1,f2) may improve the estimates for queries using these constructs:
```
WHERE f1 = ? AND f2 = ?
```
```
GROUP BY f1,f2
```

If an index on the table covers the same columns, additional statistics will not be computed.
When statistics are created using CREATE STATISTICS, ANALYZE STATISTICS runs automatically once. If the statistics are not updated automatically, they must be updated manually by running ANALYZE to regenerate the statistics.

Statistics creation is not allowed for unbounded data types, such as STRING, BYTES, CLOBS, and BLOBS.

Parameters

statistics_name: Name of the statistics to be computed. No schema name can be included here; the statistics are always computed on the schema of the parent table.
table_name: Name (optionally schema-qualified) of the table to be analyzed.
column_name: Name of the column(s) of the table to be analyzed.
FULLSCAN|SAMPLE n {PERCENT|ROWS}: The statistics are computed by sampling the table. The sampling rate can be controlled by specifying either a specific number of rows or a percentage of the total number. FULLSCAN requests that all rows in the table are scanned during the computation of the statistics. FULLSCAN and SAMPLE 100 PERCENT produce the same result.
Default is SAMPLE 10000 ROWS.
RESOLUTION n: This option allows you to specify the RESOLUTION of the histogram, that is, the number of "buckets" used in the histogram. Increasing this number will improve the precision of statistics available to the query optimizer, which will then allow the optimizer to make better decisions, potentially leading to an improvement in query performance. The downside of increasing the number of buckets is that the statistics will consume more memory and require additional compuation.
Valid RESOLUTION values for statistics are 10 - 4096.
Default is RESOLUTION 256.
AUTODROP {ON|OFF}: When ON schema changes are not blocked by the presence of statistics on the table.
Default is OFF.

Examples

The examples below use tables in the HOCKEY schema, which is a schema defined in the test database. The test database is created when the QuickStart process is run. See Running the SQL QuickStart.

Example 1: Creating composite STATISTICS on teams table.

CREATE STATISTICS stats_teams ON hockey.teams(conferenceid, year);

Example 2: Creating STATISTICS with custom sampling.

CREATE STATISTICS stats_teams ON hockey.teams(conferenceid, year) WITH (SAMPLE 20 PERCENT);

Example 3: Creating STATISTICS with full table scan.

CREATE STATISTICS stats_teams ON hockey.teams(conferenceid, year) WITH (FULLSCAN);

Example 4: Creating STATISTICS with custom resolution.

CREATE STATISTICS stats_teams ON hockey.teams(conferenceid, year) WITH (RESOLUTION 512);

Example 5: Creating STATISTICS with all options.

CREATE STATISTICS stats_teams ON hockey.teams(conferenceid, year)
        WITH (SAMPLE 10000 ROWS, RESOLUTION 512, AUTODROP ON);