CREATE INDEX

CREATE INDEX — create a new index

Syntax

CREATE [ UNIQUE ] INDEX index_name
    ON [schema.]table_name(column_name|expression [ASC|DESC] [, ...] )
    [ WITH ( [RESOLUTION n],[MAX KEY SIZE n], [NULLS [NOT] DISTINCT] ) ]
    [USING {BTREE|MERGE TREE}] [ONLINE]

In addition, the syntax supports multiple CREATE INDEX statements concatenated together, separated by a comma (,). See Improving Index Creation Speed below.

Description

CREATE INDEX constructs an index index_name on the specified table_name.
It is not possible to create a redundant index, that is, an index on a column (or expression) or set of columns (or expressions) for which an index already exists.

Running Concurrent Index Builds

If many concurrent long-running index builds are issued, the Storage Manager (SM) could become resource constrained and slow end user application response time. To prevent this from happening, if the internal index build concurrency limit of 4 is reached, a new index build request will be accepted, but processing will be delayed until the number of concurrently running index builds drops below the limit. Index builds on empty tables do not count towards the limit. Commands that can initiate index builds are, CREATE INDEX, CREATE TABLE, ALTER TABLE table REBUILD INDEXES|INDEX, or the adding of a new partition to an existing table. To monitor slow index builds or to view queued index builds, use the nuocmd log-messages --log-options command and log categories info,index. The log message is in the form:
Index build of oid/cid X(txn) stalled for ss seconds waiting for nn concurrent builds to complete.

Short-running index jobs may run longer than expected if they are queued rather than starting right away as in previous releases.

Parameters

UNIQUE: Causes the system to check for duplicate values in the table when the index is created (if data already exist) and each time data is added. Attempts to insert or update data which would result in duplicate entries will generate an error. See Example 2.

Column(s) defined by a UNIQUE INDEX are allowed to contain a NULL value (see NULLS [NOT] DISTINCT).

index_name: Name of the index to be created. No schema name can be included here; the index is always created in the same schema as its parent table.
table_name: Name (optionally schema-qualified) of the table to be indexed
column_name: Name of the column(s) of the table to be indexed
expression: Expression(s) to be indexed. See Expression Based Indexing.
ASC|DESC: The ASC and DESC qualifiers are provided for syntax compatibility purposes only and do not affect the creation of the index. Indexes are always created as if ASC was specified.
WITH ( RESOLUTION n [,…] ): This option allows you to specify the RESOLUTION of the histogram, that is, the number of "buckets" used in the histogram. The default number of buckets created is 256. Increasing this number will improve the precision of statistics available to the query optimizer, which will then allow the optimizer to make better decisions, potentially leading to an improvement in query performance. The downside of increasing the number of buckets is that the index will consume more memory and processing when the index is created or updated. Valid RESOLUTION values for statistics are 10 - 1000 (default 256). See Example 3.
See Improving Query Performance for more information on how to use this option.
WITH ( MAX KEY SIZE n ): The default maximum key size is 1024 bytes. NuoDB recommends that index key size be less than or equal to the default maximum because having larger index keys might decrease performance. However, if your application requires a larger maximum index key size then you can specify it when you create an index. There is no system property for setting a maximum index key size. You can increase the maximum index key size only when you create a particular index.
To change the maximum size allowed for an index key, specify WITH(MAX KEY SIZE n) and replace n with a value from 100 bytes to 3072 bytes. NuoDB throws an exception if you try to specify a maximum key size that is less than 100 bytes or more than 3072 bytes. See Example 4.
WITH ( NULLS DISTINCT | NULLS NOT DISTINCT ): This parameter enables you to indicate whether NULL values should be treated as distinct or not distinct values. In scenarios where there are two records with the same values in the corresponding columns, one or more of which are NULL values, a unique index reports a uniqueness violation when NULLS NOT DISTINCT is specified. If NULL DISTINCT (or neither NULL option) is specified, the two NULL values will be treated as different values and both records will be accepted in the unique index.
ONLINE: Typically, index builds acquire a table lock before populating the index and keep it locked until the index is fully built and the transaction is committed. The table data can be concurrently queried but not updated, deleted, or inserted because the table lock blocks concurrent DML operations. If an index build is executed using CREATE INDEX … ONLINE syntax, the build is done in two transactions. The first transaction creates a new index in the catalog using Incomplete (Incomplete appears before the index name in the output of the SHOW TABLE command). Subsequent DML transactions will update the index, but the read queries ignore it. The second transaction scans the table data and populates the index.
USING (BTREE|MERGE TREE): B Tree is the default indexing used when indexes are created using CREATE INDEX. To create an index using Merge Tree, use CREATE INDEX <index_name> ON <schema_name.table_name> USING MERGE TREE.

The process of populating an existing index is inherently slower than the blocking build, but concurrent DML operations are not blocked while the table is scanned. Each of the two transactions involved in building an index online acquires a table lock for a brief period of time, so the index build cannot finish until all concurrent transactions commit/rollback their updates to the table. The DML transactions blocked by the those table locks are aborted if executed under the CONSISTENT READ transaction isolation level.

Knock-on Effects

If an index build fails, it may leave an incomplete index behind. Such an index does not affect the operation of read queries, but DML operations continue to update it and failure may occur if the constraint checks based on the index are violated, for example unique constraint violation. Incomplete indexes are useless and must be removed manually.
If a concurrent DML violates a constraint such as unique key constraint, either the DML or the index build may fail.

Limitations

Expression-based indexes cannot be built online.
Any changes performed in the transaction prior to invoking CREATE INDEX … ONLINE are committed.

In NuoDB 7.0 and later, redundant indexes can be created on the same column or column group. However, indexes must still have unique names. Allowing redundant indexes on the same column group enables certain index maintenance operations without blocking client activity, such as rebuilding an index online. After creating or rebuilding indexes on a table, check for redundant indexes and delete them if they exist.

Expression-Based Indexing

Expression-based indexes (EBI)s are indexes created using a combination of columns and SQL expressions. Expression-based indexes can speed up queries in which filtering happens, based on expressions rather than columns. Once an expression-based index has been created on a table, eligible queries using that expression can make use of the index, which is then reflected in the EXPLAIN plan of the query.

How Expression-Based Indexing Works in NuoDB

When an index is created on an expression, a hidden computed column is generated which contains the result of the expression. The manner in which the computed column materializes in the database and consumes space on disk, and in memory, is just the same as if it were a column explicitly created by a DBA. The name of the computed column is automatically generated (for example $NUO_SYSFIELD_1) and appears in the FIELDS table (see FIELDS System Table Description).

It is possible to select the value stored in the computed column by using the column name. This can be useful for debugging purposes. However, applications should not rely on directly querying the computed column. NuoDB internally rewrites the expressions to this computed column in cases where the computed column is not directly referenced.

When an expression-based index is created on a table that already has data, an update is performed which populates the computed column with the correct values. This has several consequences:

Creating an expression-based index on a table requires additional UPDATE privileges on the table.
Creating an expression-based index on a table with data takes longer than creating a regular column-based index on the same table.
Creating an expression-based index can fail (in rare situations, and return an update conflict error) if the isolation level of the transaction is CONSISTENT READ, and there are concurrent writes to the table from other transactions.

The on-disk storage size of the table temporarily doubles when an EBI is added, as during the process there are two copies of the table, old and new. Furthermore, the on-disk storage size will permanently increase for the index added. The size increase is dependent on the width of the key (that is, the Expression). Therefore, confirm there’s sufficient free disk space before adding an EBI. The old copy of the table will eventually be Garbage Collected, marking the space as available for re-use.

Using Expression-Based Indexes in NuoDB

User-defined functions can be used in expression-based indexes.

If there is an attempt to drop or modify the function, it completes without error, but results in the index not being recreated. Such action leads to undefined behavior and is not recommended.

If there are two expression-based indexes on a table, and one is a sub-expression of the other, the index that is chosen during a query is not guaranteed to be the best one. See Example 7: Create more than one index.

Limitations of Expression-Based Indexes in NuoDB

The following limitations apply when using expression-based indexes:

An expression-based index cannot be created where:
- The table in question is a temporary table.
- The expression references another table.
- The expression contains a subquery.
- The expression is not deterministic:
  - All built-in functions must be deterministic, that is, no RAND(), no NOW() and so on. This includes date and time functions and operators whose result may depend on the time zone.
  - All user-defined functions must be declared as DETERMINISTIC.
- The expression uses aggregate/window functions.
- The expression does not reference any table columns.
- It is not possible to determine the precision or scale of the result of an expression, for example an arithmetic operation on a STRING column, without an explicit CAST function.
A unique constraint or primary key cannot be created on an expression.
When the expression is a query condition, for example a = 1, the index is not used if the condition is the only condition in the WHERE clause. To use an index in this case, the where clause should be rewritten to (a = 1) = true.
If two indexes exist on the same table, where one is created on a sub-expression of the other (for example a + b and a + b + c), the result of using both expressions in a query is non-deterministic, that is, either index could be used, or neither, or both.
User-defined functions can be used in expression-based indexes, but NuoDB does not track this dependency. The user-defined function can be changed or dropped. If changed, the result is that the data in the index is invalid and the index has to be manually rebuilt. The result of dropping the UDF is that INSERT and UPDATE operations on the target table fail with an error message indicating that the user-defined function cannot be found.

Improving Index Creation Speed

Creating multiple indexes on the same table involves a table scan for each index created. Concatenating these CREATE INDEX statements, separated each by a comma (,) indicates to NuoDB that only one table scan is required and performance time for the creation of the index will be improved. For example, where one might run the following commands:

CREATE INDEX idxa ON tab(a);
CREATE INDEX idxb ON tab(b);
CREATE INDEX idxc ON tab(c);

Running the statements below will result in faster index creation:

CREATE INDEX idxa ON tab(a),
CREATE INDEX idxb ON tab(b),
CREATE INDEX idxc ON tab(c);

Note the comma instead of semicolon. Concatenation of CREATE INDEX statements in this manner is only supported for one table at a time.

Examples

The examples below make use of tables in the HOCKEY schema, which is a schema defined in the test database. The test database is created when the QuickStart process is run. See Running the SQL QuickStart.

Example 1: Creating composite INDEX on teams table.

CREATE INDEX idx_teams_confyr ON hockey.teams(conferenceid, year);

Example 2: Creating UNIQUE INDEX on hockey table.

CREATE UNIQUE INDEX idx_hockey_number ON hockey.hockey(number);

Example 3: Creating INDEX with greater RESOLUTION.: Create a table, test_scoring, from the hockey.scoring table. reate a composite index on the table with the default RESOLUTION value of 10. Select the histogram buckets defined for the index from the system index statistic tables. Change the index RESOLUTION value to 300. Select the histogram buckets defined for the index from the index statistic tables. The index key values are now distributed across 300 buckets.

USE TEST;
DROP TABLE IF EXISTS test_scoring;
CREATE TABLE test_scoring
( PLAYERID STRING
, YEAR INTEGER
, STINT INTEGER
, TEAMID STRING
, POSITION STRING);

CREATE INDEX idx_testscoring_composite
      ON test_scoring(PLAYERID,YEAR,STINT,TEAMID,POSITION);

INSERT INTO test_scoring SELECT PLAYERID,YEAR,STINT,TEAMID,POSITION FROM hockey.scoring;
ANALYZE TABLE test_scoring;

SELECT ih.histogramid,ih.fieldcount,ih.maxresolution
      ,ihb.bucketindex,ihb.boundary
FROM system.indexstatistics ixs
   , system.indexhistograms ih
   , system.indexhistogrambuckets ihb
WHERE ixs.schema = 'test'
  AND ixs.tablename = 'test_scoring'
  AND ixs.indexname = 'idx_testscoring_composite'
  AND ixs.objectid = ih.objectid and ixs.catalogid = ih.catalogid
  AND ih.objectid = ihb.objectid and ih.catalogid = ihb.catalogid
 ORDER BY ih.histogramid,ihb.bucketindex;

 HISTOGRAMID  FIELDCOUNT  MAXRESOLUTION  BUCKETINDEX                BOUNDARY
 ------------ ----------- -------------- ------------ ------------------------------------

      5            5            10             0      ZERO LENGTH KEY
      5            5            10             1      [broderi01], [1974], [1], [QUN], [G]
      5            5            10             2      [delmoan01], [1999], [1], [PHI], [D]
      5            5            10             3      [gigueje01], [2009], [2], [TOR], [G]
      5            5            10             4      [humliv01], [2001], [1], [BOS], [L]
      5            5            10             5      [larosgu01], [1993], [1], [TOR], [C]
      5            5            10             6      [mckenji02], [1993], [2], [DAL], [L]
      5            5            10             7      [patrigl01], [1974], [1], [CLF], [D]
      5            5            10             8      [sabouke01], [1990], [1], [CAL], [D]
      5            5            10             9      [talboje01], [1968], [1], [STL], [D]
      5            5            10            10      [zyuzian01], [2007], [1], [CHI], [D]

DROP INDEX idx_testscoring_composite;
CREATE INDEX idx_testscoring_composite
        ON test_scoring(PLAYERID,YEAR,STINT,TEAMID,POSITION)
      WITH (RESOLUTION 300);

ANALYZE TABLE test_scoring;

SELECT ih.histogramid,ih.fieldcount,ih.maxresolution
      ,ihb.bucketindex,ihb.boundary
FROM system.indexstatistics ixs
   , system.indexhistograms ih
   , system.indexhistogrambuckets ihb
WHERE ixs.schema = 'test'
  AND ixs.tablename = 'test_scoring'
  AND ixs.indexname = 'idx_testscoring_composite'
  AND ixs.objectid = ih.objectid and ixs.catalogid = ih.catalogid
  AND ih.objectid = ihb.objectid and ih.catalogid = ihb.catalogid
 ORDER BY ih.histogramid,ihb.bucketindex;

 HISTOGRAMID  FIELDCOUNT  MAXRESOLUTION  BUCKETINDEX                 BOUNDARY
 ------------ ----------- -------------- ------------ --------------------------------------

      5            5           300             0      ZERO LENGTH KEY
      5            5           300             1      [adamske01], [2003], [1], [CAR], [C]
      5            5           300             2      [allenbo01], [2002], [1], [EDM], [D]
      5            5           300             3      [andergl01], [1989], [1], [EDM], [R]
      5            5           300             4      [andrepa01], [1968], [1], [PIT], [R]
      5            5           300             5      [archami01], [1976], [1], [CHI], [L]
      5            5           300             6      [arvedma01], [1997], [1], [OTT], [L]
      5            5           300             7      [aucoiad01], [2011], [1], [PHO], [D]
      5            5           300             8      [babycwa01], [1982], [1], [STL], [R]
      5            5           300             9      [bakovpe01], [1987], [1], [VAN], [R]
      5            5           300            10      [barkldo01], [1964], [1], [DET], [D]
      5            5           300            11      [barrile01], [1989], [1], [PHI], [C]
      5            5           300            12      [bawaro01], [1992], [1], [SJS], [R]
      5            5           300            13      [beaurst01], [1992], [1], [PHI], [G]
      5            5           300            14      [belfoed01], [1998], [1], [DAL], [G]
      5            5           300            15      [benneha02], [1976], [2], [PHI], [C]
...
      5            5           300           289      [weinrer01], [1999], [1], [MTL], [D]
      5            5           300           290      [westcdu01], [2002], [1], [CBS], [D]
      5            5           300           291      [whiteto02], [1998], [1], [CHI], [C]
      5            5           300           292      [wilcoba01], [1972], [1], [VAN], [R]
      5            5           300           293      [willsbr01], [1999], [1], [COL], [R]
      5            5           300           294      [wilsomi02], [1998], [1], [BUF], [D]
      5            5           300           295      [wojcist01], [1946], [1], [DET], [R]
      5            5           300           296      [woywije01], [2007], [1], [STL], [D]
      5            5           300           297      [yellest01], [2000], [1], [COL], [C]
      5            5           300           298      [ysebapa01], [1995], [1], [TBL], [L]
      5            5           300           299      [zeniued01], [1954], [1], [DET], [D]
      5            5           300           300      [zyuzian01], [2007], [1], [CHI], [D]

Example 4: Creating INDEX with a key size greater than the MAX KEY SIZE: Create a table with one column. Insert into the table a row with a column value that has a length greater than the default maximum KEY SIZE length of 1024. Creating an index on this column will abort due to KEY SIZE greater than MAX KEY SIZE of 1024. The index can be created by increasing the KEY SIZE of the index using the MAX KEY SIZE option of the CREATE INDEX command.

USE TEST;
DROP PROCEDURE IF EXISTS prc_load_testtab;
DROP TABLE IF EXISTS testtab;

CREATE TABLE testtab (col1 STRING);

SET DELIMITER @
CREATE PROCEDURE prc_load_testtab
AS
    VAR l_col_length INT = 0;
    VAR l_col_val STRING = '1';
    WHILE ( l_col_length < 1024 )
        l_col_val=l_col_val||'1234567890';
        l_col_length = length(l_col_val);
    END_WHILE;
    INSERT INTO testtab VALUES (l_col_val);
END_PROCEDURE;
@
SET DELIMITER ;

CALL prc_load_testtab;

SELECT LENGTH(col1) AS length_of_col1 FROM testtab;

 LENGTH_OF_COL1
 ---------------

      1031

CREATE INDEX ixd_testtab_col1 ON testtab(col1);
Key length exceeds allowed maximum of 1024 bytes in index IXD_TESTTAB_COL1

CREATE INDEX ixd_testtab_col1 ON testtab(col1 ) WITH (MAX KEY SIZE 1033);

SHOW TABLE testtab;
    Tables named TESTTAB

    Found table TESTTAB in schema TEST

        Fields:
            COL1 string
        Secondary Index: IXD_TESTTAB_COL1 on field: COL1

Example 5: Creating a simple index on the sum of two columns: Create an index, when the table is created, using KEY.

CREATE TABLE t1(a INTEGER, b INTEGER, c STRING, KEY i1(a + b));

SELECT c FROM t1 WHERE a + b = 42;

Example 6: Creating an index on the result of a user-defined function

CREATE TABLE t2(a INTEGER, b STRING);
CREATE FUNCTION f1(s STRING) RETURNS INTEGER DETERMINISTIC AS IF (CHAR_LENGTH(s) > 100) RETURN CHAR_LENGTH(s); ELSE RETURN 2*CHAR_LENGTH(s); END_IF; END_FUNCTION;
CREATE INDEX i2 ON t2(f1(b));

SELECT a, b FROM t2, WHERE f1(b) = 50;

Example 7: Creating more than one index: This example shows how to create two indexes, one containing the other. As stated in Using Expression-based Indexes in NuoDB, it there is no guarantee that the best index will be selected. For example, there is no certainty on whether the best index out of i3 and i4 will be selected.

CREATE TABLE t3(a INTEGER, b INTEGER, c INTEGER);
CREATE INDEX i3 on t3(a + b);
CREATE INDEX i4 on t3(a + b + c);

SELECT * from t3 where a + b + c = 100;