CREATE INDEX
CREATE INDEX
— create a new index
Syntax
CREATE [ UNIQUE ] INDEX index_name
ON [schema.]table_name(column_name|expression [ASC|DESC] [, ...] )
[ WITH ( [RESOLUTION n],[MAX KEY SIZE n], [NULLS [NOT] DISTINCT] ) ] [ONLINE]
[USING {BTREE|MERGE TREE}]
In addition, the syntax supports multiple CREATE INDEX
statements concatenated together, separated by a comma (,). See Improving Index Creation Speed below.
Description
CREATE INDEX
constructs an index index_name
on the specified table_name
.
It is not possible to create a redundant index, that is, an index on a column (or expression) or set of columns (or expressions) for which an index already exists.
Running Concurrent Index Builds
If many concurrent long-running index builds are issued, the Storage Manager (SM) could become resource constrained and slow end user application response time.
To prevent this from happening, if the internal index build concurrency limit of 4
is reached, a new index build request will be accepted, but processing will be delayed until the number of concurrently running index builds drops below the limit.
Index builds on empty tables do not count towards the limit.
Commands that can initiate index builds are, CREATE INDEX
, CREATE TABLE
, ALTER TABLE table REBUILD INDEXES|INDEX
, or the adding of a new partition to an existing table.
To monitor slow index builds or to view queued index builds, use the nuocmd log-messages --log-options
command and log categories info,index
.
The log message is in the form:
Index build of oid/cid X(txn) stalled for ss seconds waiting for nn concurrent builds to complete.
Short-running index jobs may run longer than expected if they are queued rather than starting right away as in previous releases. |
Parameters
UNIQUE
-
Causes the system to check for duplicate values in the table when the index is created (if data already exist) and each time data is added. Attempts to insert or update data which would result in duplicate entries will generate an error. See Example 2.
Column(s) defined by a UNIQUE INDEX
are allowed to contain a NULL
value (see NULLS [NOT] DISTINCT
).
index_name
-
Name of the index to be created. No schema name can be included here; the index is always created in the same schema as its parent table.
table_name
-
Name (optionally schema-qualified) of the table to be indexed
column_name
-
Name of the column(s) of the table to be indexed
expression
-
Expression(s) to be indexed. See Expression Based Indexing.
ASC|DESC
-
The ASC and DESC qualifiers are provided for syntax compatibility purposes only and do not affect the creation of the index. Indexes are always created as if ASC was specified.
WITH ( RESOLUTION n [,…] )
-
This option allows you to specify the
RESOLUTION
of the histogram, that is, the number of "buckets" used in the histogram. The default number of buckets created is 256. Increasing this number will improve the precision of statistics available to the query optimizer, which will then allow the optimizer to make better decisions, potentially leading to an improvement in query performance. The downside of increasing the number of buckets is that the index will consume more memory and processing when the index is created or updated. ValidRESOLUTION
values for statistics are 10 - 1000 (default 256). See Example 3.
See Improving Query Performance for more information on how to use this option. WITH ( MAX KEY SIZE n )
-
The default maximum key size is 1024 bytes. NuoDB recommends that index key size be less than or equal to the default maximum because having larger index keys might decrease performance. However, if your application requires a larger maximum index key size then you can specify it when you create an index. There is no system property for setting a maximum index key size. You can increase the maximum index key size only when you create a particular index.
To change the maximum size allowed for an index key, specifyWITH(MAX KEY SIZE n)
and replacen
with a value from 100 bytes to 3072 bytes. NuoDB throws an exception if you try to specify a maximum key size that is less than 100 bytes or more than 3072 bytes. See Example 4. WITH ( NULLS DISTINCT | NULLS NOT DISTINCT )
-
This parameter enables you to indicate whether NULL values should be treated as distinct or not distinct values. In scenarios where there are two records with the same values in the corresponding columns, one or more of which are NULL values, a unique index reports a uniqueness violation when
NULLS NOT DISTINCT
is specified. IfNULL DISTINCT
(or neither NULL option) is specified, the twoNULL
values will be treated as different values and both records will be accepted in the unique index. ONLINE
-
Typically, index builds acquire a table lock before populating the index and keep it locked until the index is fully built and the transaction is committed. The table data can be concurrently queried but not updated, deleted, or inserted because the table lock blocks concurrent DML operations. If an index build is executed using
CREATE INDEX … ONLINE
syntax, the build is done in two transactions. The first transaction creates a new index in the catalog usingIncomplete
(Incomplete
appears before the index name in the output of theSHOW TABLE
command). Subsequent DML transactions will update the index, but the read queries ignore it. The second transaction scans the table data and populates the index. USING (BTREE|MERGE TREE)
-
B Tree is the default indexing used when indexes are created using
CREATE INDEX
. To create an index using Merge Tree, useCREATE INDEX <index_name> ON <schema_name.table_name> USING MERGE TREE
.
The process of populating an existing index is inherently slower than the blocking build, but concurrent DML operations are not blocked while the table is scanned. Each of the two transactions involved in building an index online acquires a table lock for a brief period of time, so the index build cannot finish until all concurrent transactions commit/rollback their updates to the table. The DML transactions blocked by the those table locks are aborted if executed under the CONSISTENT READ transaction isolation level.
Knock-on Effects
-
If an index build fails, it may leave an incomplete index behind. Such an index does not affect the operation of read queries, but DML operations continue to update it and failure may occur if the constraint checks based on the index are violated, for example unique constraint violation. Incomplete indexes are useless and must be removed manually.
-
If a concurrent DML violates a constraint such as unique key constraint, either the DML or the index build may fail.
Limitations
-
Only a single index build operation is permitted on a table, regular or online; however, multiple indexes can be specified in a single command.
-
Expression-based indexes cannot be built online.
-
Any changes performed in the transaction prior to invoking
CREATE INDEX … ONLINE
are committed.
Expression-Based Indexing
Expression-based indexes (EBI)s are indexes created using a combination of columns and SQL expressions. Expression-based indexes can speed up queries in which filtering happens, based on expressions rather than columns.
Once an expression-based index has been created on a table, eligible queries using that expression can make use of the index, which is then reflected in the EXPLAIN
plan of the query.
How Expression-Based Indexing Works in NuoDB
When an index is created on an expression, a hidden computed column is generated which contains the result of the expression.
The manner in which the computed column materializes in the database and consumes space on disk, and in memory, is just the same as if it were a column explicitly created by a DBA.
The name of the computed column is automatically generated (for example $NUO_SYSFIELD_1
) and appears in the FIELDS
table (see FIELDS System Table Description).
It is possible to select the value stored in the computed column by using the column name. This can be useful for debugging purposes. However, applications should not rely on directly querying the computed column. NuoDB internally rewrites the expressions to this computed column in cases where the computed column is not directly referenced. |
When an expression-based index is created on a table that already has data, an update is performed which populates the computed column with the correct values. This has several consequences:
-
Creating an expression-based index on a table requires additional
UPDATE
privileges on the table. -
Creating an expression-based index on a table with data takes longer than creating a regular column-based index on the same table.
-
Creating an expression-based index can fail (in rare situations, and return an update conflict error) if the isolation level of the transaction is
CONSISTENT READ
, and there are concurrent writes to the table from other transactions.
The on-disk storage size of the table temporarily doubles when an EBI is added, as during the process there are two copies of the table, old and new. Furthermore, the on-disk storage size will permanently increase for the index added. The size increase is dependent on the width of the key (that is, the Expression). Therefore, confirm there’s sufficient free disk space before adding an EBI. The old copy of the table will eventually be Garbage Collected, marking the space as available for re-use. |
Using Expression-Based Indexes in NuoDB
User-defined functions can be used in expression-based indexes.
If there is an attempt to drop or modify the function, it completes without error, but results in the index not being recreated. Such action leads to undefined behavior and is not recommended. |
If there are two expression-based indexes on a table, and one is a sub-expression of the other, the index that is chosen during a query is not guaranteed to be the best one. See Example 7: Create more than one index.
Limitations of Expression-Based Indexes in NuoDB
The following limitations apply when using expression-based indexes:
-
An expression-based index cannot be created where:
-
The table in question is a temporary table.
-
The expression references another table.
-
The expression contains a subquery.
-
The expression is not deterministic:
-
All built-in functions must be deterministic, that is, no
RAND()
, noNOW()
and so on. This includes date and time functions and operators whose result may depend on the time zone. -
All user-defined functions must be declared as
DETERMINISTIC
.
-
-
The expression uses aggregate/window functions.
-
The expression does not reference any table columns.
-
It is not possible to determine the precision or scale of the result of an expression, for example an arithmetic operation on a
STRING
column, without an explicitCAST
function.
-
-
A unique constraint or primary key cannot be created on an expression.
-
When the expression is a query condition, for example
a = 1
, the index is not used if the condition is the only condition in theWHERE
clause. To use an index in this case, thewhere
clause should be rewritten to(a = 1) = true
. -
If two indexes exist on the same table, where one is created on a sub-expression of the other (for example
a + b
anda + b + c
), the result of using both expressions in a query is non-deterministic, that is, either index could be used, or neither, or both. -
User-defined functions can be used in expression-based indexes, but NuoDB does not track this dependency. The user-defined function can be changed or dropped. If changed, the result is that the data in the index is invalid and the index has to be manually rebuilt. The result of dropping the UDF is that
INSERT
andUPDATE
operations on the target table fail with an error message indicating that the user-defined function cannot be found.
Improving Index Creation Speed
Creating multiple indexes on the same table involves a table scan for each index created.
Concatenating these CREATE INDEX
statements, separated each by a comma (,) indicates to NuoDB that only one table scan is required and performance time for the creation of the index will be improved. For example, where one might run the following commands:
CREATE INDEX idxa ON tab(a);
CREATE INDEX idxb ON tab(b);
CREATE INDEX idxc ON tab(c);
Running the statements below will result in faster index creation:
CREATE INDEX idxa ON tab(a),
CREATE INDEX idxb ON tab(b),
CREATE INDEX idxc ON tab(c);
Note the comma instead of semicolon. Concatenation of CREATE INDEX
statements in this manner is only supported for one table at a time.
Examples
The examples below make use of tables in the HOCKEY
schema, which is a schema defined in the test
database.
The test
database is created when the QuickStart process is run.
See Running the SQL QuickStart.
- Example 1: Creating composite
INDEX
onteams
table.
CREATE INDEX idx_teams_confyr ON hockey.teams(conferenceid, year);
- Example 2: Creating
UNIQUE INDEX
onhockey
table.
CREATE UNIQUE INDEX idx_hockey_number ON hockey.hockey(number);
- Example 3: Creating
INDEX
with greaterRESOLUTION
. -
Create a table, test_scoring, from the hockey.scoring table. reate a composite index on the table with the default
RESOLUTION
value of 10. Select the histogram buckets defined for the index from the system index statistic tables. Change the indexRESOLUTION
value to 300. Select the histogram buckets defined for the index from the index statistic tables. The index key values are now distributed across 300 buckets.
USE TEST;
DROP TABLE IF EXISTS test_scoring;
CREATE TABLE test_scoring
( PLAYERID STRING
, YEAR INTEGER
, STINT INTEGER
, TEAMID STRING
, POSITION STRING);
CREATE INDEX idx_testscoring_composite
ON test_scoring(PLAYERID,YEAR,STINT,TEAMID,POSITION);
INSERT INTO test_scoring SELECT PLAYERID,YEAR,STINT,TEAMID,POSITION FROM hockey.scoring;
ANALYZE TABLE test_scoring;
SELECT ih.histogramid,ih.fieldcount,ih.maxresolution
,ihb.bucketindex,ihb.boundary
FROM system.indexstatistics ixs
, system.indexhistograms ih
, system.indexhistogrambuckets ihb
WHERE ixs.schema = 'test'
AND ixs.tablename = 'test_scoring'
AND ixs.indexname = 'idx_testscoring_composite'
AND ixs.objectid = ih.objectid and ixs.catalogid = ih.catalogid
AND ih.objectid = ihb.objectid and ih.catalogid = ihb.catalogid
ORDER BY ih.histogramid,ihb.bucketindex;
HISTOGRAMID FIELDCOUNT MAXRESOLUTION BUCKETINDEX BOUNDARY
------------ ----------- -------------- ------------ ------------------------------------
5 5 10 0 ZERO LENGTH KEY
5 5 10 1 [broderi01], [1974], [1], [QUN], [G]
5 5 10 2 [delmoan01], [1999], [1], [PHI], [D]
5 5 10 3 [gigueje01], [2009], [2], [TOR], [G]
5 5 10 4 [humliv01], [2001], [1], [BOS], [L]
5 5 10 5 [larosgu01], [1993], [1], [TOR], [C]
5 5 10 6 [mckenji02], [1993], [2], [DAL], [L]
5 5 10 7 [patrigl01], [1974], [1], [CLF], [D]
5 5 10 8 [sabouke01], [1990], [1], [CAL], [D]
5 5 10 9 [talboje01], [1968], [1], [STL], [D]
5 5 10 10 [zyuzian01], [2007], [1], [CHI], [D]
DROP INDEX idx_testscoring_composite;
CREATE INDEX idx_testscoring_composite
ON test_scoring(PLAYERID,YEAR,STINT,TEAMID,POSITION)
WITH (RESOLUTION 300);
ANALYZE TABLE test_scoring;
SELECT ih.histogramid,ih.fieldcount,ih.maxresolution
,ihb.bucketindex,ihb.boundary
FROM system.indexstatistics ixs
, system.indexhistograms ih
, system.indexhistogrambuckets ihb
WHERE ixs.schema = 'test'
AND ixs.tablename = 'test_scoring'
AND ixs.indexname = 'idx_testscoring_composite'
AND ixs.objectid = ih.objectid and ixs.catalogid = ih.catalogid
AND ih.objectid = ihb.objectid and ih.catalogid = ihb.catalogid
ORDER BY ih.histogramid,ihb.bucketindex;
HISTOGRAMID FIELDCOUNT MAXRESOLUTION BUCKETINDEX BOUNDARY
------------ ----------- -------------- ------------ --------------------------------------
5 5 300 0 ZERO LENGTH KEY
5 5 300 1 [adamske01], [2003], [1], [CAR], [C]
5 5 300 2 [allenbo01], [2002], [1], [EDM], [D]
5 5 300 3 [andergl01], [1989], [1], [EDM], [R]
5 5 300 4 [andrepa01], [1968], [1], [PIT], [R]
5 5 300 5 [archami01], [1976], [1], [CHI], [L]
5 5 300 6 [arvedma01], [1997], [1], [OTT], [L]
5 5 300 7 [aucoiad01], [2011], [1], [PHO], [D]
5 5 300 8 [babycwa01], [1982], [1], [STL], [R]
5 5 300 9 [bakovpe01], [1987], [1], [VAN], [R]
5 5 300 10 [barkldo01], [1964], [1], [DET], [D]
5 5 300 11 [barrile01], [1989], [1], [PHI], [C]
5 5 300 12 [bawaro01], [1992], [1], [SJS], [R]
5 5 300 13 [beaurst01], [1992], [1], [PHI], [G]
5 5 300 14 [belfoed01], [1998], [1], [DAL], [G]
5 5 300 15 [benneha02], [1976], [2], [PHI], [C]
...
5 5 300 289 [weinrer01], [1999], [1], [MTL], [D]
5 5 300 290 [westcdu01], [2002], [1], [CBS], [D]
5 5 300 291 [whiteto02], [1998], [1], [CHI], [C]
5 5 300 292 [wilcoba01], [1972], [1], [VAN], [R]
5 5 300 293 [willsbr01], [1999], [1], [COL], [R]
5 5 300 294 [wilsomi02], [1998], [1], [BUF], [D]
5 5 300 295 [wojcist01], [1946], [1], [DET], [R]
5 5 300 296 [woywije01], [2007], [1], [STL], [D]
5 5 300 297 [yellest01], [2000], [1], [COL], [C]
5 5 300 298 [ysebapa01], [1995], [1], [TBL], [L]
5 5 300 299 [zeniued01], [1954], [1], [DET], [D]
5 5 300 300 [zyuzian01], [2007], [1], [CHI], [D]
- Example 4: Creating
INDEX
with a key size greater than the MAX KEY SIZE -
Create a table with one column. Insert into the table a row with a column value that has a length greater than the default maximum
KEY SIZE
length of 1024. Creating an index on this column will abort due toKEY SIZE
greater thanMAX KEY SIZE
of 1024. The index can be created by increasing theKEY SIZE
of the index using theMAX KEY SIZE
option of theCREATE INDEX
command.
USE TEST;
DROP PROCEDURE IF EXISTS prc_load_testtab;
DROP TABLE IF EXISTS testtab;
CREATE TABLE testtab (col1 STRING);
SET DELIMITER @
CREATE PROCEDURE prc_load_testtab
AS
VAR l_col_length INT = 0;
VAR l_col_val STRING = '1';
WHILE ( l_col_length < 1024 )
l_col_val=l_col_val||'1234567890';
l_col_length = length(l_col_val);
END_WHILE;
INSERT INTO testtab VALUES (l_col_val);
END_PROCEDURE;
@
SET DELIMITER ;
CALL prc_load_testtab;
SELECT LENGTH(col1) AS length_of_col1 FROM testtab;
LENGTH_OF_COL1
---------------
1031
CREATE INDEX ixd_testtab_col1 ON testtab(col1);
Key length exceeds allowed maximum of 1024 bytes in index IXD_TESTTAB_COL1
CREATE INDEX ixd_testtab_col1 ON testtab(col1 ) WITH (MAX KEY SIZE 1033);
SHOW TABLE testtab;
Tables named TESTTAB
Found table TESTTAB in schema TEST
Fields:
COL1 string
Secondary Index: IXD_TESTTAB_COL1 on field: COL1
- Example 5: Creating a simple index on the sum of two columns
-
Create an index, when the table is created, using
KEY
.
CREATE TABLE t1(a INTEGER, b INTEGER, c STRING, KEY i1(a + b));
SELECT c FROM t1 WHERE a + b = 42;
- Example 6: Creating an index on the result of a user-defined function
CREATE TABLE t2(a INTEGER, b STRING);
CREATE FUNCTION f1(s STRING) RETURNS INTEGER DETERMINISTIC AS IF (CHAR_LENGTH(s) > 100) RETURN CHAR_LENGTH(s); ELSE RETURN 2*CHAR_LENGTH(s); END_IF; END_FUNCTION;
CREATE INDEX i2 ON t2(f1(b));
SELECT a, b FROM t2, WHERE f1(b) = 50;
- Example 7: Creating more than one index
-
This example shows how to create two indexes, one containing the other. As stated in Using Expression-based Indexes in NuoDB, it there is no guarantee that the best index will be selected. For example, there is no certainty on whether the best index out of
i3
andi4
will be selected.
CREATE TABLE t3(a INTEGER, b INTEGER, c INTEGER);
CREATE INDEX i3 on t3(a + b);
CREATE INDEX i4 on t3(a + b + c);
SELECT * from t3 where a + b + c = 100;