CREATE INDEX — create a new index
CREATE [ UNIQUE ] INDEX index_name ON [schema.]table_name(column_name|expression [ASC|DESC] [, ...] ) [ WITH ( [RESOLUTION n],[MAX KEY SIZE n], [NULLS [NOT] DISTINCT] ) ] [ONLINE]
In addition, the syntax supports multiple
CREATE INDEX statements concatenated together, separated by a comma (,). See Improving Index Creation Speed below.
CREATE INDEX constructs an index
index_name on the specified
It is not possible to create a redundant index, that is, an index on a column (or expression) or set of columns (or expressions) for which an index already exists.
|Index creation will be aborted if partition migration has been started before the create index or during a create index operation. For more information on using the MOVE PARTITION command, see ALTER TABLE|
If many concurrent long-running index builds are issued, the Storage Manager (SM) could become resource constrained and slow end user application response time.
To prevent this from happening, if the internal index build concurrency limit of
4 is reached, a new index build request will be accepted, but processing will be delayed until the number of concurrently running index builds drops below the limit.
Index builds on empty tables do not count towards the limit.
Commands that can initiate index builds are,
ALTER TABLE table REBUILD INDEXES|INDEX, or the adding of a new partition to an existing table.
To monitor slow index builds or to view queued index builds, use the
nuocmd log-messages --log-options command and log categories
The log message is in the form:
Index build of oid/cid X(txn) stalled for ss seconds waiting for nn concurrent builds to complete.
|Short-running index jobs may run longer than expected if they are queued rather than starting right away as in previous releases.|
Causes the system to check for duplicate values in the table when the index is created (if data already exist) and each time data is added. Attempts to insert or update data which would result in duplicate entries will generate an error. See Example 2.
Column(s) defined by a
UNIQUE INDEX are allowed to contain a
NULL value (see
NULLS [NOT] DISTINCT).
Name of the index to be created. No schema name can be included here; the index is always created in the same schema as its parent table.
Name (optionally schema-qualified) of the table to be indexed
Name of the column(s) of the table to be indexed
Expression(s) to be indexed. See Expression Based Indexing.
The ASC and DESC qualifiers are provided for syntax compatibility purposes only and do not affect the creation of the index. Indexes are always created as if ASC was specified.
WITH ( RESOLUTION n [,…] )
This option allows you to specify the
RESOLUTIONof the histogram, that is, the number of "buckets" used in the histogram. For Version 1 statistics, the default number of buckets created is 10, and for Version 2 statistics, the default number of buckets created is 256. Raising this number will make more precise statistics available to the query optimizer, which will then allow the optimizer to make better decisions, potentially leading to an improvement in query performance. The downside of increasing the number of buckets is that the index will consume more memory and processing when the index is created or updated. Valid
RESOLUTIONvalues for Version 1 statistics are: 10 (the default), 300, 500 or 1000 and for Version 2 statistics: 10 - 1000 (default 256). See Example 3.
See Improving Query Performance for more information on how to use this option.
WITH ( MAX KEY SIZE n )
The default maximum key size is 1024 bytes. NuoDB recommends that index key size be less than or equal to the default maximum because having larger index keys might decrease performance. However, if your application requires a larger maximum index key size then you can specify it when you create an index. There is no system property for setting a maximum index key size. You can increase the maximum index key size only when you create a particular index.
To change the maximum size allowed for an index key, specify
WITH(MAX KEY SIZE n)and replace
nwith a value from 100 bytes to 3072 bytes. NuoDB throws an exception if you try to specify a maximum key size that is less than 100 bytes or more than 3072 bytes. See Example 4.
WITH ( NULLS DISTINCT | NULLS NOT DISTINCT )
This parameter enables you to indicate whether NULL values should be treated as distinct or not distinct values. In scenarios where there are two records with the same values in the corresponding columns, one or more of which are NULL values, a unique index reports a uniqueness violation when
NULLS NOT DISTINCTis specified. If
NULL DISTINCT(or neither NULL option) is specified, the two
NULLvalues will be treated as different values and both records will be accepted in the unique index.
Typically, index builds acquire a table lock before populating the index and keep it locked until the index is fully built and the transaction is committed. The table data can be concurrently queried but not updated, deleted, or inserted because the table lock blocks concurrent DML operations. If an index build is executed using
CREATE INDEX … ONLINEsyntax, the build is done in two transactions. The first transaction creates a new index in the catalog using
Incompleteappears before the index name in the output of the
SHOW TABLEcommand). Subsequent DML transactions will update the index, but the read queries ignore it. The second transaction scans the table data and populates the index.
The process of populating an existing index is inherently slower than the blocking build, but concurrent DML operations are not blocked while the table is scanned. Each of the two transactions involved in building an index online acquires a table lock for a brief period of time, so the index build cannot finish until all concurrent transactions commit/rollback their updates to the table. The DML transactions blocked by the those table locks are aborted if executed under the CONSISTENT READ transaction isolation level.
If an index build fails, it may leave an incomplete index behind. Such an index does not affect the operation of read queries, but DML operations continue to update it and failure may occur if the constraint checks based on the index are violated, for example unique constraint violation. Incomplete indexes are useless and must be removed manually.
If a concurrent DML violates a constraint such as unique key constraint, either the DML or the index build may fail.
Only a single index build operation is permitted on a table, regular or online; however, multiple indexes can be specified in a single command.
Expression-based indexes cannot be built online.
Any changes performed in the transaction prior to invoking
CREATE INDEX … ONLINEare committed.
Expression-based indexes (EBI)s are indexes created using a combination of columns and SQL expressions. Expression-based indexes can speed up queries in which filtering happens, based on expressions rather than columns.
Once an expression-based index has been created on a table, eligible queries using that expression can make use of the index, which is then reflected in the
EXPLAIN plan of the query.
When an index is created on an expression, a hidden computed column is generated which contains the result of the expression.
The manner in which the computed column materializes in the database and consumes space on disk, and in memory, is just the same as if it were a column explicitly created by a DBA.
The name of the computed column is automatically generated (for example
$NUO_SYSFIELD_1) and appears in the
FIELDS table (see FIELDS System Table Description).
|It is possible to select the value stored in the computed column by using the column name. This can be useful for debugging purposes. However, applications should not rely on directly querying the computed column. NuoDB internally rewrites the expressions to this computed column in cases where the computed column is not directly referenced.|
When an expression-based index is created on a table that already has data, an update is performed which populates the computed column with the correct values. This has several consequences:
Creating an expression-based index on a table requires additional
UPDATEprivileges on the table.
Creating an expression-based index on a table with data takes longer than creating a regular column-based index on the same table.
Creating an expression-based index can fail (in rare situations, and return an update conflict error) if the isolation level of the transaction is
CONSISTENT READ, and there are concurrent writes to the table from other transactions.
|The on-disk storage size of the table temporarily doubles when an EBI is added, as during the process there are two copies of the table, old and new. Furthermore, the on-disk storage size will permanently increase for the index added. The size increase is dependent on the width of the key (that is, the Expression). Therefore, confirm there’s sufficient free disk space before adding an EBI. The old copy of the table will eventually be Garbage Collected, marking the space as available for re-use.|
User-defined functions can be used in expression-based indexes.
|If there is an attempt to drop or modify the function, it completes without error, but results in the index not being recreated. Such action leads to undefined behavior and is not recommended.|
If there are two expression-based indexes on a table, and one is a sub-expression of the other, the index that is chosen during a query is not guaranteed to be the best one. See Example 7: Create more than one index.
The following limitations apply when using expression-based indexes:
An expression-based index cannot be created where:
The table in question is a temporary table.
The expression references another table.
The expression contains a subquery.
The expression is not deterministic:
All built-in functions must be deterministic, that is, no
NOW()and so on. This includes date and time functions and operators whose result may depend on the time zone.
All user-defined functions must be declared as
The expression uses aggregate/window functions.
The expression does not reference any table columns.
The expression references a column whose data type is
NUMBERor if the expression uses a user-defined function which returns
CASTfunction is used to change the
NUMBERdata type to some other compatible data type, for example
It is not possible to determine the precision or scale of the result of an expression, for example an arithmetic operation on a
STRINGcolumn, without an explicit
A unique constraint or primary key cannot be created on an expression.
When the expression is a query condition, for example
a = 1, the index is not used if the condition is the only condition in the
WHEREclause. To use an index in this case, the
whereclause should be rewritten to
(a = 1) = true.
If two indexes exist on the same table, where one is created on a sub-expression of the other (for example
a + band
a + b + c), the result of using both expressions in a query is non-deterministic, that is, either index could be used, or neither, or both.
User-defined functions can be used in expression-based indexes, but NuoDB does not track this dependency. The user-defined function can be changed or dropped. If changed, the result is that the data in the index is invalid and the index has to be manually rebuilt. The result of dropping the UDF is that
UPDATEoperations on the target table fail with an error message indicating that the user-defined function cannot be found.
Creating multiple indexes on the same table involves a table scan for each index created.
CREATE INDEX statements, separated each by a comma (,) indicates to NuoDB that only one table scan is required and performance time for the creation of the index will be improved. For example, where one might run the following commands:
CREATE INDEX idxa ON tab(a); CREATE INDEX idxb ON tab(b); CREATE INDEX idxc ON tab(c);
Running the statements below will result in faster index creation:
CREATE INDEX idxa ON tab(a), CREATE INDEX idxb ON tab(b), CREATE INDEX idxc ON tab(c);
Note the comma instead of semicolon. Concatenation of
CREATE INDEX statements in this manner is only supported for one table at a time.
The examples below make use of tables in the
HOCKEY schema, which is a schema defined in the
test database is created when the QuickStart process is run.
See Running the SQL QuickStart.
- Example 1: Creating composite
CREATE INDEX idx_teams_confyr ON hockey.teams(conferenceid, year);
- Example 2: Creating
CREATE UNIQUE INDEX idx_hockey_number ON hockey.hockey(number);
- Example 3: Creating
Create a table, test_scoring, from the hockey.scoring table. reate a composite index on the table with the default
RESOLUTIONvalue of 10. Select the histogram buckets defined for the index from the system index statistic tables. Change the index
RESOLUTIONvalue to 300. Select the histogram buckets defined for the index from the index statistic tables. The index key values are now distributed across 300 buckets.
USE TEST; DROP TABLE IF EXISTS test_scoring; CREATE TABLE test_scoring ( PLAYERID STRING , YEAR INTEGER , STINT INTEGER , TEAMID STRING , POSITION STRING); CREATE INDEX idx_testscoring_composite ON test_scoring(PLAYERID,YEAR,STINT,TEAMID,POSITION); INSERT INTO test_scoring SELECT PLAYERID,YEAR,STINT,TEAMID,POSITION FROM hockey.scoring; ANALYZE TABLE test_scoring; SELECT ih.histogramid,ih.fieldcount,ih.maxresolution ,ihb.bucketindex,ihb.boundary FROM system.indexstatistics ixs , system.indexhistograms ih , system.indexhistogrambuckets ihb WHERE ixs.schema = 'test' AND ixs.tablename = 'test_scoring' AND ixs.indexname = 'idx_testscoring_composite' AND ixs.objectid = ih.objectid and ixs.catalogid = ih.catalogid AND ih.objectid = ihb.objectid and ih.catalogid = ihb.catalogid ORDER BY ih.histogramid,ihb.bucketindex; HISTOGRAMID FIELDCOUNT MAXRESOLUTION BUCKETINDEX BOUNDARY ------------ ----------- -------------- ------------ ------------------------------------ 5 5 10 0 ZERO LENGTH KEY 5 5 10 1 [broderi01], , , [QUN], [G] 5 5 10 2 [delmoan01], , , [PHI], [D] 5 5 10 3 [gigueje01], , , [TOR], [G] 5 5 10 4 [humliv01], , , [BOS], [L] 5 5 10 5 [larosgu01], , , [TOR], [C] 5 5 10 6 [mckenji02], , , [DAL], [L] 5 5 10 7 [patrigl01], , , [CLF], [D] 5 5 10 8 [sabouke01], , , [CAL], [D] 5 5 10 9 [talboje01], , , [STL], [D] 5 5 10 10 [zyuzian01], , , [CHI], [D] DROP INDEX idx_testscoring_composite; CREATE INDEX idx_testscoring_composite ON test_scoring(PLAYERID,YEAR,STINT,TEAMID,POSITION) WITH (RESOLUTION 300); ANALYZE TABLE test_scoring; SELECT ih.histogramid,ih.fieldcount,ih.maxresolution ,ihb.bucketindex,ihb.boundary FROM system.indexstatistics ixs , system.indexhistograms ih , system.indexhistogrambuckets ihb WHERE ixs.schema = 'test' AND ixs.tablename = 'test_scoring' AND ixs.indexname = 'idx_testscoring_composite' AND ixs.objectid = ih.objectid and ixs.catalogid = ih.catalogid AND ih.objectid = ihb.objectid and ih.catalogid = ihb.catalogid ORDER BY ih.histogramid,ihb.bucketindex; HISTOGRAMID FIELDCOUNT MAXRESOLUTION BUCKETINDEX BOUNDARY ------------ ----------- -------------- ------------ -------------------------------------- 5 5 300 0 ZERO LENGTH KEY 5 5 300 1 [adamske01], , , [CAR], [C] 5 5 300 2 [allenbo01], , , [EDM], [D] 5 5 300 3 [andergl01], , , [EDM], [R] 5 5 300 4 [andrepa01], , , [PIT], [R] 5 5 300 5 [archami01], , , [CHI], [L] 5 5 300 6 [arvedma01], , , [OTT], [L] 5 5 300 7 [aucoiad01], , , [PHO], [D] 5 5 300 8 [babycwa01], , , [STL], [R] 5 5 300 9 [bakovpe01], , , [VAN], [R] 5 5 300 10 [barkldo01], , , [DET], [D] 5 5 300 11 [barrile01], , , [PHI], [C] 5 5 300 12 [bawaro01], , , [SJS], [R] 5 5 300 13 [beaurst01], , , [PHI], [G] 5 5 300 14 [belfoed01], , , [DAL], [G] 5 5 300 15 [benneha02], , , [PHI], [C] ... 5 5 300 289 [weinrer01], , , [MTL], [D] 5 5 300 290 [westcdu01], , , [CBS], [D] 5 5 300 291 [whiteto02], , , [CHI], [C] 5 5 300 292 [wilcoba01], , , [VAN], [R] 5 5 300 293 [willsbr01], , , [COL], [R] 5 5 300 294 [wilsomi02], , , [BUF], [D] 5 5 300 295 [wojcist01], , , [DET], [R] 5 5 300 296 [woywije01], , , [STL], [D] 5 5 300 297 [yellest01], , , [COL], [C] 5 5 300 298 [ysebapa01], , , [TBL], [L] 5 5 300 299 [zeniued01], , , [DET], [D] 5 5 300 300 [zyuzian01], , , [CHI], [D]
- Example 4: Creating
INDEXwith a key size greater than the MAX KEY SIZE
Create a table with one column. Insert into the table a row with a column value that has a length greater than the default maximum
KEY SIZElength of 1024. Creating an index on this column will abort due to
KEY SIZEgreater than
MAX KEY SIZEof 1024. The index can be created by increasing the
KEY SIZEof the index using the
MAX KEY SIZEoption of the
USE TEST; DROP PROCEDURE IF EXISTS prc_load_testtab; DROP TABLE IF EXISTS testtab; CREATE TABLE testtab (col1 STRING); SET DELIMITER @ CREATE PROCEDURE prc_load_testtab AS VAR l_col_length INT = 0; VAR l_col_val STRING = '1'; WHILE ( l_col_length < 1024 ) l_col_val=l_col_val||'1234567890'; l_col_length = length(l_col_val); END_WHILE; INSERT INTO testtab VALUES (l_col_val); END_PROCEDURE; @ SET DELIMITER ; CALL prc_load_testtab; SELECT LENGTH(col1) AS length_of_col1 FROM testtab; LENGTH_OF_COL1 --------------- 1031 CREATE INDEX ixd_testtab_col1 ON testtab(col1); Key length exceeds allowed maximum of 1024 bytes in index IXD_TESTTAB_COL1 CREATE INDEX ixd_testtab_col1 ON testtab(col1 ) WITH (MAX KEY SIZE 1033); SHOW TABLE testtab; Tables named TESTTAB Found table TESTTAB in schema TEST Fields: COL1 string Secondary Index: IXD_TESTTAB_COL1 on field: COL1
- Example 5: Creating a simple index on the sum of two columns
Create an index, when the table is created, using
CREATE TABLE t1(a INTEGER, b INTEGER, c STRING, KEY i1(a + b)); SELECT c FROM t1 WHERE a + b = 42;
- Example 6: Creating an index on the result of a user-defined function
CREATE TABLE t2(a INTEGER, b STRING); CREATE FUNCTION f1(s STRING) RETURNS INTEGER DETERMINISTIC AS IF (CHAR_LENGTH(s) > 100) RETURN CHAR_LENGTH(s); ELSE RETURN 2*CHAR_LENGTH(s); END_IF; END_FUNCTION; CREATE INDEX i2 ON t2(f1(b)); SELECT a, b FROM t2, WHERE f1(b) = 50;
- Example 7: Creating more than one index
This example shows how to create two indexes, one containing the other. As stated in Using Expression-based Indexes in NuoDB, it there is no guarantee that the best index will be selected. For example, there is no certainty on whether the best index out of
i4will be selected.
CREATE TABLE t3(a INTEGER, b INTEGER, c INTEGER); CREATE INDEX i3 on t3(a + b); CREATE INDEX i4 on t3(a + b + c); SELECT * from t3 where a + b + c = 100;