Data Type Conversion
The Type System is a set of rules and transformations that NuoDB applies to data types during query compilation. The goals of the process are:
-
To detect and correct mismatches in data types as early as possible in the query execution process
-
To prevent bugs in user applications caused by incorrect use of the query language.
All references to the term "type" used in this section refer to "data type".
Data type conversions are necessary because many operators and functions expect each of their arguments to be of a certain data type which allows for proper reconciliation of scale, precision, length, collation, etc. during comparisons and computations. In addition, data type conversions are necessary to choose a single data type for the results of conditional expressions and the items in the SELECT
list UNION
or UNION ALL
.
Implicit vs. Explicit Data Type Conversion
NuoDB avoids implicit data conversions (coercions) that could lead to loss of precision and issues an error message in such cases. The user can use the CAST
operator to allow a conversion that could lead to loss of precision or for overriding the implicit conversion behavior. Refer to the Data Type Conversion Table in this section.
Implications of Data Type Conversion
A data type conversion may change the semantics of comparison operations. Consider the following examples:
-
'00001' = 1 is true if the string '00001' is converted to a numeric value 1 but false if 1 is converted to the string '1'.
-
'100' < 2 is false if the string '100' is converted to a numeric value 100 but true if 2 is converted to the string '2'.
Due to the differences in comparison semantics, indexes cannot be used when data conversion has been applied on a column in a comparison operator. This is because the natural ordering of the index data type might differ from the order required by the comparison. To favour index scans in comparisons between constants and columns, the constant will be implicitly converted to the type of the column. For any other expressions, the general data type conversion rules apply.
Implicit conversion of LOB data types (CLOB or BLOB) requires in-memory copies of the values and might have negative performance implications. |
Sources for type information
NuoDB uses two sources to obtain type information: table column definitions and query constants.
NuoDB supports boolean, string, number, binary and temporal constants
Boolean Constants
NuoDB creates a boolean constant for the special words TRUE and FALSE. For more details, please read SQL-BooleanTypes.
String Constants
NuoDB creates constants of type STRING for the quoted literals in the query string. NuoDB does not provide ways to create VARCHAR or CHAR constants.
For more information on string types in NuoDB, see SQLString and Character Types
Number Constants
There are 3 formats interpreted as number constants in NuoDB:
-
integer types (e.g. 1230)
-
NUMERIC numbers (e.g. 12.30)
-
scientific notation (e.g. 123e1)
NuoDB selects the smallest type in which the value can be stored. For example, the constant 100 will have a type SMALLINT
while the constant 40001 will be of type INTEGER
. For more information on numbers, see SQL Numeric Types.
Constants using the numeric notation will have a type NUMERIC
.
Constants using the scientific notation will be of type DOUBLE
.
Binary Constants
NuoDB creates a constant with type BINARY
for hexadecimal constants. For more information on binary constants, please read SQL Binary Types
Temporal Constants
Temporal constants are constants prefixed with TIMESTAMP, TIMESTAMP WITHOUT TIME ZONE, TIME or DATE keywords followed by a textual representation of the temporal value. For more information on temporal constants, please read SQL Date and Time Types
Table columns
The type for table columns is defined when the column is created and can be altered with ALTER TABLE
statements.
Functions and Operators
For the Type System, NuoDB implements two categories of functions and operators:
-
Signature operators and functions (e.g.
SUBSTRING
,YEAR
, etc. - see SQL Functions and Operations for a comprehensive list) -
Common Type operators and functions (e.g.
COALESCE
,IFNULL
, etc. - see SQL Functions and Operations for a comprehensive list).
Signature Functions and Operators
Each signature function or operators has a list of signatures depending on the number of arguments it accepts and their type. As a separate step during query compilation, NuoDB deduces which signature should be applied based on the arguments of the operator or function. If there isn’t an exact match, NuoDB uses type inference to deduce the closest matching signature and then applies implicit conversions (coercions) on the arguments to convert them to the desired type. If a coercion is not allowed between the type of the arguments and the type required by the signature, NuoDB returns an error. The following table shows the supported implicit conversions.
Data Type Conversion Table
The following table describes the supported data type conversions in NuoDB:
Example
The SUBSTRING
function has the following two signatures:
(String, Integer) -> String
(String, Integer, Integer) -> String
For the following query:
select substring(123456, 1.0, '2') s1, substring('aaaa', '3') s2 from dual;
The Type System will use the signatures in the following ways:
-
For s1, the first signature cannot be used because the number of arguments does not match. The only other option is the second signature and therefore each argument will be converted to the type it specifies.
-
Similarly, for s2, the second signature cannot be applied due to the number of arguments and the only option left is to use the first signature.
After applying the signatures, the query was rewritten to:
select
substring(coerce(123456 as string), coerce (1.0 as integer), coerce('2' as integer)) s1,
substring('aaaa', coerce('3' as integer)) s2
from dual;
Common type
Operators and functions that are handled using the Common Type approach operate on arguments of the same type. During compile time, NuoDB deduces the common type of the arguments using the following common type transition table:
Type |
Transitions |
String |
String → Boolean → Bytes → Numeric → Double → Time → Timestamp → Timestamp without time zone |
Char |
Char → VarChar → String → Boolean → Bytes → Numeric → Double → Time → Timestamp → Timestamp without time zone |
VarChar |
VarChar → String → Boolean → Bytes → Numeric → Double → Time → Timestamp → Timestamp without time zone |
SmallInt |
SmallInt → Integer → BigInt → Numeric → Float → Double |
Integer |
Integer → BigInt → Numeric → Float → Double |
BigInt |
BigInt → Numeric → Double |
Float |
Float → Double |
Double |
Double |
Date |
Date → Timestamp → Timestamp without time zone |
Timestamp |
Timestamp → Timestamp without time zone |
Time |
Time → Timestamp → Timestamp without time zone |
Blob |
Blob → Bytes |
Clob |
Clob → String → Bytes |
Numeric |
Numeric → Double |
Number |
Number → Numeric → Double |
Bytes |
Bytes → BigInt → Numeric |
Binary |
Binary → VarBinary → Bytes |
VarBinary |
VarBinary → Bytes |
Boolean |
Boolean → SmallInt → Integer → BigInt → Numeric → Float → Double |
The common type for any two types is computed by iterating over their respective transitions types and stopping at the first transition type that is common to both. When applying the common type algorithm on multiple arguments, NuoDB first evaluates the type of each of the arguments. Then, NuoDB computes the common type across all arguments. The operation proceeds from left to right by computing the common type between the current argument and the next argument until no more arguments remain. Note that the rule is not associative in this case; the resulting type depends on the order in which the arguments appear in the operator or function.
Example
COALESCE
takes a variable number of arguments and returns the first one (from left to right) which is not null. To determine the return type of this function, NuoDB computes the common type of all the arguments, implicitly converts all of them to the common type and sets that type as the return type of the function:
select coalesce('1', 1, cast(2 as double)) from dual;
In this case, coalesce is called for types string, SmallInt and double. The common type between string and smallInt is numeric (the first match between them in the transition table). The common type between numeric and double is double. Thus, coalesce function will return double and all the arguments will be converted to a double. NuoDB internally rewrites the query to:
select coalesce(coerce('1' as double), coerce (1 as double), cast(2 as double)) from dual;
The COERCE operator
The COERCE
operator performs the implicit conversions decided by the Type System.
The presence of this operator in a query plan indicates that the initial query contains type discrepancies and such discrepancies were corrected automatically by the NuoDB query compiler.
Functionally, the COERCE
operator is similar to the CAST
operator, but COERCE
is not available as a separate call (it is only used internally). While the CAST
operator performs truncation or rounding in cases when the value to be converted does not fit in the target type, the COERCE
operator, throws an error instead if the conversion would require truncation or rounding.
To understand the difference between CAST
and COERCE
consider the following example:
SQL>create table t(col decimal(4, 2));
SQL>insert into t(col) values(2.1);
SQL>insert into t(col) values(2.6);
SQL>explain select substring('nuodb', col) from t;
Select
List
Alias:SUBSTR
Substr
"nuodb"
Coerce T.COL
as integer
Table Scan T [cost: 33.00, rows: 11.00]
The second argument is wrapped in a COERCE
operator because the type of this argument is decimal(4, 2) while SUBSTR
expect an integer. If we were to execute this query, we would get an error because the value 2.1 cannot be converted to an integer without the loss of the decimal part:
SQL>select substring('nuodb', col) from t;
Numeric overflow converting integer of scale 1 to integer of scale 0
In this case, in order to achieve the desired behavior of rounding the value from the column COL to an integer, users must explicitly cast the value as an integer:
SQL>explain select substring('nuodb', cast(col as integer)) from t;
Select
List
Alias:SUBSTR
Substr
"nuodb"
Cast
T.COL
as integer
Table Scan T [cost: 33.00, rows: 11.00]
SQL>select col, substring('nuodb', cast(col as integer)) from t;
COL SUBSTR
---- -------
2.10 uodb
2.60 odb
Note that in this case the query plan does not contain a COERCE
operator (since CAST
already returns an integer type), and the decimal column is rounded to an integer.
Another important difference between the COERCE
and CAST
operators is that the COERCE
operator can be removed automatically as NuoDB replaces expressions containing constants with their results (the so called "constant folding" optimization), while CAST
is never optimized this way. This means that coercions might have been performed on constants at compile time and not be visible in the query plan. Using SUBSTRING
as an example again:
SQL>explain (object_types on) select 1 const, substring(string_col, 1) from t1 ;
Select
List
Alias:CONST
"1" (smallint)
Alias:SUBSTR
Substr
Field T1.STRING_COL
"1" (integer)
Table Scan T1 [cost: 117.00, rows: 13.00]
The output shows the constant 1 as being of type SmallInt when used separately, but is of type integer when used as an argument for SUBSTRING. This occurs because the substring expression was rewritten to substring(string_col, coerce(1 as integer)) and then the second argument was optimized to the integer1.
Arithmetic expressions
The signature for arithmetic expressions (+, -, %, * and /) is defined as (Number, Number) → Number where number can be any of the NuoDB number types.
The signature is applied in two steps. In the first step, each argument is coerced to one of the number types based on the common type transition table. In the second step, a common type is computed based on the updated arguments and this type will be the return type of the expression. If the resulting common type is a SmallInt or Integer, NuoDB upgrades it to the next larger type to allow room for overflowing. If the common type is SmallInt, the return type will be Integer; and if the common type is Integer, the result type will be BigInteger.
Example
For the query:
select '1.1' + 1 from dual;
the first step decides type Numeric for the string constant '1.1', because this is the only available transition from string to a number type. In step two, the return type of the expression is determined to be numeric based on the common type between numeric and SmallInt. The query is rewritten internally as:
select coerce('1.1' as numeric) + 1 from dual;
Precision and scale for operations on numeric arguments
When applied to numeric operations, NuoDB computes the precision and scale using the algorithm from the table below ("lhs" indicates the left argument, "rhs" the right).:
Expression |
Precision |
Scale |
lhs + rhs |
1 + max(lhs.precision, rhs.precision) |
max(lhs.scale, rhs.scale) |
lhs - rhs |
1 + max(lhs.precision, rhs.precision) |
max(lhs.scale, rhs.scale) |
lhs % rhs |
min(lhs.precision - lhs.scale, rhs.precision - lhs.scale) + result.scale |
max(lhs.scale, rhs.scale) |
lhs * rhs |
lhs.precision + rhs.precision |
lhs.scale + rhs.scale |
lhs / rhs |
lhs.precision - lhs.scale + rhs.scale + result.scale |
max(6, lhs.scale + rhs.precision + 1) |
Note that the same algorithm is also applied for the function variants of these expressions (such as DIV, MOD, etc.). When the resulting precision is greater than the maximum numeric precision (38), the following adjustments are being made:
-
If the integral part (i.e. precision - scale) is less than 32, then the result is NUMERIC(38, min(scale, 38 - integral part)
-
If the integral part is greater than 32 but scale is less than 6, then the result is NUMERIC(38, scale);
-
If the integral part is greater than 32 and scale is greater than 6, the result is NUMERIC(38, 6)
These adjustments are made so that the most significant part of the number can occupy as much space as possible. If the execution of the operation produces a number with an integral part greater than the one computed here, an oveflow error is returned. If the execution of the operation produces a number with a scale larger than the scale computed here, the number is rounded to the scale computed here.
Query parameters
The SQL standard does not provide a way to specify the type of a query parameter at compile time. NuoDB considers query parameters to be of type String and allows the general algorithm to decide the type they should be coerced to. This approach allows for the correct result to be obtained in a wide range of situations:
SQL>select 1 / 1 as consts, 1 / ? query_parameters from dual;
Enter value: 1
CONSTS query_parameters
------- ----------
1 1.000000
Because the type of the variable is string, the result of the second expression is Numeric. In additional, a default scale and precision are computed for it. The CAST
operator can be used to overcome this limitation.
Comparison operators
Comparison operators also use the common type signature method. During query compilation, a common type is decided between the arguments of the comparison. Coercions are applied only if the operator would not otherwise be able to compare the two arguments. This means, for example, no coercion will be performed when comparing an integer toa decimal(10, 3). However, when comparing an integer with a string, the string will be coerced to a numeric type.
The query optimizer cannot correctly decide when an index scan can be performed if the column from the index is wrapped in a coerce operator. Therefore, NuoDB avoids unnecessary coercions in such cases.
With type checking enabled, it is not possible to use an index when the column data type and comparison expression are of different data types. When comparing numbers of different types, NuoDB will use an index. An index will not be used in cases such as comparing a string column with a numeric expression. Type casting the expression value to agree with the column data type will utilize an available index. |
The IN operator
The IN operator applies the comparison common type algorithm in two steps. First, it determines the common type between the elements in the IN
list and then the common type between the left-hand argument and the common type of the IN
list. The common type of the arguments on the right-hand side is computed from left to right by computing the common type between the currently computed common type and the next argument.
Conditional expressions
The conditional expressions (CASE, COALESCE, LEAST, GREATEST, IFNULL
) are defined on a list of arguments (and search conditions, in the case of CASE
) and return one of these arguments. The NuoDB Type System must make sure that all of the arguments have the same type (and precision and scale in case of decimal or numeric arguments) so that the type of the value computed during query evaluation is consistent across all the rows of the returned result.
Example:
SQL>create table t1(s smallint, i int, d1 decimal (10, 4), d2 decimal(14, 3));
SQL>insert into t1 values(1, 2, 3, 4);
SQL>insert into t1 values(2, 1, 3, 4);
SQL>insert into t1 values(2, 3, 1, 4);
SQL>insert into t1 values(2, 3, 4, 1);
SQL>explain select least(s, i, d1, d2) from t1;
Select
List
Alias:LEAST
Least
Coerce
T1.S
as numeric(15,4)
Coerce
T1.I
as numeric(15,4)
Coerce
T1.D1
as numeric(15,4)
Coerce
T1.D2
as numeric(15,4)
Table Scan T1 [cost: 168.00, rows: 14.00]
In the example above, arguments of the least expression are all coerced to the same precision and scale in order to force a consistent result across multiple rows, as shown below:
SQL>select s, i, d1, d2, least(s, i, d1, d2) from t1;
S I D1 D2 LEAST
-- -- ------ ----- ------
1 2 3.0000 4.000 1.0000
2 1 3.0000 4.000 1.0000
2 3 1.0000 4.000 1.0000
2 3 4.0000 1.000 1.0000
CASE and CASE search
For both versions of CASE, NuoDB enforces the same type, precision, and scale for all the branches of the CASE expression. Considering the table t1 above, a case expression returning either the column i or the column d1 would have the following query plan:
SQL>explain select case when i = 1 then i when d1 = 1 then d1 end from t1;
Select
List
Case
Eql
T1.I
"1"
Coerce
T1.I
as numeric(13,4)
Eql
T1.D1
"1"
Coerce
T1.D1
as numeric(13,4)
Table Scan T1 [cost: 168.00, rows: 14.00]
and return the following result:
SQL>select i, d1, case when i = 1 then i when d1 = 1 then d1 end from t1;
I D1
-- ------ ------
2 3.0000 <null>
1 3.0000 1.0000
3 1.0000 1.0000
3 4.0000 <null>
The type of the search condition is determined separately from the common type of the branches that return values. For the more general version of CASE, correctness is enforced on each condition in isolation:
SQL>explain select case when s < 2 then 2 when s < '3' then 3 end from t1;
Select
List
Case
Lss
Coerce
T1.S
as numeric
"2"
"2"
Lss
T1.S
"3"
"3"
Table Scan T1 [cost: 30.00, rows: 10.00]
In the first condition, s and the constant “2” are compared as numbers. In the second condition, s and the constant “3” are compared as strings.
For the CASE SEARCH variant, a common type is enforced on the first argument and on all the returned values:
SQL>explain select case s when 2 then 2 when '3' then 3 end from t1;
Select
List
CaseSearch
Coerce
T1.S
as numeric
"2"
"2"
"3"
"3"
Table Scan T1 [cost: 30.00, rows: 10.00]
In the example above, all comparisons are performed on numbers.
UNION
NuoDB uses a common type to decide the type, precision, and scale of each column in the result of a query with UNION branches and coerces the columns in each branch to the common type. The common type is computed top down through the branches and is not commutative.
Example:
SQL>create table t1(i int, d date);
SQL>create table t2(dec decimal(7, 2), tm timestamp);
SQL>select i, d from t1 union select dec, tm from t2;
SQL>explain select i, d from t1 union select dec, tm from t2;
Select
*** union fields ***
Union
Select
List
Alias:I
Coerce
T1.I
as numeric(9,2)
Alias:D
Coerce
T1.D
as timestamp
Table Scan T1 [cost: 60.00, rows: 10.00]
Select
List
Alias:DEC
Coerce
T2.DEC
as numeric(9,2)
T2.TM
Table Scan T2 [cost: 60.00, rows: 10.00]
In this above example, the types of the returned columns are decimal(9, 2) and timestamp, and coercions have been placed on all the columns which, initially, do not have the correct type. The t2.tm column is not coerced, because it is already a timestamp.